\NameLeena Chennuru Vankadara* \Email[email protected]
\addrUniversity of Tübingen
and \NameLuca Rendsburg* \Email[email protected]
\addrUniversity of Tübingen and \NameUlrike von Luxburg \Email[email protected]
\addrUniversity of Tübingen and \NameDebarghya Ghoshdastidar \Email[email protected]
\addrTechnical University of Munich,
Department of Informatics,
Munich Data Science Institute
Interpolation and Regularization for Causal Learning
Abstract
We study the problem of learning causal models from observational data through the lens of interpolation and its counterpart, regularization. A large body of recent theoretical and empirical work suggests that, in highly complex model classes, interpolating estimators can have good statistical generalization properties and can even be optimal for statistical learning. Motivated by an analogy between statistical and causal learning recently highlighted by \citet{Jan:2019}, we investigate whether interpolating estimators can also learn good causal models. To this end, we consider a simple linearly confounded model and derive precise asymptotics for the causal risk of the min-norm interpolator and ridge-regularized regressors in the high-dimensional regime. Under the principle of independent causal mechanisms, a standard assumption in causal learning, we find that interpolators cannot be optimal and that causal learning requires stronger regularization than statistical learning. This resolves a recent conjecture in \citet{Jan:2019}. Beyond this assumption, we find a larger range of behavior that can be precisely characterized with a new measure of confounding strength. If the confounding strength is negative, causal learning requires weaker regularization than statistical learning, interpolators can be optimal, and the optimal regularization can even be negative. If the confounding strength is large, the optimal regularization is infinite and learning from observational data is actively harmful.
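For concreteness, the two estimators named above admit standard closed forms. Assuming a data matrix $X \in \mathbb{R}^{n \times d}$ and targets $y \in \mathbb{R}^n$ (notation chosen here for illustration; the paper's exact normalization may differ), a textbook formulation is

```latex
\[
  \hat{\beta}_{\mathrm{mn}}
    = \arg\min_{\beta \in \mathbb{R}^d} \bigl\{ \|\beta\|_2 : X\beta = y \bigr\}
    = X^{\dagger} y,
  \qquad
  \hat{\beta}_{\lambda}
    = \arg\min_{\beta \in \mathbb{R}^d} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
    = \bigl(X^{\top}X + \lambda I_d\bigr)^{-1} X^{\top} y ,
\]
```

where $X^{\dagger}$ denotes the Moore--Penrose pseudoinverse; the ridge estimator recovers the min-norm interpolator in the limit $\lambda \to 0^{+}$ when $d > n$.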
keywords:
Causality, Interpolation, Double Descent, High-dimensional linear regression.

1 Introduction
We consider the problem of learning the causal relationship between multivariate covariates $X$ and a scalar target variable $Y$ purely from observational data, possibly in the presence of hidden confounders. Formally, given finite samples drawn independently and identically distributed (i.i.d.) from the joint observational distribution of $(X, Y)$, the goal of causal learning is to predict the effect on the target variable $Y$ under interventions on the covariates $X$. In other words, using Pearl's notation