∎ \stackMath

¹¹institutetext: Tokyo Denki University ¹¹email: [email protected]

Use of Prior Knowledge to Discover Causal Additive Models with Unobserved Variables and its Application to Time Series Data

Takashi Nicholas Maeda Shohei Shimizu

(Received: date / Accepted: date)

Abstract

This paper proposes two methods for causal additive models with unobserved variables (CAM-UV). CAM-UV assumes that the causal functions take the form of generalized additive models and that latent confounders are present. First, we propose a method that leverages prior knowledge for efficient causal discovery. Then, we propose an extension of this method for inferring causality in time series data. The original CAM-UV algorithm differs from other existing causal function models in that it does not seek the causal order between observed variables, but rather aims to identify the causes for each observed variable. Therefore, the first proposed method in this paper utilizes prior knowledge, such as understanding that certain variables cannot be causes of specific others. Moreover, by incorporating the prior knowledge that causes precedes their effects in time, we extend the first algorithm to the second method for causal discovery in time series data. We validate the first proposed method by using simulated data to demonstrate that the accuracy of causal discovery increases as more prior knowledge is accumulated. Additionally, we test the second proposed method by comparing it with existing time series causal discovery methods, using both simulated data and real-world data.

Keywords:

Causal discovery Latent confounders Time-series data Causal additive models

1 Introduction

Causal discovery refers to a special class of statistical and machine learning methods that infer causal relationships. These studies propose inferential methods deductively derived from assumptions about the data generation process, and the methods enable us to create causal graphs between observed variables without additional experiments. The assumptions of existing causal methods include acyclicity of causal graphs, absence of latent confounders, and independence and identical distribution of exogenous variables Spirtes91 ; shimizu2006 ; shimizu2011 ; peters2014 ; NEURIPS2018_e347c514 . The methods have been applied to various types of data including economic data LAI2015173 , meteorological data ebert2012 , fMRI data SMITH2011875 .

This paper proposes a causal discovery method for time-series data assuming the presence of latent confounders. Most existing methods for time-series data assume the absence of a latent confounder JMLR:v9:chu08a ; JMLR:v11:hyvarinen10a . However, most data do not satisfy such assumption. A causal discovery method for time-series data, latent Peter-Clark momentary conditional independence (LPCMCI) NEURIPS2020_94e70705 , assumes the presence of latent confounders. However, since LPCMCI is a constraint-based method, it cannot distinguish causal structures that entail the same set of conditional independence between variables. This paper aims to propose a causal functional model-based method for time-series data assuming the presence of latent confounders. We extend the causal additive models with unobserved variables (CAM-UV) algorithm pmlr-v161-maeda21a ; maeda2021discovery to propose time-series CAM-UV (TS-CAM-UV), a method for causal discovery from time-series data with latent confounders. The original CAM-UV algorithm assumes that: (1) data are independently and identically distributed, (2) causal functions take the form of a generalized additive model of nonlinear functions, and (3) latent confounders are present. TS-CAM-UV, being a causal function model-based method, can identify causal relationships, provided the data fulfills its assumptions.

Causal discovery methods for time-series data represent the state of variable $X_{i}$ at time point $t$ as $X_{i}^{t}$ treating the states of $X_{i}$ at different points such as $X_{i}^{t},X_{i}^{t-1},\cdots,X_{i}^{s}$ as separate variables. This allows for representing causal relationships between variables at different time points.

Time series causal discovery methods can be described as causal discovery methods that utilize the prior knowledge that effects do not precede their causes in time. Therefore, before proposing the TS-CAM-UV algorithm, this paper proposes a method called CAM-UV with prior knowledge (CAM-UV-PK), which applies prior knowledge to CAM-UV. TS-CAM-UV is proposed as a method that introduces the knowledge that variables representing future states cannot be the cause of variables representing past states. To the best of our knowledge, this is the first method for time series causal discovery that adopts a causal function model approach assuming the presence of latent confounders.

The contributions of this paper are as follows:

•

This paper proposes a method called the CAM-UV-PK algorithm, which can introduce prior knowledge in the form of statements such as $X_{i}$ cannot be a cause of $X_{j}$ . The performance of the CAM-UV-PK algorithm is verified using simulation data.
•

We propose a time-series causal discovery method called the TS-CAM-UV algorithm, which applies the prior knowledge that variables representing future states cannot be causes of variables representing past states. The performance of the TS-CAM-UV algorithm is verified using both simulation data and real-world data.

The remainder of this paper comprises the following. Section 2 reviews previous studies on causal discovery methods for i.i.d. data and time-series data. Section 3 introduces the models of the data generation processes of CAM-UV and TS-CAM-UV, followed by Section 4 which shows the identifiability of those models. Section 5 introduces the two proposed methods, the CAM-UV-PK algorithm and the TS-CAM-UV algorithm. Section 6 shows and discusses the results of the experiments of the proposed methods. Section 7 brings the paper to a conclusion.

2 Related studies

Causal discovery methods often assume that the causal structures form directed acyclic graphs (DAGs), that there is no latent confounders, and that data are independently and identically distributed chickering2002 ; peters2014 ; shimizu2006 ; shimizu2011 ; Spirtes91 . The constrained-based methods including the Peter-Clark (PC) algorithm Spirtes91 and the fast causal inference (FCI) algorithm fci infer causal relationships on the basis of conditional independence in the joint distribution. FCI identifies the presence of latent confounders whereas PC assumes the absence of unobserved common causes. PC and FCI cannot distinguish between the two causal graphs that entail exactly the same sets of conditional independence. Compared to constrained-based methods, causal functional model-based methods can identify the entire causal models under proper assumptions. Linear non-Gaussian acyclic models (LiNGAM) shimizu2006 ; shimizu2011 assume that causal relationships are linear and the external effects are non-Gaussian. Additive noise models (ANMs) and causal additive models peters2014 assume the causal relationships are nonlinear. Both LiNGAM and ANMs assume the absence of unobserved variables. Causal additive models with unobserved variables (CAM-UV) pmlr-v161-maeda21a are extended models of causal additive models (CAMs) buhlmann2014 and assume that the causal functions take the form of generalized additive models (GAMs) hastie1990generalized and that unobserved variables are present.

Time-series causal discovery methods have been proposed as extensions of the above methods. The time-series FCI (tsFCI) algorithm entner2010causal and a structural vector autoregression FCI (SVAR-FCI) pmlr-v92-malinsky18a adapt FCI algorithm and use time order and stationarity to infer causal relationships. VAR-LiNGAM JMLR:v11:hyvarinen10a is based on LiNGAM and assumes the linearity of causal relationships, non-Gaussianity of external effects, and the absence of unobserved common causes. Time series models with independent noise (TiMINo) NIPS2013_47d1e990 adapts ANMs, and it assumes the absence of latent confounders. The Peter-Clark momentary conditional independence (PCMCI) algorithm doi:10.1126/sciadv.aau4996 is an adaptation of the conditional independence-based PC algorithm that addresses strong autocorrelations in time series via the use of a momentary conditional independence (MCI) test. Latent PCMCI (LPCMCI) NEURIPS2020_94e70705 is an extension of PCMCI to include unobserved variables. However, to the best of our knowledge, no causal functional model-based method has been proposed for time-series data under the assumption that causal relationships are nonlinear and latent confounders are present.

3 Models

3.1 CAM-UV: Causal additive models with unobserved variables

Causal additive noise models with unobserved variables (CAM-UV) pmlr-v161-maeda21a ; maeda2021discovery are defined as the equation below:

V_{i}=\sum_{X_{j}\in opa(V_{i})}f_{i,j}(X_{j})+\sum_{u_{j}\in upa(U_{i})}f_{i,j}(U_{j})+N_{i},

(1)

where $V=\{V_{i}\}$ is the set of observed or unobserved variables, $X=\{X_{i}\}$ the set of observed variables, $U=\{U_{i}\}$ is the set of unobserved variables, $N_{i}$ is the external effect on $V_{i}$ , $opa(V_{i})\subset X$ is the set of observed direct causes (observed parents) of $V_{i}$ , $upa(V_{i})\subset U$ is the set of unobserved direct causes (unobserved parents) of $V_{i}$ , and $f_{i,j}$ is a nonlinear function. Additionally, Assumption 1 is imposed on CAM-UV.

Assumption 1.

All the causal functions and the external effects in CAM-UV satisfy the following condition: If variables $V_{i}$ and $V_{j}$ have terms involving functions of the same external effect $N_{k}$ , then $V_{i}$ and $V_{j}$ are mutually dependent (i.e., $(N_{k}\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}V_{i})\land(N_{k}\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}V_{j})\Rightarrow(V_{i}\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}V_{j})$ ).

3.2 TS-CAM-UV: Time series causal additive models with unobserved variables

Time-series causal additive noise models with unobserved variables (TS-CAM-UV) are stationary discrete-time structural causal models that can be described as below:

V_{i}^{t}=\sum_{X_{j}^{s}\in opa(V_{i}^{t})}f_{i,t,j,s}(X_{j}^{s})+\sum_{U_{j}^{s}\in upa(U_{i}^{t})}f_{i,t,j,s}(U_{j}^{s})+N_{i}^{t}\ \ \ {\rm with}\ i=1,\dots,m,

(2)

where $t$ and $s$ are time indices, $m$ is a natural number, $V=\{V_{i}^{t}\}$ is the set of observed or unobserved variables, $X=\{X_{i}^{t}\}$ is the set of observed variables, $U=\{U_{i}^{t}\}$ is the set of unobserved variables, $f_{i,t,j,s}$ is a nonlinear function, the noise variables $N_{i}^{t}$ are jointly independent, $opa(V_{i}^{t})\subset X$ is the set of direct causes of $V_{j}^{t}$ , and $upa(V_{i}^{t})\subset U$ is the set of direct causes of $V_{j}^{t}$ .

The stationarity of time-series causal relationships is assumed as the following: The causal relationship of the variable pair $(V^{t-\epsilon}_{i},V^{t}_{j})$ is the same as that of all the time shifted pairs $(V^{t^{\prime}-\epsilon}_{i},V^{t^{\prime}}_{j})$ . The causal effect of $V^{s}_{j}$ on $V^{t}_{i}$ is called a lagged effect if $s<t$ holds, and is called a contemporaneous effect if $t=s$ holds. It is also assumed that there is a natural number $r$ as the maximum time lag such that the longest time lag of the direct causal effects does not exceed $r$ . While it is true that the cause precedes the effect in time, if the time slice of the data analyzed are not sufficiently short, the cause and effect may appear to occur simultaneously. This type of causal effect, where the time difference between the cause and effect is shorter than the time slice of data, is referred to as a contemporaneous effect.

Refer to caption — Figure 1: Definitions of an unobserved causal path (UCP) and an unobserved backdoor path (UBP).

4 Identifiability

4.1 CAM-UV

The identifiability of CAM-UV is discussed in pmlr-v161-maeda21a ; maeda2021discovery , and this section briefly presents it. When the causal relationship is linear, an observed variable $X_{j}$ being an indirect cause of an observed variable $X_{i}$ , even if there is an unobserved variable $U_{k}$ in the causal path such that the causal relationship is $X_{j}\rightarrow U_{k}\rightarrow X_{i}$ , the residual when regressing $X_{i}$ on $X_{j}$ becomes independent of $X_{j}$ . However, in the case of a non-linear causal relationship, the residual when regressing $X_{i}$ on $X_{j}$ cannot be independent of $X_{j}$ . This is referred to as cascade additive noise models (CANMs) ijcai2019p223 . Therefore, in the case of non-linear causal relationships, compared to linear ones, there are more instances where causal relationships cannot be identified using only regression and independence tests. Before discussing the cases where causal relationships cannot be identified in CAM-UV, we define unobserved causal paths (UCPs) and unobserved backdoor paths (UBPs) which are illustrated in Fig. 1 and used in the lemmas in this section.

Definition 1.

A directed path from an observed variable to another is called a causal path (CP). A CP from $X_{j}$ to $X_{i}$ is called an unobserved causal path (UCP) if it ends with the directed edge connecting $X_{i}$ and its unobserved direct cause (i.e., $X_{j}\rightarrow\cdots\rightarrow U_{m}\rightarrow X_{i}$ where $U_{m}$ is an unobserved direct cause of $X_{i}$ ).

Definition 2.

An undirected path between $X_{i}$ and $X_{j}$ is called a backdoor path (BP) if it consists of the two directed paths from a common ancestor of $X_{i}$ and $X_{j}$ to $X_{i}$ and $X_{j}$ (i.e., $X_{i}\leftarrow\cdots\leftarrow V_{k}\rightarrow\cdots\rightarrow X_{j}$ , where $V_{k}$ is the common ancestor). A BP between $X_{i}$ and $X_{j}$ is called an unobserved backdoor path (UBP) if it starts with the edge connecting $X_{i}$ and its unobserved direct cause, and ends with the edge connecting $X_{j}$ and its unobserved direct cause (i.e., $X_{i}\leftarrow U_{m}\leftarrow\cdots\leftarrow V_{k}\rightarrow\cdots\rightarrow U_{n}\rightarrow X_{j}$ , where $V_{k}$ is the common ancestor and $U_{m}$ and $U_{n}$ are the unobserved direct causes of $X_{i}$ and $X_{j}$ , respectively). The undirected path $X_{i}\leftarrow U_{k}\rightarrow X_{j}$ is also a UBP, as $V_{k}$ , $U_{m}$ , and $U_{n}$ can be the same variable.

The identifiability of CAM-UV is based on Lemmas 1–3 shown below. They show that it is possible to identify the direct causal relationship between two variables if they do not have a UCP or a UBP, otherwise it is impossible to identify the direct direct causal relationship but possible to identify the presence of a UCP or a UBP. This is due to the fact that when the causal relationship is non-linear, if the parent of an observed variable $X_{i}$ is an unobserved variable $U_{j}$ , the ancestral variables of $U_{j}$ cannot be removed from $X_{i}$ by regression. Lemma 1 is about the condition of variable pair $(X_{i},X_{j})$ having a UCP or a UBP. Lemma 2 is about the condition of variable pair $(X_{i},X_{j})$ not having a UBP, a UCP, or a direct causal relationship. Lemma 3 is about the condition that $X_{j}$ is a direct cause of $X_{i}$ , and they do not have a UCP or a UBP. Please refer to maeda2021discovery for the proofs of the lemmas.

Lemma 1.

Assume the data generation process of the variables is CAM-UV as defined in Section 3.1. If and only if Equation LABEL:eq:lem1 is satisfied, there is a UCP or UBP between $X_{i}$ and $X_{j}$ where $G_{1}$ and $G_{2}$ denote regression functions satisfying Assumption 2.

\displaystyle\begin{aligned} &\forall G_{1},G_{2},M_{1}\subseteq(X\setminus\{X_{i}\}),M_{2}\subseteq(X\setminus\{X_{j}\}),\\ &\left[\left(X_{i}-G_{1}(M_{1})\right)\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}\left(X_{j}-G_{2}(M_{2})\right)\right]\end{aligned}

(3)

Equation LABEL:eq:lem1 indicates that the residual of $X_{i}$ regressed on any subset of $X\setminus\{X_{i}\}$ and the residual of $X_{j}$ regressed on any subset of $X\setminus\{X_{j}\}$ cannot be mutually independent.

Assumption 2.

Let $M_{1}$ and $M_{2}$ denote sets satisfying $M_{1}\subseteq X$ and $M_{2}\subseteq X$ where $X$ is the set of all the observed variables in CAM-UV defined in Section 2. We assume that functions $G_{i}$ take the forms of generalized additive models (GAMs) hastie1990generalized such that $G_{i}(M_{1})=\sum_{X_{m}\in M_{1}}g_{i,m}(X_{m})$ where each $g_{i,m}(X_{m})$ is a nonlinear function of $X_{m}$ . In addition, we assume that functions $G_{i}$ satisfy the following condition: When both $(X_{i}-G_{i}(M_{1}))$ and $(X_{j}-G_{j}(M_{2}))$ have terms involving functions of the same external effect $N_{k}$ , then $(X_{i}-G_{i}(M_{1}))$ and $(X_{j}-G_{j}(M_{2}))$ are mutually dependent (i.e., $(N_{k}\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}X_{i}-G_{i}(M_{1}))\land(N_{k}\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}X_{j}-G_{j}(M_{2}))\Rightarrow((X_{i}-G_{i}(M_{1}))\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}(X_{j}-G_{j}(M_{2})))$ ).

Lemma 2.

Assume the data generation process of the variables is CAM-UV as defined in Section 3.1. If and only if Equation LABEL:eq:lem2 is satisfied, there is no direct causal relationship between $X_{i}$ and $X_{j}$ , and there is no UCP or UBP between $X_{i}$ and $X_{j}$ where $G_{1}$ and $G_{2}$ denote regression functions satisfying Assumption 2.

\displaystyle\begin{aligned} &\exists G_{1},G_{2},M\subseteq(X\setminus\{X_{i},X_{j}\}),N\subseteq(X\setminus\{X_{i},X_{j}\}),\\ &[(\left(X_{i}-G_{1}(M)\right)\mathop{\perp\!\!\!\perp}\left(X_{j}-G_{2}(N)\right))]\end{aligned}

(4)

Equation LABEL:eq:lem2 indicates that there are regression functions such that the residuals of $X_{i}$ and $X_{j}$ regressed on subsets of $X\setminus\{X_{i},X_{j}\}$ are mutually independent.

Lemma 3.

Assume the data generation process of the variables is CAM-UV as defined in Section 3.1. If and only if Equations LABEL:eq:lem3-1 and LABEL:eq:lem3-2 are satisfied, $X_{j}$ is a direct cause of $X_{i}$ , and there is no UCP or UBP between $X_{i}$ and $X_{j}$ where $G_{1}$ and $G_{2}$ denote regression functions satisfying Assumption 2.

\displaystyle\begin{aligned} &\forall G_{1},G_{2},M\subseteq(X\setminus\{X_{i},X_{j}\}),N\subseteq(X\setminus\{X_{j}\}),\\ &\left[\left(X_{i}-G_{1}(M)\right)\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}\left(X_{j}-G_{2}(N)\right)\right]\end{aligned}

(5)

\displaystyle\begin{aligned} &\exists G_{1},G_{2},M\subseteq(X\setminus\{X_{i}\}),N\subseteq(X\setminus\{X_{i},X_{j}\}),\\ &\left[\left(X_{i}-G_{1}(M)\right)\mathop{\perp\!\!\!\perp}\left(X_{j}-G_{2}(N)\right)\right]\end{aligned}

(6)

Equation LABEL:eq:lem3-1 indicates that the residual of $X_{i}$ regressed on any subset of $X\setminus\{X_{i},X_{j}\}$ and the residual of $X_{j}$ regressed on any subset of $X\setminus\{X_{j}\}$ cannot be mutually independent. Equation LABEL:eq:lem3-2 indicates that there are regression functions such that the residual of $X_{i}$ regressed on a subset of $X\setminus\{X_{j}\}$ and the residual of $X_{j}$ regressed on a subset of $X\setminus\{X_{i},X_{j}\}$ are mutually independent.

4.2 TS-CAM-UV

The identifiability of causality in TS-CAM-UV is the same as in CAM-UV. Lemmas 4–6 on identifiability in TS-CAM-UV correspond to Lemmas 1–3 on identifiability in CAM-UV.

Lemma 4.

Assume the data generation process of the variables is TS-CAM-UV as defined in Section 3.2. If and only if Equation LABEL:eq:lem-ts1 is satisfied, there is a UCP or UBP between $X_{i}^{t}$ and $X_{j}^{s}$ where $G_{1}$ and $G_{2}$ denote regression functions satisfying Assumption 2.

\displaystyle\begin{aligned} &\forall G_{1},G_{2},M\subseteq(X\setminus\{X_{i}^{t}\}),N\subseteq(X\setminus\{X_{j}^{s}\}),\\ &\left[\left(X_{i}^{t}-G_{1}(M)\right)\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}\left(X_{j}^{s}-G_{2}(N)\right)\right]\end{aligned}

(7)

Proof.

The relationships between $X_{i}^{t}$ and $X_{j}^{s}$ in TS-CAM-UV are the same as those of $X_{i}$ and $X_{j}$ in CAM-UV defined in Section 3.1. Therefore, Lemma 4 holds because of Lemma 1. ∎

Lemma 5.

Assume the data generation process of the variables is TS-CAM-UV as defined in Section 3.2. If and only if Equation LABEL:eq:lem-ts2 is satisfied, there is no direct causal relationship between $X_{i}^{t}$ and $X_{j}^{s}$ , and there is no UCP or UBP between $X_{i}^{t}$ and $X_{j}^{s}$ where $G_{1}$ and $G_{2}$ denote regression functions satisfying Assumption 2.

\displaystyle\begin{aligned} &\exists G_{1},G_{2},M\subseteq(X\setminus\{X_{i}^{t},X_{j}^{s}\}),N\subseteq(X\setminus\{X_{i}^{t},X_{j}^{s}\}),\\ &[(\left(X_{i}^{t}-G_{1}(M)\right)\mathop{\perp\!\!\!\perp}\left(X_{j}^{s}-G_{2}(N)\right))]\end{aligned}

(8)

Proof.

The relationships between $X_{i}^{t}$ and $X_{j}^{s}$ in TS-CAM-UV are the same as those of $X_{i}$ and $X_{j}$ in CAM-UV defined in Section 3.1. Therefore, Lemma 5 holds because of Lemma 2. ∎

Lemma 6.

Assume the data generation process of the variables is TS-CAM-UV as defined in Section 3.2. If and only if Equations LABEL:eq:lem-ts3-1 and LABEL:eq:lem-ts3-2 are satisfied, $X_{j}^{s}$ is a direct cause of $X_{i}^{t}$ , and there is no UCP or UBP between $X_{i}^{t}$ and $X_{j}^{s}$ where $G_{1}$ and $G_{2}$ denote regression functions satisfying Assumption 2.

\displaystyle\begin{aligned} &\forall G_{1},G_{2},M\subseteq(X\setminus\{X_{i}^{t},X_{j}^{s}\}),N\subseteq(X\setminus\{X_{j}^{t}\}),\\ &\left[\left(X_{i}^{t}-G_{1}(M)\right)\mathop{\perp\!\!\!\!\!\!/\!\!\!\!\!\!\perp}\left(X_{j}^{s}-G_{2}(N)\right)\right]\end{aligned}

(9)

\displaystyle\begin{aligned} &\exists G_{1},G_{2},M\subseteq(X\setminus\{X_{i}^{t}\}),N\subseteq(X\setminus\{X_{i}^{t},X_{j}^{s}\}),\\ &\left[\left(X_{i}^{t}-G_{1}(M)\right)\mathop{\perp\!\!\!\perp}\left(X_{j}^{s}-G_{2}(N)\right)\right]\end{aligned}

(10)

Proof.

The relationships between $X_{i}^{t}$ and $X_{j}^{s}$ in TS-CAM-UV are the same as those of $X_{i}$ and $X_{j}$ in CAM-UV defined in Section 3.1. Therefore, Lemma 6 holds because of Lemma 3. ∎

5 Methods

5.1 CAM-UV-PK: Causal additive models with unobserved variables using prior knowledge

This section proposes a method called CAM-UV using prior knowledge (CAM-UV-PK). This method is for discovering causal additive models with unobserved models defined in Section 3.1. In addition to the arguments of the CAM-UV algorithm, the CAM-UV-PK algorithm requires an argument ${\mathbf{T}}$ that is a list of ordered variable pairs. If a ordered variable pair $(X_{i},X_{j})$ is included in ${\mathbf{T}}$ , it means that it is assumed that $X_{i}$ cannot be a direct or indirect cause of $X_{j}$ .

The CAM-UV algorithm and CAM-UV-PK algorithm output causal graphs with directed edges and undirected dashed edges. Directed edges indicate variable pairs having direct causal relationships, and undirected dashed edges indicate variable pairs having UCPs or UBPs. For example, Figure 2-(a) shows a true causal graph, and Figure 2-(b) shows the causal graph generated by the CAM-UV algorithm. $X_{2}$ and $X_{3}$ have a UBP ( $X_{2}\leftarrow U_{1}\rightarrow X_{3}$ ), so they are connected with an undirected dashed path in Figure 2-(b). $X_{4}$ and $X_{9}$ have a UCP ( $X_{4}\rightarrow U_{7}\rightarrow X_{9}$ ), so they are also connected with an undirected dashed path in Figure 2-(b).

Input:

{\mathbf{X}}

: Samples of

p

observed variables

\{X_{1},\cdots,X_{p}\}

{\mathbf{T}}

: A list of ordered pairs of variables where it is assumed that the first variable cannot be the direct or indirect cause of the second variable,

d

: maximal number of variables to examine causality for each step, significance level for independence test

\alpha

Output: the sets of the parents

\{M_{1},\cdots,M_{p}\}

1 function

\text{getDirectedEdges}{(}X,d,\alpha{)}

2 # PHASE 1: Extracting the candidates of the parents of each variable.

3 for $i=1$ to $p$ do

4 Initialize

M_{i}\leftarrow\emptyset

6 Initialize

t\leftarrow 2

7 while $t\leq d$ do

8 Initialize

noChange\leftarrow{\rm True}

9 foreach $K\in\{K|K\subseteq X,|K|=t\}$ do

10 # Finding the most endogenous variable

X_{b}

in K

11 foreach $K\in\{K|K\subseteq X,|K|=t\}$ do

maxIndependence\leftarrow 0

maxIndependenceVariable\leftarrow{\rm NULL}

14 foreach $X_{i}\in K$ do

15 # Checking whether there exists variable

X_{j}\in K\setminus\{X_{i}\}

that cannot be a cause of

X_{i}

according to prior knowledge

{\mathbf{T}}

16 if $\exists X_{j}\in K\setminus\{X_{i}\},\ \ [(X_{j},X_{i})\in{\bf T}]$ then

17 continue

indepe\leftarrow\savestack{\tmpbox}{\stretchto{\scaleto{\scalerel*[\widthof{\text{p-HSIC}}]{\kern-0.6pt\bigwedge\kern-0.6pt}{\rule[-505.89pt]{4.30554pt}{505.89pt}}}{}}{0.5ex}}\stackon[1pt]{\text{p-HSIC}}{\tmpbox}(X_{i}-G_{1}(M_{i}\cup K\setminus\{X_{i}\}),\{X_{j}-G_{2}(M_{j})|X_{j}\in K\setminus\{X_{i}\}\})

20 if $maxIndependence<indepe$ then

maxIndependence\leftarrow indepe

maxIndependenceVariable\leftarrow X_{i}

X_{b}\leftarrow maxIndependenceVariable

27 # Computing the independence between the residuals

e\leftarrow\savestack{\tmpbox}{\stretchto{\scaleto{\scalerel*[\widthof{\text{p-HSIC}}]{\kern-0.6pt\bigwedge\kern-0.6pt}{\rule[-505.89pt]{4.30554pt}{505.89pt}}}{}}{0.5ex}}\stackon[1pt]{\text{p-HSIC}}{\tmpbox}(X_{b}-G_{1}(M_{b}\cup K\setminus\{X_{b}\}),\{X_{j}-G_{2}(M_{j})|X_{j}\in K\setminus\{X_{b}\}\})

h\leftarrow\displaystyle\max_{x_{j}\in K\setminus\{X_{b}\}}\savestack{\tmpbox}{\stretchto{\scaleto{\scalerel*[\widthof{\text{p-HSIC}}]{\kern-0.6pt\bigwedge\kern-0.6pt}{\rule[-505.89pt]{4.30554pt}{505.89pt}}}{}}{0.5ex}}\stackon[1pt]{\text{p-HSIC}}{\tmpbox}(X_{b}-G_{1}(M_{b}),X_{j}-G_{2}(M_{j}))

30 # Checking whether

X_{b}

is really a sink of

K

31 if $(\alpha<e)\land(\alpha>h)$ then

32 # When

X_{b}

is a sink of

K

, add each variable in

K\setminus\{X_{b}\}

M_{b}

M_{b}\leftarrow M_{b}\cup(K\setminus\{X_{b}\})

noChange\leftarrow{\rm False}

37 # If each

M_{i}

remains unchanged, increment

t

by one. If not, substitute

2

for

t

38 if $noChange={\rm True}$ then

t\leftarrow t+1

41 else

t\leftarrow 2

45 # PHASE 2: Determining the parents of each variable.

46 for $i=1$ to $p$ do

47 foreach $X_{j}\in M_{i}$ do

48 # Checking whether

X_{j}

is parent of

X_{i}

49 if $\alpha<\savestack{\tmpbox}{\stretchto{\scaleto{\scalerel*[\widthof{\text{\rm p-HSIC}}]{\kern-0.6pt\bigwedge\kern-0.6pt}{\rule[-505.89pt]{4.30554pt}{505.89pt}}}{}}{0.5ex}}\stackon[1pt]{\text{\rm p-HSIC}}{\tmpbox}(X_{i}-G_{1}(M_{i}\setminus\{X_{j}\}),X_{j}-G_{2}(M_{j}))$ then

50 # When

X_{j}

is not a parent, remove it from

M_{i}

M_{i}\leftarrow M_{i}\setminus\{X_{j}\}

55 return $\{M_{1},\cdots,M_{p}\}$

Algorithm 1 Determine the directed edges

The CAM-UV-PK algorithm incorporates restriction using prior knowledge ${\mathbf{T}}$ into the process of causal inference in the CAM-UV algorithm. The CAM-UV algorithm has two-step algorithm pmlr-v161-maeda21a ; maeda2021discovery . The first step determines the directed edges, and the second one determines the undirected dashed edges. There is no difference in the second step between the CAM-UV-PK algorithm and the CAM-UV algorithm. The first step of the CAM-UV-PK algorithm is listed in Algorithm 1. Lines 14–16 in Algorithm 1 are added to the CAM-UV algorithm. This part of the algorithm refers to the prior knowledge ${\mathbf{T}}$ to avoid considering unnecessary causal candidates. The method extracts the candidates of the direct causes (parents) of each variable (lines 2–34) and determines the direct causes of each variables (lines 35–41). The method identifies the most endogenous variable $X_{b}$ in each $K\in\{K|K\subseteq X,|K|=t\}$ . When $X_{i}=x_{b}$ is satisfied, $X_{i}$ maximizes $\savestack{\tmpbox}{\stretchto{\scaleto{\scalerel*[\widthof{\text{p-HSIC}}]{\kern-0.6pt\bigwedge\kern-0.6pt}{\rule[-505.89pt]{4.30554pt}{505.89pt}}}{}}{0.5ex}}\stackon[1pt]{\text{p-HSIC}}{\tmpbox}(X_{i}-G_{1}(M_{i}\cup K\setminus\{X_{i}\}),\{X_{j}-G_{2}(M_{j})|X_{j}\in K\setminus\{X_{i}\}\})$ . In lines 14–16 which are newly added in CAM-UV-PK, the method checks whether there exists $X_{j}\in K\setminus\{X_{i}\}$ that cannot be a direct or indirect cause of $X_{i}$ according to the prior knowledge ${\mathbf{T}}$ . If $(X_{j},X_{i})\in{\mathbf{T}}$ is satisfied, the method stops checking whether $X_{i}$ is endogenous to $K\setminus\{X_{i}\}$ . Therefore, this check prevents incorrect inference of causal relationships.

5.2 TS-CAM-UV: Time series causal additive models with unobserved variables

This section proposes a method called the time-series CAM-UV (TS-CAM-UV) algorithm. The TS-CAM-UV algorithm uses as prior knowledge the assumption, called time priority, that effect does not precede its cause in time. The TS-CAM-UV algorithm uses the CAM-UV-PK algorithm, and the prior knowledge of time priority is used for the argument of the CAM-UV-PK, ${\mathbf{T}}$ .

The TS-CAM-UV algorithm first creates data with $q\times(r+1)$ variables where $q$ is the number of the variables of original data, and $r$ is the maximal considered time lag given as an argument. Let ${\mathbf{X}_{t}}=\{X^{t}_{1},\cdots,X^{t}_{q}\}$ denote the variables in original data. The TS-CAM-UV algorithm creates data with variables $\mathbf{X}^{\rm new}=\{X^{t}_{1},\cdots,X^{t}_{q},X^{t-1}_{1},\cdots,X^{t-1}_{q},\cdots,X^{t-r}_{1},\cdots,X^{t-r}_{q}\}$ . Therefore, the number of the new samples decreases to $n-r$ where $n$ denotes the number of original data. Then, the TS-CAM-UV algorithm creates a list of ordered variables $K=\{(X^{t}_{i},X^{t^{\prime}}_{j})|t>t^{\prime},0\leq i\leq q,0\leq j\leq q,\}$ .

The TS-CAM-UV algorithm uses $\mathbf{X}^{\rm new}$ and $K$ for the arguments of CAM-UV-PK ${\mathbf{X}}$ and ${\mathbf{T}}$ , respectively. Then, CAM-UV-PK outputs a causal graph of the $q$ variables with $r$ time lag.

6 Experiments

We conducted experiments to examine the performance of the CAM-UV-PK algorithm and the TS-CAM-UV algorithm. The CAM-UV-PK algorithm is compared with that of CAM-UV. The TS-CAM-UV algorithm is compared with VarLiNGAM and LPCMCI. Here, we primarily compare the accuracy of directed edges. This is because, in other methods, there are no approaches that consider the effects of unobserved intermediate variables (unobserved variables on the causal paths between observed variables), and also because CAM-UV aims to ensure that the inference of directed edges is not biased due to latent confounders.

6.1 CAM-UV-PK: Causal additive models with unobserved variables using prior knowledge

We examined the performance of CAM-UV-PK compared to CAM-UV using simulated data. We compared and evaluated the performance of CAM-UV-PK with prior knowledge ranging from 0 to 4. The CAM-UV algorithm is the same as the CAM-UV-PK algorithm with no input of prior knowledge. We performed 100 experiments using artificial data with each sample size $n\in\{100,200,\cdots,900,1000\}$ to compare our method to existing methods. In each experiment, the samples are created as follows:

•

The number of observed variables is 10.
•

The number of the observed variable pairs having unobserved common causes is 4.
•

The number of observed variable pairs having unobserved causal intermediate variables is 2.
•

The number of the observed variable pairs having direct causal effects is 10.
•

Variable pairs having unobserved common causes, unobserved intermediate causal variables, or direct causal relationships were randomly selected under the restriction that the set of variable pairs with unobserved common causes, the set of variable pairs with unobserved intermediate causal variables, and the set of variable pairs with direct causal relationships were mutually disjoint.

•

The causal effect of $V_{j}$ on $V_{i}$ is determined as follows:

\left(\sin\left(a_{1}\left(V_{j}+b_{1}\right)\right)\right)^{3}c_{1}+\left(\frac{1}{1+\exp(-a_{2}(V_{j}+b_{2}))}-0.5\right)c_{2}

(11)

where $a_{1}$ , $a_{2}$ , $b_{1}$ , $b_{2}$ , $c_{1}$ , and $c_{1}$ are constants that take random value for each $(i,j)$ . Constants $a_{1}$ and $a_{2}$ are taken from $U(9,11)$ , $b_{1}$ and $b_{2}$ are taken from $U(-0.1,0.1)$ , and $c_{1}$ and $c_{2}$ are taken from $U(3,5)$ . This function is also used in experiments to validate the TS-CAM-UV algorithm in the next section so that causal effects do not converge or diverge over time.

The arguments of TS-CAM-UV, $\alpha$ (significance level for independence test) and $d$ (maximal number of variables to examine causality for each step) are set to 0.01 and 2, respectively.

We compared the performance of the identification of direct causal relationships. We used precision, recall, and F-measure as the evaluation measures. True positive (TP) is the number of true directed edges that a method correctly infers in terms of their positions and directions. Precision represents the TP divided by the number of estimations, and recall represents the TP divided by the number of all the true directed edges. Furthermore, F-measure is defined as $\text{F-measure}=2\cdot\text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$ . In each experiment, out of the ten variable pairs with direct causal relationships, four were excluded from the evaluation. These four causal relationships were used as prior knowledge in CAM-UV-PK.

Fig. 3 shows the results of the identification of direct causal relationships. The figure plots the average of precision, recall, and F-measure. It can be seen that precision and F-measure increase with the number of prior knowledge. The CAM-UV algorithm is the CAM-UV-PK algorithm without prior knowledge, and this has the lowest precision and F-measure. The number of prior knowledge does not significantly affect recall.

The above experimental results of the CAM-UV-PK algorithm confirm that the number of prior knowledge improves the precision and F-measure of the identification of direct causal relationships.

6.2 TS-CAM-UV: Time series causal additive models with unobserved variables

We examined the performance of TS-CAM-UV compared to LPCMCI and VarLiNGAM using simulated data and real-world data. For LPCMCI, two methods of conditional independence test were used for the comparison: Partial correlation test (ParCorr) and Gaussian process regression and a distance correlation test on the residuals (GPDC). ParCorr assumes linear additive noise models, and GPDC assumes nonlinear additive noise models.

6.2.1 Simulated data

We performed 100 experiments using artificial data with each sample size $n\in\{100,200,\cdots,1900,2000\}$ to compare our method to existing methods. In each experiment, the samples are created as follows:

•

The number of observed variables and the maximum time lag are 3 and 2, respectively. Therefore, the number of the variables representing different time lags of all the observed variables is 9 (e.g. $|\{X_{i}^{t}\}|=9$ ).
•

The number of observed variable pairs having unobserved common causes is 2.
•

The number of observed variable pairs having unobserved intermediate variables is 2.
•

The number of observed variable pairs having direct causal relationships is 5.
•

Variable pairs having unobserved common causes, unobserved intermediate causal variables, or direct causal relationships were randomly selected under the restriction that the set of variable pairs with unobserved common causes, the set of variable pairs with unobserved intermediate causal variables, and the set of variable pairs with direct causal relationships were mutually disjoint.

•

The causal effect of $V^{s}_{j}$ on $V^{t}_{i}$ is determined as below:

\left(\sin\left(a_{1}\left(V_{s}^{j}+b_{1}\right)\right)\right)^{3}c_{1}+\left(\frac{1}{1+\exp(-a_{2}(V_{s}^{j}+b_{2}))}-0.5\right)c_{2}

(12)

where $a_{1}$ , $a_{2}$ , $b_{1}$ , $b_{2}$ , $c_{1}$ , and $c_{1}$ are constants that take random value for each $(i,j,t,t^{\prime})$ . Constants $a_{1}$ and $a_{2}$ are taken from $U(9,11)$ , $b_{1}$ and $b_{2}$ are taken from $U(-0.1,0.1)$ , and $c_{1}$ and $c_{2}$ are taken from $U(3,5)$ .

In this experiment, we compared the performance of the identification of direct causal relationships. That is, the edges with arrows in causal graphs ( $\rightarrow$ ).

The arguments of the TS-CAM-UV algorithm, VarLiNGAM, and LPCMCI were set as follows:

•
TS-CAM-UV
- $\circ$
  
  Significance level for independence test: 0.01.
- $\circ$
  
  Maximal number of causal variables to examine causality for each step: 2.
- $\circ$
  
  Maximal number of time lags: 2.
•
VarLiNGAM
- $\circ$
  
  Maximal number of time lags: 2.
- $\circ$
  
  Threshold value for the strength of the causal effects (i.e. the absolute values of coefficients): 0.01, 0.05, 0.1, and 0.5.
•
LPCMCI
- $\circ$
  
  Significance level for independence test: 0.01.
- $\circ$
  
  Maximal number of time lags: 2.
- $\circ$
  
  Methods of conditional independence test: GPDC and ParCorr.

The results are shown in Fig. 4. The figure plots the average of precision, recall, and F-measure. The values in the brackets for VarLiNGAM indicate threshold values for the strength of causal effects. TS-CAM-UV showed the highest precision for $n\geq 200$ , the highest recall for $n\geq 1200$ , and the highest F-measure for $n\geq 600$ compared to other methods.

6.2.2 Real world data

We also conducted an experiment using official foreign exchange quotation data for the Japanese yen at Mizuho Bank¹¹1Mizuho Bank: https://www.mizuhobank.co.jp/market/historical.html (in Japanese).. The data consist of daily quotes for USD, GBP, EUR, CHF, and CAD from the 26th October 2021 to the 8th November 2023. The total sample size is 500.

We set the maximal lag length of every method to 1. The threshold value for causal effects for VarLiNGAM was set to 0.1 which gave the best result in experiments using simulated data shown in Section 6.2.1. All other arguments were kept the same as in Section 6.2.1.

Fig. 5 shows the results: (a) The causal graph only with directed edges generated from TS-CAM-UV, (b) the causal graph with edges other than directed edges generated from TS-CAM-UV, (c) the causal graph only with directed edges generated from LPCMCI using ParCorr, (d) the causal graph with edges other than directed edges generated from LPCMCI using ParCorr, (e) the causal graph only with directed edges generated from LPCMCI using GPDC, (f) the causal graph with edges other than directed edges generated from LPCMCI using GPDC, and (g) the causal graph only with directed edges generated from VarLiNGAM. The dashed lines in Fig. 5-(b) show the variable pairs estimated to have UBPs or UCPs.

We do not compare the performance of the methods based on the results because there is no ground truth for the relationships among the variables. We compare TS-CAM-UV with other methods to see what kind and how many variable pairs are connected. Figure 5-(c) shows that LPCMCI (ParCorr) draws an edge from the variable representing the state at time $t-1$ of each currency to the variable representing the state at time $t$ of the currency (e.g. $X_{t-1}^{i}\rightarrow X_{t}^{i}$ ), but does not draw edges between variables of different currencies. Compared to this, Figure 5-(a) shows that TS-CAM-UV connects the variables of different currencies with directed edges. This may be due to the fact that ParCorr assumes linear causal relationships. The causal relationship between the previous and current values of the same currency may be linear, while other causal relationships may be nonlinear. Figure 5-(e) shows LPCMCI (GPDC) connects the variables of different currencies with directed edges. The number of variable pairs connected by LPCMCI (GPDC) is less than the number of variable pairs connected by TS-CAM-UV. This may be due to the fact that LPCMCI is a constraint-based method and cannot distinguish between all graphs with the same set of conditional independence between observed variables. Figure 5-(g) shows that VarLiNGAM connects more variable pairs with directed edges than TS-CAM-UV. This may be due to the fact that VarLiNGAM assumes the absence of latent confounders.

To summarize, the TS-CAM-UV algorithm is based on a causal functional model, which enables it to identify the direction of causality in variable pairs that LPCMCI could not orient. Furthermore, by assuming the presence of unobserved variables, it can avoid incorrect orientations, similar to what occurs with VarLiNGAM.

7 Conclusion

In this paper, we propose two methods as extensions of CAM-UV: CAM-UV-PK and TS-CAM-UV. The CAM-UV-PK algorithm employs a method that introduces prior knowledge in the form that a certain variable is not a cause of a certain other variable. This is based on the CAM-UV algorithm, which infers causal variables for each observed variable. TS-CAM-UV uses time priority as prior knowledge for CAM-UV-PK, indicating that variables occurring later in time cannot be the cause of earlier variables. To the best of our knowledge, this is the first method for time series causal discovery that adopts a causal function model approach assuming the presence of latent confounders. If the data being analyzed satisfy the assumption that the causal function takes the form of a generalized additive model, then this proposed method can accurately infers causal relationships even in the presence of latent confounders.

Future research will extend our approach to models where the causal graph contains cycles. If the time for the causal effect from the cause variable to the effect variable is shorter than the time slice of the data being analyzed, this causal effect becomes a contemporaneous effect. When there is a causal relationship such as $X_{i}^{t-2}\rightarrow X_{j}^{t-1}\rightarrow X_{i}^{t}$ , and the time slice of the data is longer than this causal effect, it results in a contemporaneous effect with cycles. Therefore, future research explore causal discovery methods that allow for cycles in contemporaneous effects.

References

(1) Bühlmann, P., Peters, J., and Ernest, J. CAM: Causal additive models, high-dimensional order search and penalized regression. Annals of Statistics 42, 6 (2014), 2526–2556.
(2) Cai, R., Qiao, J., Zhang, K., Zhang, Z., and Hao, Z. Causal discovery with cascade nonlinear additive noise model. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19 (7 2019), International Joint Conferences on Artificial Intelligence Organization, pp. 1609–1615.
(3) Chickering, D. M. Optimal structure identification with greedy search. Journal of Machine Learning Research 3, Nov (2002), 507–554.
(4) Chu, T., and Glymour, C. Search for additive nonlinear time series causal models. Journal of Machine Learning Research 9, 32 (2008), 967–991.
(5) Ebert-Uphoff, I., and Deng, Y. Causal discovery for climate research using graphical models. Journal of Climate 25, 17 (2012), 5648 – 5665.
(6) Entner, D., and Hoyer, P. O. On causal discovery from time series data using FCI. In Proceedings of the 5th European Workshop on Probabilistic Graphical Models (2010).
(7) Gerhardus, A., and Runge, J. High-recall causal discovery for autocorrelated time series with latent confounders. In Advances in Neural Information Processing Systems (2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., pp. 12615–12625.
(8) Hastie, T. J., and Tibshirani, R. J. Generalized additive models. Chapman and Hall/CRC, 1990.
(9) Hyvärinen, A., Zhang, K., Shimizu, S., and Hoyer, P. O. Estimation of a structural vector autoregression model using non-gaussianity. Journal of Machine Learning Research 11, 56 (2010), 1709–1731.
(10) Lai, P.-C., and Bessler, D. A. Price discovery between carbonated soft drink manufacturers and retailers: A disaggregate analysis with PC and LiNGAM algorithms. Journal of Applied Economics 18, 1 (2015), 173–197.
(11) Maeda, T. N., and Shimizu, S. Causal additive models with unobserved variables. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence (2021), C. de Campos and M. H. Maathuis, Eds., vol. 161 of Proceedings of Machine Learning Research, PMLR, pp. 97–106.
(12) Maeda, T. N., and Shimizu, S. Discovery of causal additive models in the presence of unobserved variables. arXiv preprint arXiv: 2106.02234 (2021).
(13) Malinsky, D., and Spirtes, P. Causal structure learning from multivariate time series in settings with unmeasured confounding. In Proceedings of 2018 ACM SIGKDD Workshop on Causal Disocvery (2018), T. D. Le, K. Zhang, E. Kıcıman, A. Hyvärinen, and L. Liu, Eds., vol. 92 of Proceedings of Machine Learning Research, PMLR, pp. 23–47.
(14) Peters, J., Janzing, D., and Schölkopf, B. Causal inference on time series using restricted structural equation models. In Advances in Neural Information Processing Systems (2013), C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, Eds., vol. 26, Curran Associates, Inc.
(15) Peters, J., Mooij, J. M., Janzing, D., and Schölkopf, B. Causal discovery with continuous additive noise models. The Journal of Machine Learning Research 15, 1 (2014), 2009–2053.
(16) Runge, J., Nowack, P., Kretschmer, M., Flaxman, S., and Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances 5, 11 (2019).
(17) Shimizu, S., Hoyer, P. O., Hyvärinen, A., and Kerminen, A. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research 7, Oct (2006), 2003–2030.
(18) Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., Hoyer, P. O., and Bollen, K. DirectLiNGAM: a direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research 12, Apr (2011), 1225–1248.
(19) Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., Webster, M., Beckmann, C. F., Nichols, T. E., Ramsey, J. D., and Woolrich, M. W. Network modelling methods for FMRI. NeuroImage 54, 2 (2011), 875–891.
(20) Spirtes, P., and Glymour, C. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review 9, 1 (1991), 62–72.
(21) Spirtes, P., Meek, C., and Richardson, T. Causal discovery in the presence of latent variables and selection bias. In Computation, Causality, and Discovery, G. F. Cooper and C. N. Glymour, Eds. AAAI Press, 1999, pp. 211–252.
(22) Zheng, X., Aragam, B., Ravikumar, P. K., and Xing, E. P. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems (2018), S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., vol. 31, Curran Associates, Inc.