Causal inference course note - Week 2

This is my note for the “A Crash Course in Causality: Inferring Causal Effects from Observational Data” course by Jason A. Roy on Coursera.





Week 2. Confounding and Directed Acyclic Graphs (DAGs)

Confounding

We are interested in the relationship between means of different potential outcomes, e.g. $E(Y^1 - Y^0)$. To get this from observational data, we make several assumptions, including ignorability:

\[Y^0, Y^1 \indep A ~\vert~ X\]
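Ignorability is what makes counterfactual means identifiable from observed data. As a reminder of why (a standard derivation, assuming also consistency, $Y = Y^A$, and positivity):

\[\begin{aligned} E(Y^1) &= E_X\{E(Y^1 \vert X)\} \\ &= E_X\{E(Y^1 \vert A = 1, X)\} ~~ \text{(ignorability)} \\ &= E_X\{E(Y \vert A = 1, X)\} ~~ \text{(consistency)} \end{aligned}\]

and similarly for $E(Y^0)$, so $E(Y^1 - Y^0)$ can be computed by standardizing over $X$.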

Suppose treatment assignment depends on the potential outcomes (e.g. “sicker” patients are more likely to be treated). In that case, ignorability does not hold marginally: the treated and untreated groups are not directly comparable, and a naive comparison of their outcome means is biased.

Confounders are often defined as variables that affect both the treatment and the outcome.

Examples:

Confounder control

We are interested in:

  1. Identifying a set of variables $X$ that will make the ignorability assumption hold. If we do this, then $X$ is sufficient to control for confounding.
  2. Using statistical methods to control for these variables and estimate causal effects (later in the course).

Causal graphs

Which variables to control for is not a simple question. We’d like to identify a set of variables $X$ that will achieve ignorability - i.e. a set that is sufficient to control for confounding. Causal graphs will help answer this question and formalize the key ideas.

Causal graphs

Motivation

Graphs (causal graphs, in particular directed acyclic graphs) are widely used in causal inference because they make causal assumptions explicit.

In this lecture we will cover the basics of graphical models.

Simple graphs

$A \longrightarrow Y$ is a directed graph; the direction of the arrow indicates that $A$ affects $Y$.

$A ~ — ~ Y$ is an undirected graph, which shows that $A$ and $Y$ are associated with each other.

Graphical models

Terminology

\[A \longrightarrow Y\]

In this graph, $A$ and $Y$ are nodes (vertices), and the arrow between them is an edge. $A$ is a parent of $Y$, and $Y$ is a child of $A$. More generally, a node's ancestors are found by following arrows backward from it, and its descendants by following arrows forward.

Directed acyclic graphs (DAGs)

A graph is a directed acyclic graph (DAG) if it has no undirected edges and no cycles.
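The “acyclic” part of the definition can be checked programmatically. A minimal sketch (not from the course), using depth-first search with the usual three-color marking:

```python
def is_dag(edges):
    """Return True iff the directed graph given as a list of (u, v)
    edges has no cycles, using depth-first search with three colors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {node: WHITE for node in adj}

    def visit(node):
        color[node] = GRAY
        for nxt in adj[node]:
            if color[nxt] == GRAY:        # back edge: found a cycle
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in adj)

print(is_dag([("D", "A"), ("A", "B")]))              # True
print(is_dag([("A", "B"), ("B", "C"), ("C", "A")]))  # False: contains a cycle
```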

Ultimately, we will use DAGs to help us determine the set of variables that we need to control for to achieve ignorability. Before we get there, we need to understand the relationship between DAGs and probability distributions.

Relationship between DAGs and probability distributions

DAGs encode assumptions about dependencies between nodes/variables. Specifically, a DAG tells us which variables are (conditionally) independent of each other and how the joint distribution can be factorized.

Example 1.

\[D \rightarrow A \rightarrow B ~~~~~ C\]

A DAG involving nodes $A$, $B$, $C$, and $D$ encodes assumptions about the joint distribution $P(A, B, C, D)$. This DAG implies, for example, that $C$ is marginally independent of $A$, $B$, and $D$, and that $B \indep D ~\vert~ A$.

Decomposition of joint distribution

We can decompose the joint distribution by sequential conditioning only on sets of parents.

  1. Start with roots (nodes with no parents).
  2. Proceed down the descendant line, always conditioning on parents.

Example.

\[D \rightarrow A \rightarrow B ~~~~~ C\]

The decomposition based on this DAG is: $P(A, B, C, D) = P(C) P(D) P(A \vert D) P(B \vert A)$
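The parent-based factorization can be sketched in a few lines of Python; the DAG is represented simply as a map from each node to its parents (the representation is my own, not from the course):

```python
# Represent the DAG "D -> A -> B, with C isolated" as node -> parents.
parents = {"C": [], "D": [], "A": ["D"], "B": ["A"]}

def topological_order(parents):
    """Kahn-style ordering: repeatedly emit nodes whose parents are all emitted."""
    order, emitted = [], set()
    while len(order) < len(parents):
        for node in parents:
            if node not in emitted and all(p in emitted for p in parents[node]):
                order.append(node)
                emitted.add(node)
    return order

def factorization(parents):
    """Build the string P(root) ... P(child|parents) following the DAG."""
    factors = []
    for node in topological_order(parents):
        if parents[node]:
            factors.append("P(%s|%s)" % (node, ",".join(parents[node])))
        else:
            factors.append("P(%s)" % node)
    return " ".join(factors)

print(factorization(parents))  # P(C) P(D) P(A|D) P(B|A)
```

This reproduces the decomposition $P(C) P(D) P(A \vert D) P(B \vert A)$ above.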

Compatibility between DAGs and Distributions

A DAG admits a factorization of the joint probability distribution, as shown above. We then say that the probability function and the DAG are compatible.

Example. The following DAG and the probability function are compatible.

Note that DAGs that are compatible with a particular probability function are not necessarily unique.

Simple example: $A \rightarrow B$ and $B \rightarrow A$ both convey that $A$ and $B$ are dependent.

Paths and associations

Types of paths

When do paths induce associations?

If nodes $A$ and $B$ are on the ends of a path, they are associated (via this path) if information can flow along it, that is, if the path does not contain a collider.

In a fork, information flows to both ends of the fork. In a chain, information from one end makes it to the other.

However, in an inverted fork $A \rightarrow G \leftarrow B$, information from $A$ and $B$ collide at $G$. Here $G$ is known as a collider. In this case, $A$ and $B$ both affect $G$ but information does not flow from $G$ to either $A$ or $B$. Hence, $A$ and $B$ are independent (if this was the only path between them).

More generally, if there is a collider anywhere on a path from $A$ to $B$, then no association between $A$ and $B$ comes from that path.

Conditional independence (d-separation)

Blocking

Paths can be blocked by conditioning on nodes in the path.

Consider the path: $A \rightarrow G \rightarrow B$. If we condition on $G$ (a node in the middle of a chain), we block the path from $A$ to $B$.

Association on a fork can also be blocked. Consider the path: $A \leftarrow G \rightarrow B$. If we condition on $G$, this path from $A$ to $B$ is blocked.

The opposite situation occurs if a collider is conditioned on. Consider the path: $A \rightarrow G \leftarrow B$. Here $A$ and $B$ are not associated via this path (information collides at $G$). However, conditioning on $G$ induces an association between $A$ and $B$.
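These three cases (chain, fork, collider) can be captured in a small helper that decides whether a single path is blocked by a conditioning set. The encoding below (a path as a list of nodes plus the direction of each edge) is just one convenient representation, not from the course, and it ignores descendants of colliders for simplicity:

```python
def path_blocked(nodes, directions, conditioned):
    """Decide whether a path is blocked by the set `conditioned`.

    nodes:      e.g. ["A", "G", "B"]
    directions: one "->" or "<-" per edge, read left to right,
                e.g. ["->", "<-"] encodes A -> G <- B.
    Rules: a chain or fork node blocks the path iff it IS conditioned on;
    a collider blocks the path iff it is NOT conditioned on.
    """
    for i in range(1, len(nodes) - 1):
        left, right = directions[i - 1], directions[i]
        is_collider = (left == "->" and right == "<-")
        if is_collider:
            if nodes[i] not in conditioned:
                return True   # collider not conditioned on: path blocked
        else:
            if nodes[i] in conditioned:
                return True   # chain/fork node conditioned on: path blocked
    return False

# Chain A -> G -> B: open marginally, blocked by conditioning on G.
print(path_blocked(["A", "G", "B"], ["->", "->"], set()))   # False
print(path_blocked(["A", "G", "B"], ["->", "->"], {"G"}))   # True
# Collider A -> G <- B: blocked marginally, opened by conditioning on G.
print(path_blocked(["A", "G", "B"], ["->", "<-"], set()))   # True
print(path_blocked(["A", "G", "B"], ["->", "<-"], {"G"}))   # False
```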

Conditioning on colliders

Suppose that

Then,

d-separation

A path is d-separated (here “d” means dependency) by a set of nodes $C$ if conditioning on $C$ blocks the path.

Two nodes, $A$ and $B$, are d-separated by a set of nodes $C$ if $C$ blocks every path from $A$ to $B$. Then,

\[A \indep B ~\vert~ C\]

Recall the ignorability assumption: $Y^0, Y^1 \indep A ~\vert~ X$. Our goal is to identify a set of variables $X$ that will create this conditional independence.

Confounding revisited

Confounders

Recall that the informal definition of a confounder is: “a variable that affects both the treatment and the outcome”. This is a simple DAG where $X$ is a confounder between the relationship between treatment $A$ and outcome $Y$:

\[X \rightarrow A \rightarrow Y, ~ X \rightarrow Y\]

Controlling for confounding

We want to identify a set of variables that are sufficient to control for confounding. To do this, we need to block backdoor paths from treatment to outcome. We’ll learn what that means.

Frontdoor paths

A frontdoor path from $A$ to $Y$ is one that begins with an arrow emanating out of $A$.

Example 1. $X \rightarrow A \rightarrow Y,~ X \rightarrow Y$

Example 2. $X \rightarrow A \rightarrow Z \rightarrow Y,~ X \rightarrow Y$

If we are interested in the causal effect of $A$ on $Y$, we should not control for $Z$ in Example 2. We care about the question “if we manipulate $A$, how is $Y$ affected?” Controlling for $Z$ would be controlling for an effect of treatment.

Note. Causal mediation analysis involves understanding frontdoor paths from $A$ to $Y$. This will not be covered in this course.

Backdoor paths

Backdoor paths from treatment $A$ to outcome $Y$ are paths from $A$ to $Y$ that travel through arrows going into $A$.

Example. $X \rightarrow A \rightarrow Y,~ X \rightarrow Y$

Backdoor paths confound the relationship between $A$ and $Y$. These need to be blocked.

To sufficiently control for confounding, we must identify a set of variables that block all backdoor paths from treatment to outcome. If $X$ is this set of variables, then we have $Y^0, Y^1 \indep A ~\vert~ X$, i.e. ignorability of treatment mechanism given $X$.

In the next lectures, we will discuss two criteria for identifying sets of variables that are sufficient to control for confounding:

  1. The backdoor path criterion.
  2. The disjunctive cause criterion.

Backdoor path criterion

Sufficient sets of confounders

A set of variables $X$ is sufficient to control for confounding if:

  1. It blocks all backdoor paths from treatment to outcome.
  2. It does not contain any descendants of treatment.

This is the backdoor path criterion.

Example 1. $V \rightarrow A \rightarrow Y,~ V \rightarrow W \rightarrow Y$

Example 2. $V \rightarrow A \rightarrow Y,~ V \rightarrow M \leftarrow W \rightarrow Y$.

For a real example of a causal DAG, see Figure 1 in Vahratian et al. 2005 for example (treatment = “maternal pre-pregnancy overweight and obesity”, outcome = “cesarean delivery”).
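To make the criterion concrete, here is a sketch that enumerates the backdoor paths of Examples 1 and 2 by hand and applies the blocking rules. The path encoding is my own, and the helper ignores descendants of colliders (a full d-separation routine would handle them):

```python
def path_blocked(nodes, directions, conditioned):
    """Blocked iff some chain/fork node is conditioned on, or some
    collider is not conditioned on (collider descendants ignored)."""
    for i in range(1, len(nodes) - 1):
        is_collider = directions[i - 1] == "->" and directions[i] == "<-"
        in_c = nodes[i] in conditioned
        if (is_collider and not in_c) or (not is_collider and in_c):
            return True
    return False

# Example 1: V -> A -> Y and V -> W -> Y.
# The only backdoor path from A to Y is A <- V -> W -> Y.
backdoor1 = (["A", "V", "W", "Y"], ["<-", "->", "->"])
print(path_blocked(*backdoor1, set()))     # False: confounded
print(path_blocked(*backdoor1, {"V"}))     # True: {V} is sufficient
print(path_blocked(*backdoor1, {"W"}))     # True: {W} is sufficient

# Example 2: V -> A -> Y and V -> M <- W -> Y.
# The backdoor path A <- V -> M <- W -> Y contains the collider M.
backdoor2 = (["A", "V", "M", "W", "Y"], ["<-", "->", "<-", "->"])
print(path_blocked(*backdoor2, set()))       # True: blocked with no conditioning
print(path_blocked(*backdoor2, {"M"}))       # False: conditioning on M opens it
print(path_blocked(*backdoor2, {"M", "V"}))  # True: blocked again by V
```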

Disjunctive cause criterion

Variable selection

One method for choosing variables to control for is the disjunctive cause criterion (VanderWeele & Shpitser 2011). The criterion is simple: control for all (observed) variables that are causes of the exposure, the outcome, or both.

Investigators do not need to know the whole graph, but rather, the list of variables that affect exposure or outcome.

If there is a set of observed variables that satisfy the backdoor path criterion, then the variables selected based on the disjunctive cause criterion will be sufficient to control for confounding.
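The selection step itself is just a set union over the (assumed known) cause lists. The cause sets below are illustrative, mirroring the hypothetical DAGs that follow, not taken from the course:

```python
def disjunctive_cause_select(causes_of_treatment, causes_of_outcome):
    """Select every observed variable known to cause the exposure,
    the outcome, or both (the disjunctive cause criterion)."""
    return set(causes_of_treatment) | set(causes_of_outcome)

# Hypothetical setup: W and V are known causes of A and/or Y; M is an
# observed pre-treatment covariate that causes neither, so it is excluded.
selected = disjunctive_cause_select(causes_of_treatment={"V"},
                                    causes_of_outcome={"W", "V"})
print(sorted(selected))  # ['V', 'W']
```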

Examples

Assume the following: $M$, $W$, and $V$ are observed pre-treatment covariates; $W$ and $V$ are known to be causes of the exposure $A$ and/or the outcome $Y$, while $M$ is not; $U_1$ and $U_2$ are unobserved variables.

Let’s compare two methods for selecting variables in various hypothetical DAGs:

  1. Use all pre-treatment covariates: $\{M, W, V\}$ in this example.
  2. Use variables based on disjunctive cause criterion: $\{W, V\}$ in this example.

Hypothetical DAG 1. $V \rightarrow A \rightarrow Y,~ M \leftarrow V \rightarrow W \rightarrow Y$

  1. Use all pretreatment covariates $\{M, W, V\}$.
    • Satisfies backdoor path criterion? YES.
  2. Use variables based on disjunctive cause criterion: $\{W, V\}$.
    • Satisfies backdoor path criterion? YES.

Hypothetical DAG 2. $V \rightarrow A \rightarrow Y,~ V \rightarrow M \leftarrow W \rightarrow Y$

  1. Use all pretreatment covariates $\{M, W, V\}$.
    • Satisfies backdoor path criterion? YES.
  2. Use variables based on disjunctive cause criterion: $\{W, V\}$.
    • Satisfies backdoor path criterion? YES.

Hypothetical DAG 3. $W \rightarrow A \rightarrow Y \leftarrow V,~ A \dashleftarrow U_1 \dashrightarrow M \dashleftarrow U_2 \dashrightarrow Y$ (Dashed arrows mean unobserved.)

  1. Use all (observed) pretreatment covariates $\{M, W, V\}$.
    • Satisfies backdoor path criterion? NO.
  2. Use variables based on disjunctive cause criterion: $\{W, V\}$.
    • Satisfies backdoor path criterion? YES.

Hypothetical DAG 4. $M,~ W \dashleftarrow U_1 \dashrightarrow A \rightarrow Y \leftarrow V,~ W \dashleftarrow U_2 \dashrightarrow Y,~ W \rightarrow Y$ (Dashed arrows mean unobserved.)

  1. Use all (observed) pretreatment covariates $\{M, W, V\}$.
    • Satisfies backdoor path criterion? NO.
  2. Use variables based on disjunctive cause criterion: $\{W, V\}$.
    • Satisfies backdoor path criterion? NO.

Summary

The disjunctive cause criterion:

  1. Does not require knowledge of the full DAG, only the list of variables that affect the exposure or the outcome.
  2. Selects a set sufficient to control for confounding whenever some set of observed variables satisfying the backdoor path criterion exists.
  3. Can still fail when no set of observed variables is sufficient (as in hypothetical DAG 4).

Once we know which variables to control for, the question now is how to control for them. General approaches include: matching and inverse probability of treatment weighting. We will discuss these later in the course.
