Causal Inference: What If (WIP)
Book's Webpage: https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
- A pdf is available on the same page
Pearl's works can be a bit lacking in examples and can seem too formalized for people without a background in formal computer science or mathematics. In contrast, the target audience of this book is health and social scientists (and hopefully, also psychologists). These researchers may be in a position to collect and analyze experimental or observational data, and might find this book useful for learning to make causal claims from such data that go beyond correlations.
A key message of this book is that causal inference cannot be reduced to a collection of recipes for data analysis.
Understanding this message, starting from the book's introduction, will be one of our core goals in reading it. In addition, for pedagogical reasons, the book ignores random error until chapter 10.
Table of Contents
- A DEFINITION OF CAUSAL EFFECT
- Counterfactual Outcome
- Individual Causal Effect
- Average Causal Effect
- Consistency
- Interference between individuals
- Sharp Causal Null Hypothesis
- Risk Difference
- Risk Ratio
- Odds Ratio
- Number Needed to Treat (NNT)
- Number Needed to Harm
- Consistent Estimator
- Sampling Variability
- Nondeterministic Counterfactuals
- Random Error
- Independence
- Confounding
1. A DEFINITION OF CAUSAL EFFECT
The crux of this chapter is to establish the difference between association and causation.
I have a gripe with the examples:
- Jumping into the swimming pool followed by reaching the jam jar: a fear of drowning can prevent the jump
- Skiing on dangerous slopes followed by winning the ski race: a fear of falling
- Taking antibiotics followed by absence at the park: many anti-vaxxers seem to rely on precisely such associations
In other words, as individuals, we are prone to mistaking associations for causation. Isn't that why statistics courses have us parrot "correlation does not imply causation"? Even when we understand that correlation and causation are two different concepts, it is very easy to mistake one for the other.
It is only at the level of the species that we have built defense mechanisms from the sparse causal insights we gain into various phenomena. These defense mechanisms help individuals survive.
Below, we note the terminology and notation developed in this chapter, which later chapters rely on.
1.1. Counterfactual Outcome
\(Y^{a=a_i}\) denotes the outcome variable Y under the intervention \(a=a_i\). It is also called a potential outcome or counterfactual outcome. There is one for each possible intervention.
For a given individual, exactly one potential outcome is factual, namely the one that is actually observed.
Note that the counterfactual outcome \(Y^{a=a_i}\) and its probability \(P(Y^{a=a_i})\) are different from the conditional outcome \(Y \mid A=a_i\) and the conditional probability \(P(Y \mid A=a_i)\). The latter are associational: they describe the individuals who happened to receive \(a_i\), and do not require an intervention.
1.2. Individual Causal Effect
A binary intervention \(a=1\) has a causal effect on the individual \(i\)'s outcome variable \(Y\) if the outcome differs in the presence vs absence of the intervention. That is, \(Y^{a=1}_i \neq Y^{a=0}_i\).
Individual effects cannot, in general, be identified, since for any given individual only the factual outcome is ever observed.
1.3. Average Causal Effect
A variable \(A\) has an average causal effect on outcome \(Y\) if \(P(Y^{a=1}=1) \neq P(Y^{a=0}=1)\) in the population of interest. In words, the risk of the outcome would differ if everyone in the population were treated versus if no one were treated.
Absence of average causal effect does not imply absence of individual causal effect.
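To see why a null average effect says nothing about individual effects, here is a small sketch with a hypothetical population in which both counterfactual outcomes are known for every individual. That is only possible in a simulation; the numbers are invented for illustration.

```python
# Hypothetical population: each pair is (Y^{a=0}, Y^{a=1}) for one individual.
# Both counterfactuals are known only because the data are made up.
population = [
    (0, 1),  # treatment causes the outcome in this individual
    (1, 0),  # treatment prevents the outcome in this individual
    (0, 1),
    (1, 0),
    (1, 1),  # outcome occurs regardless of treatment
    (0, 0),  # outcome never occurs
]

n = len(population)
risk_treated = sum(y1 for _, y1 in population) / n    # P(Y^{a=1} = 1)
risk_untreated = sum(y0 for y0, _ in population) / n  # P(Y^{a=0} = 1)

# Number of individuals for whom Y^{a=1} != Y^{a=0}
n_with_individual_effect = sum(1 for y0, y1 in population if y0 != y1)

print(risk_treated, risk_untreated)  # equal risks: no average causal effect
print(n_with_individual_effect)      # yet several individuals have a causal effect
```

The causing and preventing individuals cancel out in the average, which is exactly how individual effects can hide behind a null average effect.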
Even though individual causal effects cannot be identified, average causal effects can, under certain conditions (e.g., in an ideal randomized experiment), be identified from data. Therefore, the term causal effect(s) will usually refer to average causal effect(s).
1.4. Consistency
DOUBT
To be discussed in chapter 3.
1.5. Interference between individuals
Interference between individuals refers to the phenomenon wherein an individual's outcome depends not only on their own treatment but also on the treatment received by other individuals. Social interaction poses a risk of such interference in studies dealing with contagious agents or educational programs.
1.6. Sharp Causal Null Hypothesis
When there is no individual causal effect for any individual in the population, the Sharp Causal Null Hypothesis is said to hold.
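COMMENT (Confirm/Disprove): Individual causal effects can never be identified from data, so the Sharp Causal Null Hypothesis cannot be confirmed from data: a null average causal effect is compatible with nonzero individual effects (see 1.3). It can, however, be refuted: a nonzero average causal effect implies that at least one individual has a nonzero individual causal effect.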
1.7. Risk Difference
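Causal: The Causal Risk Difference is the average of the individual causal effects \(Y^{a=1}_i - Y^{a=0}_i\) over the population, and boils down to:
\(P(Y^{a=1}=1) - P(Y^{a=0}=1)\)
Associational: In contrast, the Associational Risk Difference is given by:
\(P(Y=1 \mid A=1) - P(Y=1 \mid A=0)\)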
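The "boils down to" step is linearity of the average: the mean of the individual differences \(Y^{a=1}_i - Y^{a=0}_i\) equals the difference of the two counterfactual risks. A quick numerical check on a hypothetical population of counterfactual pairs (invented data, knowable only in a simulation):

```python
from statistics import mean

# Hypothetical (Y^{a=0}, Y^{a=1}) pairs, one per individual.
pairs = [(0, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 1), (0, 0), (1, 1)]

# Average of the individual causal effects Y^{a=1} - Y^{a=0}
avg_individual_effect = mean(y1 - y0 for y0, y1 in pairs)

# Difference of the counterfactual risks P(Y^{a=1}=1) - P(Y^{a=0}=1)
risk_difference = mean(y1 for _, y1 in pairs) - mean(y0 for y0, _ in pairs)

print(avg_individual_effect, risk_difference)  # the two quantities coincide
```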
1.8. Risk Ratio
Causal: With the probabilities referring to the population of interest, the Causal Risk Ratio is given by:
\(\frac{P(Y^{a=1}=1)}{P(Y^{a=0}=1)}\)
Associational: The Associational Risk Ratio is given by:
\(\frac{P(Y=1 \mid A=1)}{P(Y=1 \mid A=0)}\)
1.9. Odds Ratio
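Causal: With the probabilities referring to the population of interest, the Causal Odds Ratio is given by:
\(\frac{P(Y^{a=1}=1) / P(Y^{a=1}=0)}{P(Y^{a=0}=1) / P(Y^{a=0}=0)}\)
Associational: The Associational Odds Ratio is given by:
\(\frac{P(Y=1 \mid A=1) / P(Y=0 \mid A=1)}{P(Y=1 \mid A=0) / P(Y=0 \mid A=0)}\)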
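The causal and associational versions of the risk difference, risk ratio and odds ratio share the same arithmetic; only the inputs differ (counterfactual risks versus conditional risks). A minimal sketch, assuming both risks lie strictly between 0 and 1; the function name `effect_measures` is hypothetical:

```python
def effect_measures(risk_1: float, risk_0: float) -> dict:
    """Risk difference, risk ratio and odds ratio comparing risk_1 with risk_0.

    Pass P(Y^{a=1}=1), P(Y^{a=0}=1) for the causal measures, or
    P(Y=1|A=1), P(Y=1|A=0) for the associational ones.
    Assumes 0 < risk < 1 for both inputs.
    """
    odds_1 = risk_1 / (1 - risk_1)
    odds_0 = risk_0 / (1 - risk_0)
    return {
        "risk_difference": risk_1 - risk_0,
        "risk_ratio": risk_1 / risk_0,
        "odds_ratio": odds_1 / odds_0,
    }


# Hypothetical risks: 0.3 under treatment, 0.5 without it.
print(effect_measures(0.3, 0.5))
```

All three measures agree on the direction of the effect (RD below 0, RR and OR below 1), but they quantify its size on different scales.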
1.10. Number Needed to Treat (NNT)
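Number Needed to Treat (NNT) is the average number of individuals who need to receive treatment to reduce the number of cases by one. It is the negative reciprocal of the causal risk difference:
\(\text{NNT} = \frac{-1}{P(Y^{a=1}=1) - P(Y^{a=0}=1)}\)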
1.11. Number Needed to Harm
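Symmetric to NNT, the Number Needed to Harm is the average number of individuals who need to receive treatment to increase the number of cases by one. It is the reciprocal of a positive causal risk difference:
\(\text{NNH} = \frac{1}{P(Y^{a=1}=1) - P(Y^{a=0}=1)}\)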
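Both quantities follow from the causal risk difference: NNT flips its sign so the result is positive when treatment lowers the risk, and NNH is its plain reciprocal when treatment raises the risk. A minimal sketch, assuming the two counterfactual risks are already known; the function names are hypothetical:

```python
def number_needed_to_treat(risk_treated: float, risk_untreated: float) -> float:
    """NNT = -1 / causal risk difference; defined only when treatment lowers risk."""
    rd = risk_treated - risk_untreated  # P(Y^{a=1}=1) - P(Y^{a=0}=1)
    if rd >= 0:
        raise ValueError("treatment does not reduce the risk; NNT is undefined")
    return -1 / rd


def number_needed_to_harm(risk_treated: float, risk_untreated: float) -> float:
    """NNH = 1 / causal risk difference; defined only when treatment raises risk."""
    rd = risk_treated - risk_untreated
    if rd <= 0:
        raise ValueError("treatment does not increase the risk; NNH is undefined")
    return 1 / rd


# Treatment lowers the risk from 0.5 to 0.3: about 5 treated per case prevented.
print(number_needed_to_treat(0.3, 0.5))
# Treatment raises the risk from 0.5 to 0.75: 4 treated per extra case caused.
print(number_needed_to_harm(0.75, 0.5))
```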
1.12. Consistent Estimator
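An estimator \(\hat \theta_n\) of \(\theta\) computed from a sample of size \(n\) is consistent if, for every \(\epsilon > 0\), the following holds:
\(\lim_{n \to \infty} P(|\hat \theta_n - \theta| > \epsilon) = 0\)
Note that this notion of consistency is different from the one in 1.4 (Consistency) above.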
Additional reading: Consistent Estimator - Wikipedia.
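The sample proportion is a consistent estimator of a population risk \(p\) (by the weak law of large numbers). The simulation below only illustrates that its error tends to shrink as \(n\) grows; a single run cannot prove consistency, which is a statement about probabilities. The true value `true_p = 0.3` and the seed are arbitrary choices for the sketch:

```python
import random


def sample_proportion(n: int, p: float, rng: random.Random) -> float:
    """Estimator hat-theta_n: fraction of successes in n Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is reproducible
true_p = 0.3
for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = sample_proportion(n, true_p, rng)
    print(f"n={n:>6}  estimate={estimate:.4f}  error={abs(estimate - true_p):.4f}")
```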
1.13. Sampling Variability
TODO
1.14. Nondeterministic Counterfactuals
A deterministic counterfactual outcome, as in 1.1, assigns a single counterfactual outcome to each individual. In contrast, a Nondeterministic Counterfactual assigns a distribution of outcomes to each individual.
DOUBT: Is this all there is to it?
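A small sketch of the contrast, with invented numbers: under deterministic counterfactuals each individual has one fixed value of \(Y^{a=1}\); under nondeterministic ones (modelled here, as an assumption, as a Bernoulli distribution per individual) each individual has a probability that \(Y^{a=1}=1\), and the population risk is the average of those individual probabilities. Repeated realizations then vary, which is one source of random error.

```python
import random
from statistics import mean

# Deterministic counterfactuals: one fixed Y^{a=1} per individual (hypothetical).
deterministic_y1 = [1, 0, 1, 1]

# Nondeterministic counterfactuals: per-individual P(Y^{a=1} = 1) (hypothetical).
prob_y1 = [0.9, 0.1, 0.6, 0.8]

# Deterministic case: the risk is the average of the fixed values.
risk_det = mean(deterministic_y1)
# Nondeterministic case: the risk is the average of the individual expectations.
risk_nondet = mean(prob_y1)

# Realizing the nondeterministic outcomes many times: each draw can differ,
# but the pooled frequency of Y^{a=1} = 1 approaches risk_nondet.
rng = random.Random(42)
draws = [[int(rng.random() < p) for p in prob_y1] for _ in range(10_000)]
empirical_risk = sum(sum(d) for d in draws) / (len(draws) * len(prob_y1))

print(risk_det, risk_nondet, round(empirical_risk, 3))
```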
1.15. Random Error
Sampling Variability and Nondeterministic Counterfactuals are (the?) two sources of Random Error. As noted above, the book ignores random error until chapter 10 for pedagogical reasons.
1.16. Independence
For a binary treatment \(A\) and a binary outcome \(Y\), the two are said to be independent when the Associational Risk Difference is zero, i.e., \(P(Y=1 \mid A=1) = P(Y=1 \mid A=0)\). This is denoted by:
\(A ⫫ Y\) or \(Y ⫫ A\)
As an exercise, consider what the associational risk difference, risk ratio, and odds ratio will be when two variables are independent.
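To check your answer to the exercise numerically, here is a hypothetical 2x2 table of counts constructed so that the risk of \(Y=1\) is the same in both treatment groups. The counts are invented; any table with equal risks in the two rows would do.

```python
# Hypothetical counts keyed by (A, Y); the risk of Y=1 is 0.2 in both groups.
counts = {
    (1, 1): 20, (1, 0): 80,    # A=1: 20 cases out of 100
    (0, 1): 60, (0, 0): 240,   # A=0: 60 cases out of 300
}


def risk(a: int) -> float:
    """P(Y=1 | A=a) computed from the table."""
    cases = counts[(a, 1)]
    return cases / (cases + counts[(a, 0)])


p1, p0 = risk(1), risk(0)
rd = p1 - p0
rr = p1 / p0
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
print(rd, rr, odds_ratio)
```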
1.17. Confounding
Intuitively, the discrepancy between the causal effect measures and associational effect measures is referred to as confounding. This will be elaborated upon in chapter 7.
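A minimal simulation, with invented numbers, of the discrepancy described above: a common cause \(L\) (e.g. disease severity, a hypothetical choice) of both treatment and outcome makes the associational risk difference nonzero even though treatment has no causal effect on anyone. This is only a sketch of the phenomenon, not the formal treatment of chapter 7.

```python
from statistics import mean

# Each individual is (L, A, Y^{a=0}, Y^{a=1}). Treatment has NO effect:
# Y^{a=1} = Y^{a=0} for everyone. Individuals with L=1 are both more likely
# to be treated and at higher risk of the outcome.
population = []
for i in range(100):
    y = 1 if i < 60 else 0       # risk 0.6 when L=1, whatever the treatment
    a = 1 if i % 5 != 0 else 0   # 80% of L=1 individuals are treated
    population.append((1, a, y, y))
for i in range(100):
    y = 1 if i < 20 else 0       # risk 0.2 when L=0, whatever the treatment
    a = 1 if i % 5 == 0 else 0   # 20% of L=0 individuals are treated
    population.append((0, a, y, y))

# Causal risk difference: uses both counterfactuals for everyone.
causal_rd = mean(y1 for _, _, _, y1 in population) - mean(y0 for _, _, y0, _ in population)


def observed_risk(a_value: int) -> float:
    """P(Y=1 | A=a_value), using each individual's factual outcome."""
    outcomes = [y1 if a == 1 else y0 for _, a, y0, y1 in population if a == a_value]
    return mean(outcomes)


assoc_rd = observed_risk(1) - observed_risk(0)
print(causal_rd, assoc_rd)  # zero causal effect, yet a positive association
```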