Collider Bias: Movie Star Example
This app illustrates collider bias using the movie star example from Cunningham (2021, Section 3.6.1). A CNN blog post (Piazza 2009) reported that Megan Fox was voted both the worst and the most attractive actress of 2009. This raises the question: Are more attractive actors generally less talented?
Are More Attractive Actors Less Talented?
We revisit this question and try to disentangle the causal mechanisms using a DAG and a simulated data example. Consider the two characteristics 'talent' and 'beauty'. We simulate a data set in which the two variables are drawn independently of each other, so neither causally affects the other and they are uncorrelated in the population. However, if we condition on a collider, i.e., a variable that is causally affected by both 'talent' and 'beauty', we find a significant negative correlation between the two characteristics. This spurious association is called collider bias. In the movie star example, such a collider could be a variable that indicates whether a person is a movie star ('star'): both more attractive and more talented people probably have a higher chance of becoming a movie star. Hence, if we restrict our analysis to the selected sample of stars, we might draw conclusions that do not hold for the entire population.
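The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the sample size, the seed, the standard normal distributions, and the rule that the top 15% of 'talent' + 'beauty' become stars are all assumptions made for this example.

```python
import numpy as np

# Assumed settings for illustration only (not taken from the app).
rng = np.random.default_rng(seed=0)
n = 2500

# 'talent' and 'beauty' are drawn independently of each other.
talent = rng.standard_normal(n)
beauty = rng.standard_normal(n)

# Assumed selection rule: the top 15% of talent + beauty become stars,
# so 'star' is a collider (a common effect of both variables).
score = talent + beauty
star = score >= np.quantile(score, 0.85)

# Correlation in the whole population vs. among stars only.
corr_all = np.corrcoef(talent, beauty)[0, 1]
corr_stars = np.corrcoef(talent[star], beauty[star])[0, 1]

print(f"correlation, entire population: {corr_all:.3f}")
print(f"correlation, stars only:        {corr_stars:.3f}")
```

In the entire population the correlation should be close to zero, while among the stars it should be clearly negative: a star with little talent must be very attractive to have passed the threshold, and vice versa.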
Data Example: Scatter Plot, Causal Diagram, Regression Output
The scatter plot below shows the data points for the selected sample: either the entire population or only the subsample of movie stars. The Directed Acyclic Graph (DAG) on the right illustrates the causal relationships in the movie star example. It tells us whether we should expect to find an association in the data. If we condition on the collider, the variables 'talent' and 'beauty' are 'd-connected', so we will probably find a correlation between them. If they are 'd-separated', they will probably be uncorrelated. The regression output below reflects this: it shows the coefficient estimate and confidence interval from a linear regression of 'beauty' on 'talent' in the simulated data set.
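The regression comparison can be sketched as follows. This is a hedged example under stated assumptions: the helper `slope_with_ci` is hypothetical, the interval uses a normal approximation (z = 1.96) rather than whatever method the app uses, and the simulation settings (seed, sample size, 85th percentile cutoff) are chosen for illustration only.

```python
import numpy as np

def slope_with_ci(x, y, z=1.96):
    """OLS slope of y on x with an approximate 95% confidence interval.

    Hypothetical helper for illustration; uses a normal approximation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    sxx = np.sum(x_centered ** 2)
    slope = np.sum(x_centered * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    sigma2 = np.sum(residuals ** 2) / (len(x) - 2)  # residual variance
    se = np.sqrt(sigma2 / sxx)                      # standard error of slope
    return slope, slope - z * se, slope + z * se

# Assumed simulation settings (not taken from the app).
rng = np.random.default_rng(seed=1)
n_obs = 2500
talent_sim = rng.standard_normal(n_obs)
beauty_sim = rng.standard_normal(n_obs)
total = talent_sim + beauty_sim
is_star = total >= np.quantile(total, 0.85)

# Regress 'beauty' on 'talent': entire population vs. stars only.
fit_all = slope_with_ci(talent_sim, beauty_sim)
fit_stars = slope_with_ci(talent_sim[is_star], beauty_sim[is_star])

for label, (b, lo, hi) in [("entire population", fit_all),
                           ("stars only", fit_stars)]:
    print(f"{label}: slope = {b:.3f}, approx. 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Without conditioning on 'star', the slope should be near zero. In the subsample of stars, the slope should be negative with a confidence interval that excludes zero, which matches the d-connected case in the DAG.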
Options
Regression Output
Output from a linear regression of the variable 'beauty' on 'talent' based on the simulated data set.
Scatter Plot
DAG
Code
The code is available at the GitHub repository https://github.com/DigitalCausalityLab/colliderbias.
If you find a bug or have suggestions for improvement, please open an issue on GitHub.
References
Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale University Press. Section 3.6.1. Available online.
Piazza, Jo. 2009. “Megan Fox Voted Worst - but Sexiest - Actress of 2009.” https://marquee.blogs.cnn.com/2009/12/30/megan-fox-voted-worst-but-sexiest-actress-of-2009/