Here is code to load (and if necessary, install) required packages, and to set some global options (for plotting and efficient fitting of Bayesian models).
# install packages from CRAN (unless installed)
pckgs_needed <- c(
  "tidyverse", "brms", "rstan", "rstanarm", "remotes",
  "tidybayes", "bridgesampling", "shinystan", "mgcv"
)
pckgs_installed <- installed.packages()[, "Package"]
pckgs_2_install <- pckgs_needed[!(pckgs_needed %in% pckgs_installed)]
if (length(pckgs_2_install)) {
  install.packages(pckgs_2_install)
}

# install additional packages from GitHub (unless installed)
if (!"aida" %in% pckgs_installed) {
  remotes::install_github("michael-franke/aida-package")
}
if (!"faintr" %in% pckgs_installed) {
  remotes::install_github("michael-franke/faintr")
}
if (!"cspplot" %in% pckgs_installed) {
  remotes::install_github("CogSciPrag/cspplot")
}

# load the required packages
x <- lapply(pckgs_needed, library, character.only = TRUE)
library(aida)
library(faintr)
library(cspplot)

# these options help Stan run faster
options(mc.cores = parallel::detectCores())

# use the CSP-theme for plotting
theme_set(theme_csp())

# global color scheme from CSP
project_colors = cspplot::list_colors() |> pull(hex)
# names(project_colors) <- cspplot::list_colors() |> pull(name)

# setting theme colors globally
scale_colour_discrete <- function(...) {
  scale_colour_manual(..., values = project_colors)
}
scale_fill_discrete <- function(...) {
  scale_fill_manual(..., values = project_colors)
}
Using regression modeling to test for stochastic independence
Here are three data sets, each with triples of associated observations for variables \(X\), \(Y\), and \(Z\). Ideally, do not look at the code that generates them, so you will have more fun with the exercise below.
##################################################
# creating three data sets with various dependencies
##################################################

data_1 <- tibble(
  X = rnorm(500),
  Z = map_dbl(X, function(i) rnorm(1, mean = i)),
  Y = map_dbl(Z, function(i) rnorm(1, mean = i))
)

data_2 <- tibble(
  Z = rnorm(500),
  X = map_dbl(Z, function(i) rnorm(1, mean = i)),
  Y = map_dbl(Z, function(i) rnorm(1, mean = i))
)

data_3 <- tibble(
  X = rnorm(500),
  Y = rnorm(500),
  Z = map2_dbl(X, Y, function(i, j) rnorm(1, mean = i + j))
)
Exercise 1: Use regression to assess stochastic independence
Use linear regression to investigate whether:
\(X\) and \(Y\) are stochastically independent
\(X\) and \(Y\) are stochastically independent given \(Z\)
in each of these data sets.
Ideally, write a single function that takes a data set as input and outputs information that tells you whether \(X\) and \(Y\) are independent.
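One possible approach is sketched below (the helper name test_independence is made up here, and this need not be the exact code behind the outputs shown further down): fit two Bayesian regressions with brms, one regressing \(Y\) on \(X\) alone and one regressing \(Y\) on \(X\) and \(Z\), and then use brms::hypothesis() to test whether the coefficient of \(X\) is credibly positive in each model.

# sketch of a possible helper (hypothetical name `test_independence`):
# fit two Bayesian regressions and test whether the coefficient of X
# is credibly positive, once marginally and once controlling for Z
test_independence <- function(data) {
  fit_simple   <- brm(Y ~ X,     data = data)
  fit_multiple <- brm(Y ~ X + Z, data = data)
  list(
    simple   = hypothesis(fit_simple,   "X > 0"),
    multiple = hypothesis(fit_multiple, "X > 0")
  )
}

# usage, e.g., for the first data set
result_1 <- test_independence(data_1)

If \(X\) and \(Y\) are (conditionally) independent, the estimated coefficient of \(X\) should hover around zero and the posterior probability of "X > 0" should be close to 0.5; a posterior probability near 0 or 1 instead indicates a credible (conditional) dependence.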
In data set 1, \(X\) and \(Y\) are not stochastically independent, but become independent given \(Z\).
result_1
$simple
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (X) > 0 1.06 0.06 0.96 1.17 Inf 1 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
$multiple
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (X) > 0 -0.03 0.06 -0.13 0.08 0.53 0.35
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
In data set 2, \(X\) and \(Y\) are not stochastically independent, but become independent given \(Z\).
result_2
$simple
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (X) > 0 0.53 0.04 0.46 0.59 Inf 1 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
$multiple
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (X) > 0 0 0.05 -0.07 0.08 0.99 0.5
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
In data set 3, \(X\) and \(Y\) are stochastically independent, but cease to be independent given \(Z\).
result_3
$simple
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (X) > 0 -0.05 0.05 -0.13 0.03 0.15 0.13
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
$multiple
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (X) > 0 -0.5 0.04 -0.57 -0.44 0 0
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Exercise 2: Interpret your results
Infer which (if any) of the three data sets was created with one of the following causal structures:
collider: \(X, Y \rightarrow Z\)
chain: \(X \rightarrow Z \rightarrow Y\)
fork: \(Z \rightarrow X,Y\)
Solution
Data sets 1 and 2 could each come from a chain or a fork, because \(X\) and \(Y\) are not stochastically independent but become independent given \(Z\). Both structures imply exactly this pattern of (conditional) independence, so these tests alone cannot distinguish between them.
Data set 3 could come from a collider, because \(X\) and \(Y\) are stochastically independent but become dependent given \(Z\): conditioning on a common effect induces a dependence between its causes.
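For extra intuition about the collider case, here is a quick, optional check (assuming data_3 was generated as above): conditioning on the common effect \(Z\), crudely approximated by restricting attention to observations with \(Z\) near zero, induces a negative association between \(X\) and \(Y\), even though they are marginally (close to) uncorrelated.

# marginal correlation of X and Y in data_3: close to zero
cor(data_3$X, data_3$Y)

# correlation within a narrow slice of Z (a crude way of conditioning on Z):
# noticeably negative, as expected when Z is a collider
data_3 |>
  filter(abs(Z) < 0.5) |>
  summarise(cor_XY = cor(X, Y))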