The Simon task is pretty cool. The task is designed to see if responses are faster and/or more accurate when the stimulus to respond to occurs in the same relative location as the response, even if the stimulus location is irrelevant to the task. For example, it is faster to respond to a stimulus presented on the left of the screen with a key that is on the left of the keyboard (e.g. q), than with a key that is on the right of the keyboard (e.g. p).
A total of 213 participants took part in an online version of a Simon task. Participants were students enrolled in either “Introduction to Cognitive (Neuro-)Psychology” (N = 166), or “Experimental Psychology Lab Practice” (N = 39) or both (N = 4).
Each trial started by showing a fixation cross for 200 ms in the center of the screen. Then, one of two geometrical shapes was shown for 500 ms. The target shape was either a blue square or a blue circle, and it appeared either on the left or the right of the screen. On each trial, both the shape to show (square or circle) and its position (left or right) were determined uniformly at random. Participants were instructed to press the key q (left of keyboard) or p (right of keyboard) to identify the kind of shape on the screen. The shape-key mapping was determined uniformly at random once for each participant at the beginning of the experiment and remained constant throughout. For example, a participant may have been asked to press q for square and p for circle.
Trials were categorized as either ‘congruent’ or ‘incongruent’. A trial was congruent if the stimulus appeared in the same relative location as the response key (e.g. square on the right of the screen, and the p key to be pressed for square), and incongruent if it did not (e.g. square on the right and the q key to be pressed for square).
In each trial, if no key was pressed within 3 seconds after the appearance of the target shape, a message to please respond faster was displayed on screen.
Participants were first welcomed and made familiar with the experiment. They were told to optimize both speed and accuracy. They then practiced the task for 20 trials before starting the main task, which consisted of 100 trials. Finally, the experiment ended with a post-test survey in which participants were asked for their student IDs and the class they were enrolled in. They were also able to leave any optional comments.
We load the data into R and show a summary of the variables stored in the tibble:
library(tidyverse) # provides read_csv, glimpse, and the dplyr/ggplot2 functions used below

d <- read_csv("03_Simon_data_anonym.csv")
glimpse(d)
## Observations: 25,560
## Variables: 15
## $ submission_id <dbl> 7432, 7432, 7432, 7432, 7432, 7432, 7432, 7432, …
## $ RT <dbl> 1239, 938, 744, 528, 706, 547, 591, 652, 627, 48…
## $ condition <chr> "incongruent", "incongruent", "incongruent", "in…
## $ correctness <chr> "correct", "correct", "correct", "correct", "cor…
## $ class <chr> "Intro Cogn. Neuro-Psychology", "Intro Cogn. Neu…
## $ experiment_id <dbl> 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, …
## $ key_pressed <chr> "q", "q", "q", "q", "p", "p", "q", "p", "q", "q"…
## $ p <chr> "circle", "circle", "circle", "circle", "circle"…
## $ pause <dbl> 1896, 1289, 1705, 2115, 2446, 2289, 2057, 2513, …
## $ q <chr> "square", "square", "square", "square", "square"…
## $ target_object <chr> "square", "square", "square", "square", "circle"…
## $ target_position <chr> "right", "right", "right", "right", "left", "rig…
## $ timeSpent <dbl> 7.565417, 7.565417, 7.565417, 7.565417, 7.565417…
## $ trial_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
## $ trial_type <chr> "practice", "practice", "practice", "practice", …
It is often useful to check general properties, such as the mean time participants spent on the experiment:
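The chunk that produced this number is not shown; a minimal sketch (assuming the timeSpent column is recorded in minutes, as the summary below suggests) would be:

```r
# mean time spent on the experiment (timeSpent is assumed to be in minutes)
d %>% pull(timeSpent) %>% mean()
```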
About 21.62 minutes is quite long, but we know that the mean is very susceptible to outliers, so we may want to look at a more informative set of summary statistics:
d %>% pull(timeSpent) %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.648 6.905 7.692 21.617 9.113 1158.110
HW1: Make a histogram of the timeSpent variable.
We look at outlier-y behavior at the level of individual participants first, then at the level of individual trials.
It is conceivable that some participants did not take the task seriously. They may have just fooled around. We will therefore inspect each individual’s response patterns and reaction times. If participants appear to have “misbehaved” we discard all of their data. (CAVEAT: Notice the researcher degrees of freedom in the decision of what counts as “misbehavior”! Choices like these are therefore best committed to in advance, e.g. via pre-registration!)
We can calculate the mean reaction times and the error rates for each participant.
d_individual_summary <- d %>%
filter(trial_type == "main") %>% # look at only data from main trials
group_by(submission_id) %>% # calculate the following for each individual
summarize(mean_RT = mean(RT),
error_rate = 1 - mean(ifelse(correctness == "correct", 1, 0)))
head(d_individual_summary)
## # A tibble: 6 x 3
## submission_id mean_RT error_rate
## <dbl> <dbl> <dbl>
## 1 7432 595. 0.05
## 2 7433 458. 0.04
## 3 7434 531. 0.04
## 4 7435 433. 0.12
## 5 7436 748. 0.06
## 6 7437 522. 0.12
Let’s plot this summary information:
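A minimal version of this plot, sketched with ggplot2, shows each participant’s error rate against their mean RT:

```r
# one point per participant: mean RT on the x-axis, error rate on the y-axis
d_individual_summary %>%
  ggplot(aes(x = mean_RT, y = error_rate)) +
  geom_point()
```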
Here’s a crude way of flagging outlier participants:
d_individual_summary <- d_individual_summary %>%
mutate(outlier = case_when(mean_RT < 350 ~ TRUE,
mean_RT > 750 ~ TRUE,
error_rate > 0.5 ~ TRUE,
TRUE ~ FALSE))
d_individual_summary %>%
ggplot(aes(x = mean_RT, y = error_rate)) +
geom_point() +
geom_point(data = filter(d_individual_summary, outlier == TRUE),
color = "firebrick", shape = "square", size = 5)
We then clean the data set in a first step by removing all participants identified as outlier-y:
d <- full_join(d, d_individual_summary, by = "submission_id") # merge the tibbles
d <- filter(d, outlier == FALSE)
message("We excluded ", sum(d_individual_summary$outlier), " participants for suspicious mean RTs and/or high error rates.")
## We excluded 5 participants for suspicious mean RTs and/or high error rates.
It is also conceivable that individual trials resulted in early accidental key presses or were interrupted in some way or another. We therefore look at the overall distribution of RTs and determine (similarly arbitrarily, but once again this should be planned in advance) what to exclude.
d %>% ggplot(aes(x = log(RT))) +
geom_histogram() +
geom_jitter(aes(x = log(RT), y = 1), alpha = 0.3, height = 300)
Let’s decide to exclude all trials that lasted longer than 1 second and also all trials with reaction times under 100 ms.
d <- filter(d, RT > 100 & RT < 1000)
d %>% ggplot(aes(x = RT)) +
geom_histogram() +
geom_jitter(aes(x = RT, y = 1), alpha = 0.3, height = 300)
We are mostly interested in the influence of congruency on the reaction times in the trials where participants gave a correct answer. But here we also look, for comparison, at the reaction times for the incorrect trials.
Here is a summary of the means and standard deviations for each condition:
d_sum <- d %>%
group_by(correctness, condition) %>%
summarize(mean_RT = mean(RT),
sd_RT = sd(RT))
d_sum
## # A tibble: 4 x 4
## # Groups: correctness [?]
## correctness condition mean_RT sd_RT
## <chr> <chr> <dbl> <dbl>
## 1 correct congruent 459. 105.
## 2 correct incongruent 484. 91.9
## 3 incorrect congruent 460. 111.
## 4 incorrect incongruent 404. 95.4
Here’s a plot of the reaction times split up by whether the answer was correct and whether the trial was congruent or incongruent.
d %>% ggplot(aes(x = RT)) +
geom_jitter(aes(y = 0.0005), alpha = 0.1, height = 0.0005) +
geom_density(fill = "gray", alpha = 0.5) +
geom_vline(data = d_sum,
mapping = aes(xintercept = mean_RT),
color = "firebrick") +
facet_grid(condition ~ correctness)
The analysis above is very preliminary. We could do many more things (which we should pre-register or be explicit that they are exploratory). For example, the responses made by students enrolled in the different classes could differ. In the dataframe, there is a column class, which we could use to split the data into groups to compare them (we already did this once when we described the sample).
For example:
d %>% filter(class == "Experimental Psych Lab") # filters the students only enrolled in Experimental Psych Lab
## # A tibble: 4,391 x 18
## submission_id RT condition correctness class experiment_id
## <dbl> <dbl> <chr> <chr> <chr> <dbl>
## 1 7439 991 incongru… correct Expe… 52
## 2 7439 341 incongru… correct Expe… 52
## 3 7439 393 congruent correct Expe… 52
## 4 7439 344 incongru… incorrect Expe… 52
## 5 7439 709 incongru… correct Expe… 52
## 6 7439 591 incongru… correct Expe… 52
## 7 7439 516 congruent correct Expe… 52
## 8 7439 518 incongru… correct Expe… 52
## 9 7439 499 congruent correct Expe… 52
## 10 7439 375 congruent correct Expe… 52
## # … with 4,381 more rows, and 12 more variables: key_pressed <chr>,
## # p <chr>, pause <dbl>, q <chr>, target_object <chr>,
## # target_position <chr>, timeSpent <dbl>, trial_number <dbl>,
## # trial_type <chr>, mean_RT <dbl>, error_rate <dbl>, outlier <lgl>
d %>% filter(class == "Intro Cogn. Neuro-Psychology") # filters the students only enrolled in Intro Cogn. Neuro-Psychology
## # A tibble: 19,314 x 18
## submission_id RT condition correctness class experiment_id
## <dbl> <dbl> <chr> <chr> <chr> <dbl>
## 1 7432 938 incongru… correct Intr… 52
## 2 7432 744 incongru… correct Intr… 52
## 3 7432 528 incongru… correct Intr… 52
## 4 7432 706 incongru… correct Intr… 52
## 5 7432 547 congruent correct Intr… 52
## 6 7432 591 incongru… correct Intr… 52
## 7 7432 652 incongru… correct Intr… 52
## 8 7432 627 incongru… correct Intr… 52
## 9 7432 485 incongru… incorrect Intr… 52
## 10 7432 515 incongru… correct Intr… 52
## # … with 19,304 more rows, and 12 more variables: key_pressed <chr>,
## # p <chr>, pause <dbl>, q <chr>, target_object <chr>,
## # target_position <chr>, timeSpent <dbl>, trial_number <dbl>,
## # trial_type <chr>, mean_RT <dbl>, error_rate <dbl>, outlier <lgl>
d %>% filter(class == "both") # filters the students enrolled in both classes
## # A tibble: 477 x 18
## submission_id RT condition correctness class experiment_id
## <dbl> <dbl> <chr> <chr> <chr> <dbl>
## 1 7462 641 incongru… correct both 52
## 2 7462 460 congruent correct both 52
## 3 7462 518 incongru… correct both 52
## 4 7462 586 congruent correct both 52
## 5 7462 612 incongru… correct both 52
## 6 7462 445 congruent correct both 52
## 7 7462 445 congruent correct both 52
## 8 7462 452 congruent correct both 52
## 9 7462 355 congruent correct both 52
## 10 7462 478 incongru… correct both 52
## # … with 467 more rows, and 12 more variables: key_pressed <chr>, p <chr>,
## # pause <dbl>, q <chr>, target_object <chr>, target_position <chr>,
## # timeSpent <dbl>, trial_number <dbl>, trial_type <chr>, mean_RT <dbl>,
## # error_rate <dbl>, outlier <lgl>
We are interested in comparing the RTs of correct answers in the congruent and incongruent conditions. We saw a difference in mean reaction times, but we’d like to know if this difference is meaningful. One way of testing this is by running a regression model, which tries to predict RT as a function of congruency. In the simplest case we would therefore do this:
library(brms) # Bayesian regression models fitted via Stan

model_simple = brm(RT ~ condition, filter(d, correctness == "correct"))
summary(model_simple)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: RT ~ condition
## Data: filter(d, correctness == "correct") (Number of observations: 23011)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept 458.90 0.92 457.12 460.70 3500 1.00
## conditionincongruent 25.55 1.29 23.00 28.03 3364 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 98.67 0.46 97.78 99.57 4963 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
According to this analysis, there is reason to believe in a difference in RTs between congruent and incongruent groups. The coefficient estimated for the incongruent group is on average ca. 25 ms higher than that of the congruent group.
However, we can also look at the interaction between correctness and condition. As shown in the above graph, there are four different cells in a 2x2 grid.
In the below model, this is coded with ‘dummy coding’ such that the top-left cell (congruent-correct) is the intercept, and each other cell is calculated by the addition of offsets.
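The model chunk itself is not shown above; judging from the coefficient names used in the posterior-extraction code below (b_conditionincongruent, b_correctnessincorrect and their interaction), it was a condition-by-correctness interaction model. A sketch (the object name model_interaction is an assumption) would be:

```r
# 2x2 interaction model with dummy coding:
# the intercept is the congruent-correct cell; the other cells
# are reached by adding main-effect and interaction offsets
model_interaction <- brm(RT ~ condition * correctness, data = d)
summary(model_interaction)
```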
We may want to ask the question: are reaction times to correct-congruent responses shorter than reaction times to incorrect-incongruent responses?
To do this, we first need to extract the posterior samples from our model.
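With the version of brms used here, the posterior samples can be extracted as a data frame with one row per draw and one column per parameter (again assuming the fitted model object is called model_interaction):

```r
# one row per posterior draw, one column per model parameter
post_samples <- posterior_samples(model_interaction)
```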
Then we need to determine the correct offsets to match the correct-congruent and incorrect-incongruent cells in the design matrix.
# correct-congruent is the reference cell
correct_congruent <- post_samples$b_Intercept
# incorrect_incongruent is the bottom-right cell
incorrect_incongruent <- post_samples$b_Intercept +
post_samples$b_conditionincongruent +
post_samples$b_correctnessincorrect +
post_samples$`b_conditionincongruent:correctnessincorrect`
Once we know these, we can calculate the probability that the comparison is in the correct direction.
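Concretely, this probability is just the proportion of posterior draws in which the correct-congruent cell has the shorter predicted reaction time:

```r
# posterior probability that correct-congruent RTs are shorter
# than incorrect-incongruent RTs
mean(correct_congruent < incorrect_incongruent)
```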
Homework: What does this value mean? Answer on Stud.IP VIPS before Thursday May 9, 09:00