We replicated Experiment 1 of Treisman & Gelade (1980). The experiment is intended to test two predictions of feature-integration theory concerning visual search:
(i) target items that are uniquely distinguished from co-present distractor items by a single feature dimension pop out and can be found quickly, without much interference from a larger number of distractor items;
(ii) target items that are uniquely distinguished from co-present distractor items only by a conjunction of features require attentional binding of features during a serial scan of the spatial master map, so that search times should increase linearly with the number of distractor items.
Participants see displays of colored letters. Displays vary in size, i.e., the total number of colored letters. Participants are asked to search the display for a target which is specified by a verbal description before the visual display is shown. The target is in the display (so-called positive trials) or it is not (so-called negative trials). If they find the target, participants press a response button (either J or F) with their dominant/writing hand; if they are convinced that the target is absent from the display, they press the button (J or F) associated with their non-dominant hand. Participants are instructed to be as accurate and as fast as possible. After each trial, they see a feedback screen showing whether the last answer was correct, the reaction time for the last trial, as well as their current percentage of correct answers and the current mean reaction time over all trials so far.
There are two types of theoretically interesting conditions. On feature trials participants look for a target which distinguishes itself in a single feature dimension (color or letter type) from all of the present distractors. On conjunction trials, the target shares one feature (color or letter type) with all the distractors, but is distinguished as the only element that has both features at the same time.
Visual displays consist of colored letters. Distractors are the same for both feature trials and conjunction trials, namely brown T’s and green X’s. The size of a display is either 1, 5, 15 or 30. On positive trials, where the target is in the display, the distractors (if there are any) consist in equal proportions of brown T’s and green X’s. On negative trials, the display is filled up with one additional distractor, chosen at random to be either a brown T or a green X.
In feature trials, participants are instructed to search for a blue letter or the letter S. So the single target in any positive feature trial is a random choice of: (i) a blue T, (ii) a blue X, (iii) a brown S, or (iv) a green S. (There is always only one target and participants should press the button as soon as they have found it. Participants are not told in advance which of the two kinds of target the next trial will contain; they are always instructed to look for either a blue letter or a letter S.)
In conjunction trials, the target is always a green T.
On each trial, letters are placed at completely random positions on a grid. (E.g.: divide the whole canvas into cells, number them according to their position, draw the required number of cell numbers without replacement, and place the letters there.)
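The placement procedure just described can be sketched in a few lines of base R. This is only an illustration: the helper name `place_items` and the 6 x 6 grid are assumptions of ours, not details of the actual experiment.

```r
# Hypothetical helper: pick random grid cells for n_items letters.
# The grid dimensions (6 x 6) are an assumption for illustration only.
place_items <- function(n_items, n_rows = 6, n_cols = 6) {
  n_cells <- n_rows * n_cols
  stopifnot(n_items <= n_cells)
  # draw cell numbers without replacement, so no two letters share a cell
  cells <- sample(n_cells, n_items)
  data.frame(
    cell = cells,
    row  = (cells - 1) %/% n_cols + 1,  # row index of each cell
    col  = (cells - 1) %% n_cols + 1    # column index of each cell
  )
}

set.seed(1)
pos <- place_items(30)  # positions for the largest display size
```

Each row of `pos` gives the cell in which one letter would be drawn.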
Participants must first answer whether their dominant hand is left or right. If it is right, they should press button J when they have found the target, and F otherwise. If their dominant hand is left, they should respond with the reverse assignment of buttons.
After the task has been explained, there are 16 practice trials, in which feedback on response accuracy and speed is already given. The 16 practice trials are a random shuffle of all 16 logical conditions (4 sizes, present vs absent, conjunction vs feature). Then there are 32 main trials, before a final post-test survey. The 32 main trials are two repetitions of random shuffles of all 16 logically possible conditions.
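As a rough sketch of this trial structure (not the code of the actual experiment), the 16 conditions and their shuffles could be generated as follows. The names `conditions`, `shuffle`, `practice` and `main` are ours.

```r
# All 16 logical conditions: 4 sizes x present/absent x feature/conjunction
conditions <- expand.grid(
  size      = c(1, 5, 15, 30),
  trial     = c("positive", "negative"),
  condition = c("feature", "conjunction"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows of a data frame in random order
shuffle <- function(df) df[sample(nrow(df)), ]

set.seed(1)
practice <- shuffle(conditions)                             # 16 practice trials
main     <- rbind(shuffle(conditions), shuffle(conditions)) # 2 x 16 main trials
```

With this construction every condition occurs exactly once in the practice block and exactly twice among the main trials.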
Practice and main trials are exactly identical. On each trial, participants first see a pause screen (showing just “PAUSE”). By pressing either F, J or SPACE (or, alternatively, any button whatsoever), they proceed to a “get ready” screen with a countdown, starting at 2, then after one second showing 1, then showing a fixation cross for another second in the middle of where the display (canvas) will appear next. Reaction times are measured from the onset of the visual display. After a button response, participants go to the feedback screen, where they see whether their last answer was correct and what their reaction time was. After a button press, they go to the next round, starting with a pause screen.
Every trial (both practice and main) is a completely random choice of positive/negative, feature/conjunction condition and a total random choice of size.
[From here on, the writing style switches to something like an internal report that also shows steps of the analyses, e.g., for your colleagues; this is not how you would write a research paper!]
First we load some necessary packages:
library('tidyverse')
library('bootstrap') # bootstrapped confidence intervals
library('lme4') # generalized linear models with random effects
Read, massage and filter the data (only main trials):
d = readr::read_csv('../data/01_FeatureIntegration.csv') %>%
  mutate(condition = factor(condition, ordered = TRUE, levels = c("feature", "conjunction"))) %>%
  filter(trial_type == "main")
There was a total of 124 participants. Participants self-identified as having different educational backgrounds:
d %>% group_by(education) %>% summarize(x = n()/max(trial_number))
## # A tibble: 4 x 2
## education x
## <chr> <dbl>
## 1 BSc 93.
## 2 MSc 9.
## 3 other 10.
## 4 school 12.
Add mean correctness scores and mean RTs for each participant:
d = d %>% group_by(id) %>%
  mutate(correctnessScore = mean(ifelse(correctness == 'correct', 1, 0)),
         meanRTind = mean(RT))
The average mean correctness score is 0.92. The average reaction time is 1226.13 ms.
We store averages into variables:
# overall mean correctness scores
overallCorrect = mean(d$correctness == 'correct') # proportion of correct answers, independent of level order
# overall mean RT scores
overallRT = mean(d$RT)
Next, we look at all participants’ individual mean reaction times. We compute an upper bound on these based on a bootstrapped 95% quantile:
# upper bound on mean individual RTs: bootstrapped estimate of the 95% quantile
RTs = d %>% group_by(id) %>% summarize(RT = mean(meanRTind)) %>% select(RT)
meanRT = mean(RTs$RT)
RTUpperBound = mean(bootstrap(as.vector(RTs$RT), nboot = 1000, theta = function(x) {quantile(x,.95)})$thetastar)
We also compute a lower bound on individual correctness scores, based on a bootstrapped 5% quantile:
# lower bound on mean individual correctness scores: bootstrapped estimate of the 5% quantile
scores = d %>% group_by(id) %>% summarize(score = mean(correctnessScore)) %>% select(score)
meanScore = mean(scores$score)
scoreLowerBound = mean(bootstrap(as.vector(scores$score), nboot = 1000, theta = function(x) {quantile(x,.05)})$thetastar)
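To make explicit what these bootstrapped bounds compute, here is a minimal base-R sketch of the same idea: resample the per-participant values with replacement, take the quantile of each resample, and average over resamples. The function name `boot_quantile` and the toy values are made up for illustration; the analysis itself uses the bootstrap package as shown above.

```r
# Hypothetical helper: mean of a quantile over n_boot bootstrap resamples
boot_quantile <- function(x, prob, n_boot = 1000) {
  stats <- replicate(
    n_boot,
    quantile(sample(x, replace = TRUE), probs = prob, names = FALSE)
  )
  mean(stats)
}

set.seed(1)
# made-up mean RTs (ms) for ten imaginary participants
toy_rts <- c(850, 900, 950, 1000, 1100, 1200, 1300, 1500, 1800, 2500)
upper <- boot_quantile(toy_rts, 0.95)  # analogue of RTUpperBound
lower <- boot_quantile(toy_rts, 0.05)  # analogue of a 5% lower bound
```

Note that these are smoothed estimates of a quantile of the participant distribution, not confidence intervals for the mean.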
We can then plot the distribution of reaction times over individuals:
ggplot(d %>% group_by(id, education) %>% summarize(toPlot = mean(meanRTind)),
aes(x = fct_reorder(factor(id), toPlot), y = toPlot, fill = education)) +
geom_bar(stat = 'identity') + xlab("participant") + ylab("mean RT (ms)") +
scale_fill_manual(values=c("#B3BFB4", "#97A799", "#5C7660", "firebrick")) +
theme_classic() +
theme(legend.position = "bottom",
axis.line = element_line(color = "#5C7660"),
legend.key.height = unit(2,"line"),
legend.title = element_text(size = 16, face = "bold"),
legend.text = element_text(size = 16),
legend.background = element_rect(fill = "transparent"),
strip.background = element_blank(),
panel.spacing = unit(2, "lines"),
panel.border = element_blank(),
plot.background = element_rect(fill = "transparent", colour = NA),
panel.background = element_rect(fill = "transparent"),
strip.text.x = element_text(size = 18),
axis.ticks.y = element_line(colour = '#5C7660'),
axis.ticks.x = element_line(colour = 'white'),
axis.text.y = element_text(size = 9, color = "#5C7660"),
axis.text.x = element_text(size = 0, color = "#5C7660"),
axis.title = element_text(size = 18, face = "bold", color = "#515B53"),
plot.title = element_text(size = 18, face = "bold"),
plot.margin=unit(c(1,1,1.5,1.2),"cm")) +
geom_hline(aes(yintercept = overallRT)) +
geom_hline(aes(yintercept = RTUpperBound)) +
geom_text(aes(x = 25, y = overallRT + 50, label = "mean", angle = 0)) +
geom_text(aes(x = 25, y = RTUpperBound + 50, label = "95% quantile", angle = 0))
And similarly, we can plot the distribution over each individual’s correctness score:
ggplot(d %>% group_by(id , education) %>% summarize(toPlot = mean(correctnessScore)),
aes(x = fct_reorder(factor(id), -toPlot), y = toPlot * 100, fill = education)) +
geom_bar(stat = 'identity') + xlab("participant") + ylab("percent correct") +
scale_fill_manual(values=c("#B3BFB4", "#97A799", "#5C7660", "firebrick")) +
theme_classic() +
theme(legend.position = "bottom",
axis.line = element_line(color = "#5C7660"),
legend.key.height = unit(2,"line"),
legend.title = element_text(size = 16, face = "bold"),
legend.text = element_text(size = 16),
legend.background = element_rect(fill = "transparent"),
strip.background = element_blank(),
panel.spacing = unit(2, "lines"),
panel.border = element_blank(),
plot.background = element_rect(fill = "transparent", colour = NA),
panel.background = element_rect(fill = "transparent"),
strip.text.y = element_text(size = 18),
axis.ticks.y = element_line(colour = '#5C7660'),
axis.ticks.x = element_line(colour = 'white'),
axis.text.y = element_text(size = 9, color = "#5C7660"),
axis.text.x = element_text(size = 0, color = "#5C7660"),
axis.title = element_text(size = 18, face = "bold", color = "#515B53"),
plot.title = element_text(size = 18, face = "bold"),
plot.margin=unit(c(1,1,1.5,1.2),"cm")) +
geom_hline(aes(yintercept = meanScore * 100)) +
geom_hline(aes(yintercept = scoreLowerBound * 100)) +
geom_text(aes(x = 110, y = meanScore * 100 + 5, label = "mean", angle = 0)) +
geom_text(aes(x = 22, y = scoreLowerBound * 100 - 5, label = "5% quantile", angle = 0))
We remove all participants whose mean RTs are above the bootstrapped 95% quantile or whose mean correctness scores are below the bootstrapped 5% quantile. (This is crude and arbitrary here; ideally, we specify exclusion criteria before having seen any data, based on a strong theoretical motivation.)
d = d %>%
  filter(meanRTind < RTUpperBound & correctnessScore > scoreLowerBound)
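The filter keeps a participant only if both conditions hold, which means a participant is dropped as soon as either criterion fails. A toy example with made-up values and hypothetical thresholds (`rt_upper`, `score_lower`) illustrates this in base R:

```r
# Made-up per-participant summaries, for illustration only
toy <- data.frame(
  id               = 1:4,
  meanRTind        = c(1000, 3000, 1100, 3200),
  correctnessScore = c(0.95, 0.97, 0.60, 0.55)
)
rt_upper    <- 2500  # hypothetical stand-in for RTUpperBound
score_lower <- 0.75  # hypothetical stand-in for scoreLowerBound

# keep: fast enough AND accurate enough
kept <- subset(toy, meanRTind < rt_upper & correctnessScore > score_lower)
# equivalently, remove: too slow OR too inaccurate
removed <- subset(toy, !(meanRTind < rt_upper) | !(correctnessScore > score_lower))
```

Here only participant 1 is kept; participants 2, 3 and 4 each fail at least one criterion.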
Let’s then have a look at some summary statistics, which we will also use for plotting:
dsummary = d %>% group_by(trial,size,condition) %>%
  summarize(meanRT = mean(RT),
            minCI = mean(bootstrap(RT, 1000, theta = function(x) {quantile(x,.05)})$thetastar),
            maxCI = mean(bootstrap(RT, 1000, theta = function(x) {quantile(x,.95)})$thetastar)) %>%
  ungroup() %>%
  mutate(trial = factor(trial))
dsummary
## # A tibble: 16 x 6
## trial size condition meanRT minCI maxCI
## <fct> <int> <ord> <dbl> <dbl> <dbl>
## 1 negative 1 feature 919. 632. 1422.
## 2 negative 1 conjunction 875. 587. 1291.
## 3 negative 5 feature 1037. 640. 1547.
## 4 negative 5 conjunction 1033. 692. 1577.
## 5 negative 15 feature 1468. 767. 2272.
## 6 negative 15 conjunction 1665. 893. 2674.
## 7 negative 30 feature 1904. 834. 3051.
## 8 negative 30 conjunction 2271. 1132. 3529.
## 9 positive 1 feature 870. 549. 1500.
## 10 positive 1 conjunction 726. 505. 1140.
## 11 positive 5 feature 848. 553. 1309.
## 12 positive 5 conjunction 881. 577. 1413.
## 13 positive 15 feature 937. 579. 1525.
## 14 positive 15 conjunction 1162. 632. 1840.
## 15 positive 30 feature 997. 604. 1794.
## 16 positive 30 conjunction 1543. 675. 2763.
The predictions we would like to test are about a functional relationship between reaction times (as dependent or to-be-explained variable) and the trial type (single feature vs. conjunction of features) as well as the size of the display (i.e., the total number of items). So we would like to have a plot that displays mean RTs separately for each trial type and size configuration, like so:
ggplot(dsummary, aes(y = meanRT, x = size, color = trial)) +
geom_point() + geom_line() +
# geom_errorbar(aes(x = size, ymin = minCI, ymax = maxCI), width = 0.2) +
facet_grid(~ condition) +
scale_color_manual(values=c("darkgrey", "firebrick")) +
scale_x_continuous(breaks = c(1,5,15,30)) +
theme_classic() +
theme(legend.position = "bottom",
axis.line = element_line(color = "#5C7660"),
legend.key.height = unit(1,"line"),
legend.title = element_text(size = 0, face = "bold"),
legend.text = element_text(size = 12),
legend.background = element_rect(fill = "transparent"),
strip.background = element_blank(),
panel.spacing = unit(2, "lines"),
panel.border = element_blank(),
plot.background = element_rect(fill = "transparent", colour = NA),
panel.background = element_rect(fill = "transparent"),
strip.text.y = element_text(size = 18),
axis.ticks = element_line(colour = '#5C7660'),
axis.text = element_text(size = 9, color = "#5C7660"),
axis.title = element_text(size = 12, face = "bold", color = "#515B53"),
plot.title = element_text(size = 18, face = "bold"),
plot.margin=unit(c(1,1,1.5,1.2),"cm"))
Feature-integration theory predicts that size should have an effect mainly/only in positive conjunction trials, but not in positive feature trials. We can test this prediction with a linear regression model, which focusses on the positive trials (i.e., where the target was present) and checks whether the independent/explanatory variables size and condition have a significant effect. We are also interested in the interaction between size and condition. We regress log-RTs, not RTs, so that the dependent variable is closer to being normally distributed.
model = glm(log(RT) ~ condition * (size -1) , data = filter(d, trial == "positive"))
summary(model)
##
## Call:
## glm(formula = log(RT) ~ condition * (size - 1), data = filter(d,
## trial == "positive"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.19557 -0.20935 -0.03425 0.18354 1.40340
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## conditionfeature 6.7026864 0.0164424 407.65 <2e-16 ***
## conditionconjunction 6.5917885 0.0164424 400.90 <2e-16 ***
## size 0.0138530 0.0006854 20.21 <2e-16 ***
## condition.L:size 0.0127474 0.0009693 13.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1063275)
##
## Null deviance: 84449.75 on 1808 degrees of freedom
## Residual deviance: 191.81 on 1804 degrees of freedom
## AIC: 1084.7
##
## Number of Fisher Scoring iterations: 2
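One subtlety in the formula above: inside an R model formula, `- 1` removes the intercept; it does not subtract one from size. That is why the coefficient table shows one baseline term per condition and no `(Intercept)`. To regress on the number of distractors, the arithmetic would have to be wrapped in `I()`. The following sketch on simulated data (all values made up, with an unordered factor for simplicity) shows the difference:

```r
set.seed(1)
# Simulated stand-in for the positive trials; not the real data
sim <- data.frame(
  condition = factor(rep(c("feature", "conjunction"), each = 40),
                     levels = c("feature", "conjunction")),
  size = rep(c(1, 5, 15, 30), times = 20)
)
sim$logRT <- 6.7 + 0.01 * sim$size + rnorm(nrow(sim), sd = 0.1)

# "- 1" is read as "no intercept": one baseline per condition
m_no_intercept <- lm(logRT ~ condition * (size - 1), data = sim)
# I() protects the arithmetic: the predictor is size minus one
m_distractors  <- lm(logRT ~ condition * I(size - 1), data = sim)

names(coef(m_no_intercept))
names(coef(m_distractors))
```

Both versions span the same model space, so the fitted values agree; only the parametrization, and hence the reading of the baseline terms, differs.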
We conclude from this analysis that size had an influence on both feature and conjunction trials. Based on the estimates for the slope coefficients (and by looking at our previous plot) and the significance of the interaction term, we can also conclude that the impact of increasing size is stronger in the conjunction condition.
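Because condition was coded as an ordered factor, the interaction term appears as condition.L:size, i.e., with a polynomial (linear) contrast. The size coefficient is then the slope averaged over both conditions, and the interaction estimate is scaled by contrast values of about -0.707 and +0.707. As a rough worked example, assuming R's default `contr.poly` coding was in effect, the per-condition slopes can be recovered from the estimates reported above:

```r
# Estimates copied from the coefficient table above
b_size        <- 0.0138530  # average slope of size (log scale)
b_interaction <- 0.0127474  # condition.L:size

# Linear contrast values for a two-level ordered factor
contrast_L <- contr.poly(2)[, 1]  # approx. c(-0.7071, 0.7071)

# Slope of size within each condition, on the log-RT scale
slopes <- b_size + contrast_L * b_interaction
names(slopes) <- c("feature", "conjunction")

# Multiplicative change in RT per additional display item
rt_factor <- exp(slopes)
```

Under these assumptions the slope is about 0.005 for feature trials and about 0.023 for conjunction trials, i.e., roughly a 0.5% versus a 2.3% increase in RT per additional item. The standard errors in the summary refer to the average slope and to the contrast, not to these per-condition slopes.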