preliminaries

method

sledgehammer

benefit of aiming & missing

All models are wrong, but some are useful.

Box (1979), Robustness in the strategy of scientific model building

I believe it might interest a philosopher, one who can think for himself, to read my notes. For even if I have only rarely hit the mark, he would still recognize the targets at which I was ceaselessly aiming.

Wittgenstein, On Certainty, 387

overview

overview

contribution

  • theory-driven probabilistic model for experimental data
    • theory: some Post-Gricean pragmatics
    • data: typicality of quantifier \(\textit{some}\)
      • following work of van Tiel, Geurts, Degen & Tanenhaus
  • inform the debate about what to infer from experimental data
    • what is it that a task measures?
    • how is this related to established theoretical notions?

novelty

  • combine two things:
    • theory-based computational pragmatics
    • link functions from regression modeling

typicality of some

test your intuitions

"Some of the circles are black."

(visual displays: arrays of 10 circles, varying in how many circles are black)

experimental data (preview)

truth-value judgements for "Some of the circles are black."

experiments

overview

  • replication/extension of previous work
    • van Tiel & Geurts (2014), van Tiel (2014), Degen & Tanenhaus (2015)
  • 4 experimental variants:
    • binary truth-value judgements vs. 7-point rating scale
    • include filler sentences with \(\textit{many}\) and \(\textit{most}\) or not
  • participants recruited via Amazon's Mechanical Turk
    • excluded non-native speakers of English & obviously uncooperative participants
    • each subject rated 3 sentences with \(\textit{some}\)
    • pseudo-randomized order; fully randomized visual displays

expTable

truth-value judgement task

binary

rating scale task

ordinal

results

methodological puzzles

  • do binary and ordinal tasks measure the same thing?
    • one is about truth, the other about "goodness"
    • what does it even mean to measure something with a task?
  • is what either task measures influenced by presence/absence of alternatives?
    • what is the effect of additional fillers on judgements?
  • how would we answer these questions with standard statistical techniques?
    • is there a place for pragmatic theory in a statistical model?

pragmatic felicity model

pragmatic language use

When would a cooperative speaker say: "Some of the 10 circles are black"?

no. of black balls   probability of using "some"   salient alternative
0                    very, very low                "none"
1                    very low                      "one"
2                    low                           "two"
3                    meh                           "three"
4-6                  high                          ???
7-9                  lower                         "most"
10                   low                           "all"

upshot

The pragmatic felicity of a description \(m\) for a situation \(c\) is a measure of how adequate \(m\) is for a given purpose of talk relative to alternative descriptions.

idea

  • quantitative notion of pragmatic felicity \(F\)
    • \(F\) is (function of) relative expected utility:
      • goodness of descriptions compared to salient alternative
    • data-driven approach to infer gradient salience of alternatives

set up

  • conditions \(c \in \{0, \dots, 10\}\): number of black balls
  • messages \(m \in M = \{\textit{none}, \textit{one}, \textit{two}, \textit{three}, \textit{many}, \textit{most}, \textit{all}, \textit{some}\}\)
  • semantics:
##       c=0 c=1 c=2 c=3 c=4 c=5 c=6 c=7 c=8 c=9 c=10
## none    1   0   0   0   0   0   0   0   0   0    0
## one     0   1   0   0   0   0   0   0   0   0    0
## two     0   0   1   0   0   0   0   0   0   0    0
## three   0   0   0   1   0   0   0   0   0   0    0
## many    0   0   0   0   0   1   1   1   1   1    1
## most    0   0   0   0   0   0   1   1   1   1    1
## all     0   0   0   0   0   0   0   0   0   0    1
## some    0   1   1   1   1   1   1   1   1   1    1
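The truth table above can be generated from the lexical meanings instead of typed by hand. A Python sketch (the lower bound for \(\textit{many}\) is simply read off the table, not a theoretical commitment):

```python
# Sketch: generate the Boolean semantics above programmatically.
N = 10  # total number of circles

MEANING = {
    "none":  lambda c: c == 0,
    "one":   lambda c: c == 1,
    "two":   lambda c: c == 2,
    "three": lambda c: c == 3,
    "many":  lambda c: c >= 5,      # as in the table above
    "most":  lambda c: c > N / 2,   # strictly more than half
    "all":   lambda c: c == N,
    "some":  lambda c: c >= 1,      # literal (lower-bounded) meaning
}

# 0/1 rows indexed by c = 0..10, matching the matrix above
truth = {m: [int(f(c)) for c in range(N + 1)] for m, f in MEANING.items()}
```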

pragmatic speakers

literal listener picks literal interpretation (uniformly at random):

\[ P_{LL}(c \mid m) = \text{Uniform}(c \mid \{ c' \mid m \text{ is true in } c' \} ) \]

utility for true \(c\) and interpretation \(c'\):

\[ U(c, c' \ ; \ \pi) = \exp(- \pi \ (c - c')^2 ) \]

expected utility:

\[ \text{EU}(m, c \ ; \ \pi) = \sum_{c'} P_{LL}(c' \mid m) \ U(c, c' \ ; \ \pi) \]

"Gricean" speakers choose maximally informative/useful messages:

\[ m \in \arg \max_{m' \in M} \text{EU}(m', c \ ; \ \pi) \]

(cf. Benz 2006, Stalnaker 2006, Franke 2011, Frank & Goodman 2012)
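The three definitions compose into a runnable speaker model. A pure-Python sketch (the truth sets come from the semantics table; the default value of \(\pi\) is an illustrative assumption):

```python
import math

N = 10
# conditions where each message is literally true (from the semantics table)
TRUE_IN = {
    "none": {0}, "one": {1}, "two": {2}, "three": {3},
    "many": set(range(5, N + 1)), "most": set(range(6, N + 1)),
    "all": {N}, "some": set(range(1, N + 1)),
}

def P_LL(c_prime, m):
    """Literal listener: uniform over the conditions where m is true."""
    cells = TRUE_IN[m]
    return 1.0 / len(cells) if c_prime in cells else 0.0

def U(c, c_prime, pi):
    """Utility: exponential penalty on squared misinterpretation distance."""
    return math.exp(-pi * (c - c_prime) ** 2)

def EU(m, c, pi):
    """Expected utility of message m in condition c."""
    return sum(P_LL(cp, m) * U(c, cp, pi) for cp in range(N + 1))

def speaker(c, pi=0.5):
    """'Gricean' speaker: set of messages maximizing expected utility in c."""
    scores = {m: EU(m, c, pi) for m in TRUE_IN}
    best = max(scores.values())
    return {m for m, s in scores.items() if abs(s - best) < 1e-12}
```

With \(\pi = 0.5\), the speaker prefers the exact numeral at low counts, because \(\textit{some}\) spreads the literal listener thinly over ten cells.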

pragmatic felicity

scaled expected utility given set \(X\) of entertained alternatives:

\[ \text{EU}^*(c , X \ ; \ \pi) = \frac{\text{EU}(\textit{some}, c \ ; \ \pi) - \min_{m \in X} \text{EU}(m, c \ ; \ \pi)}{\max_{m \in X} \text{EU}(m, c \ ; \ \pi) - \min_{m \in X} \text{EU}(m, c \ ; \ \pi)} \]

salience of alternatives \(m \in M \setminus \{ \textit{some} \}\):

\[ s_m \sim \text{Beta}(1,1) \]

probability of entertaining \(X \subseteq M\) (crudely assume independence!):

\[ P(X \mid \vec{s}) = \prod_{m \in X} s_m \prod_{m \in M \setminus X} \ (1-s_m) \]

expected relative felicity:

\[ \text{F}(c \ ; \ \vec{s}, \pi) = \sum_X P(X \mid \vec{s}) \ \text{EU}^*(c , X \ ; \ \pi) \]
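With only 7 alternatives to \(\textit{some}\), the sum over subsets \(X\) can be brute-forced (\(2^7 = 128\) terms). A self-contained sketch; one assumption is made explicit: \(\textit{some}\) itself is taken to belong to the comparison set, so that \(\text{EU}^*\) stays in \([0,1]\):

```python
import math
from itertools import combinations

N = 10
TRUE_IN = {
    "none": {0}, "one": {1}, "two": {2}, "three": {3},
    "many": set(range(5, N + 1)), "most": set(range(6, N + 1)),
    "all": {N}, "some": set(range(1, N + 1)),
}
ALTS = [m for m in TRUE_IN if m != "some"]  # the 7 alternatives

def EU(m, c, pi):
    """Expected utility of m in c under a uniform literal listener."""
    cells = TRUE_IN[m]
    return sum(math.exp(-pi * (c - cp) ** 2) for cp in cells) / len(cells)

def EU_star(c, X, pi):
    """EU of 'some', rescaled to [0,1] within the comparison set X."""
    vals = [EU(m, c, pi) for m in X]
    lo, hi = min(vals), max(vals)
    if hi == lo:               # degenerate case: nothing to compare against
        return 1.0
    return (EU("some", c, pi) - lo) / (hi - lo)

def F(c, s, pi):
    """Expected relative felicity: sum over all subsets of alternatives."""
    total = 0.0
    for k in range(len(ALTS) + 1):
        for sub in combinations(ALTS, k):
            X = set(sub) | {"some"}   # assumption: 'some' always entertained
            p = 1.0
            for m in ALTS:            # crude independence, as in the text
                p *= s[m] if m in sub else 1.0 - s[m]
            total += p * EU_star(c, X, pi)
    return total
```

As expected, felicity of \(\textit{some}\) is higher in mid-range conditions than at \(c = 0\), where \(\textit{none}\) dominates.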

link functions in the generalized linear model

recap: simple regression

data

head(cars) 
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

model

\[\beta_0, \beta_1 \sim \text{Norm}(0, 1000)\] \[\sigma^2_{\epsilon} \sim \text{Unif}(0, 1000)\]

\[\mu_i = \beta_0 + \beta_1 x_i\] \[y_i \sim \text{Norm}(\mu_i, \sigma^2_{\epsilon})\]
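To see the generative story in action, one can simulate data from the model and recover the coefficients by ordinary least squares (pure-Python sketch; the "true" parameter values are made up, and least squares here plays the role of a point estimate, ignoring the priors):

```python
import random

random.seed(0)
beta0, beta1, sigma = 2.0, 3.5, 1.0   # made-up 'true' values

# simulate y_i = beta0 + beta1 * x_i + Norm(0, sigma) noise
xs = [i / 10.0 for i in range(100)]
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

# closed-form ordinary least squares estimates
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))
b0 = my - b1 * mx
```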

types of variables

type examples
metric speed of a car, reading time
binary coin flip, truth-value judgement
nominal gender, political party
ordinal level of education, rating scale judgement
counts number of cars passing under a bridge in 1 hour

generalized linear model

glm_scheme

common link & likelihood functions

logistic function

\[\text{logistic}(\eta, \theta, \gamma) = \frac{1}{1 + \exp(-\gamma (\eta - \theta))}\]

threshold \(\theta\)

gain \(\gamma\)
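In code, the link is a one-liner; \(\theta\) shifts the curve's midpoint and \(\gamma\) sets its steepness (a minimal sketch):

```python
import math

def logistic(eta, theta=0.0, gamma=1.0):
    """Logistic link: squashes a real predictor eta into (0, 1).
    theta is the threshold (curve midpoint), gamma the gain (steepness)."""
    return 1.0 / (1.0 + math.exp(-gamma * (eta - theta)))
```

At \(\eta = \theta\) the output is 0.5 for any gain; as \(\gamma\) grows, the curve approaches a step function at \(\theta\).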

threshold-Phi model

threshPhi

(cf. Kruschke 2015, Doing Bayesian Data Analysis, Chapter 23)
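The threshold-Phi idea in miniature: a latent normal value is cut into 7 ordered categories by 6 thresholds, and each rating's probability is the normal mass between consecutive cuts (a sketch; the threshold values below are made up):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rating_probs(mu, sigma, thresholds):
    """P(rating = k): mass of the latent Norm(mu, sigma) between cuts."""
    cuts = [-math.inf] + sorted(thresholds) + [math.inf]
    return [Phi((cuts[k + 1] - mu) / sigma) - Phi((cuts[k] - mu) / sigma)
            for k in range(len(cuts) - 1)]

# 6 thresholds carve the latent scale into a 7-point rating scale
probs = rating_probs(mu=0.8, sigma=0.5,
                     thresholds=[0.1, 0.25, 0.4, 0.55, 0.7, 0.85])
```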

full model & results

full model

modelGraph

MCMC set up

  • model implemented in JAGS (Plummer 2003)
  • 10,000 samples after 10,000 burn-in steps (2 chains, every second sample used)
  • convergence checked visually and by \(\hat{R}\) (Gelman & Rubin 1992)
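\(\hat{R}\) compares between-chain to within-chain variance; values near 1 indicate the chains mix well. A minimal pure-Python version of the classic (non-split) statistic:

```python
def r_hat(chains):
    """Gelman-Rubin R-hat for a list of equal-length sample chains
    (the classic statistic, not the newer split-chain refinement)."""
    m = len(chains)                    # number of chains
    n = len(chains[0])                 # samples per chain
    means = [sum(ch) / n for ch in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # between-chain
    W = sum(sum((x - mu) ** 2 for x in ch) / (n - 1)
            for ch, mu in zip(chains, means)) / m             # within-chain
    var_plus = (n - 1) / n * W + B / n  # pooled variance estimate
    return (var_plus / W) ** 0.5
```

For well-mixed chains drawn from the same distribution the statistic is close to 1; chains stuck at different means push it well above 1.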

posterior predictive checks

posteriors: salience

posteriors: pragmatic felicity

conclusions

conclusions

general

  • theory-driven probabilistic modeling is possible & useful
    • enforces clarity in theory formulation
    • clarifies how theory and data relate
    • helps structure theoretical debate

specific

  • the idea that truth-value and rating-scale tasks measure the same thing is tenable
  • the measure: scaled relative expected utility under variably salient alternatives
  • this is influenced by presence/absence of alternatives
    • infer latent salience of alternatives from data