“People generally see what they look for, and hear what they listen for.”
— Harper Lee, To Kill a Mockingbird
Apr 10, 2024
infer


Goals
When to use
How to use
Null Hypothesis Significance Testing (NHST)
Evidence: the probability of observing a statistic at least as extreme as the one observed, assuming the null hypothesis is true (the p-value), falls below a pre-specified threshold (the significance level).
Theory-based NHST
Simulation-based NHST
The infer package (Couch et al. 2021) takes a simulation-based approach to NHST.
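To make the idea of simulation-based NHST concrete, here is a minimal sketch of how a null distribution and p-value can be built by simulation for a single proportion. It is written in plain Python for illustration only; infer is an R package that expresses these steps with its own verbs (`specify()`, `hypothesize()`, `generate()`, `calculate()`), and nothing below is its API. The counts, the null value of 0.5, and the function names are all hypothetical.

```python
import random


def simulate_null_proportions(n, null_p, reps, rng):
    """Draw `reps` samples of size `n` under the null and return each sample proportion."""
    return [sum(rng.random() < null_p for _ in range(n)) / n for _ in range(reps)]


def right_tail_p_value(null_stats, observed):
    """Share of simulated statistics at least as large as the observed statistic."""
    return sum(s >= observed for s in null_stats) / len(null_stats)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical data: 62 of 100 sampled sentences contain a passive.
n = 100
observed_prop = 62 / n

# Assumed null hypothesis: the true proportion is 0.5.
null_stats = simulate_null_proportions(n, null_p=0.5, reps=2000, rng=rng)
p_value = right_tail_p_value(null_stats, observed_prop)

alpha = 0.05  # significance level, fixed before looking at the data
print(f"p-value = {p_value:.4f}; reject the null: {p_value < alpha}")
```

The p-value here is simply the fraction of simulated samples that look at least as extreme as the observed one, which is the quantity the evidence criterion above compares against the significance level.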
infer
A. Identify
B. Inspect
C. Interrogate
D. Interpret
| Scenario | Explanatory variable(s) | Statistical test | infer |
|---|---|---|---|
| Univariate | - | Proportion | prop |
| Bivariate | Categorical (2 levels) | Difference in proportions | diff in props |
| Bivariate | Categorical (3+ levels) | Chi-square | Chisq |
| Multivariate | Categorical or Numeric (2+ variables) | Logistic regression | fit() |

| Scenario | Explanatory variable(s) | Statistical test | infer |
|---|---|---|---|
| Univariate | - | Mean | mean |
| Bivariate | Numeric | Correlation | correlation |
| Bivariate | Categorical (2 levels) | Difference in means | diff in means |
| Bivariate | Categorical (3+ levels) | ANOVA | F |
| Multivariate | Numeric or Categorical (2+ variables) | Linear regression | fit() |

| Component | Description |
|---|---|
| RQ | Difference in passives between American and British English |
| Population | Written American and British English |
| Hypothesis | British English uses more passives than American English |
| Null hypothesis | No difference in passives between American and British English, or American English uses more passives |
| Mapping | pass_rate ~ var |
| Information types | Response: numeric; explanatory: categorical (2 levels) |
| Test statistic | Difference in means |
| Significance level | 0.05 |

With a bivariate relationship in which the explanatory variable has two levels, a boxplot or density plot shows the distribution of the response variable at each level.
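For the American and British English example, the interrogation step can be sketched as a one-sided permutation test on the difference in means. This is a plain-Python illustration under stated assumptions, not infer code: the `pass_rate` values are invented, the level names "British" and "American" for `var` are assumed, and the helper functions are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(values, labels, first, second):
    """Mean of the `first` group minus mean of the `second` group."""
    a = [v for v, g in zip(values, labels) if g == first]
    b = [v for v, g in zip(values, labels) if g == second]
    return mean(a) - mean(b)


def permutation_null(values, labels, first, second, reps, rng):
    """Shuffle the group labels `reps` times and recompute the statistic each time."""
    shuffled = list(labels)
    stats = []
    for _ in range(reps):
        rng.shuffle(shuffled)  # breaks any real link between variety and passive rate
        stats.append(diff_in_means(values, shuffled, first, second))
    return stats


# Hypothetical passive rates, one per text; not data from the source.
pass_rate = [11.2, 9.8, 12.5, 10.1, 13.0, 11.7, 9.1, 8.7, 10.4, 9.9, 8.2, 10.8]
var = ["British"] * 6 + ["American"] * 6

rng = random.Random(2024)  # fixed seed for reproducibility
observed = diff_in_means(pass_rate, var, "British", "American")
null_stats = permutation_null(pass_rate, var, "British", "American", reps=5000, rng=rng)

# One-sided: the hypothesis is that British English uses MORE passives,
# so only differences at least as large as the observed one count as extreme.
p_value = sum(s >= observed for s in null_stats) / len(null_stats)

alpha = 0.05
print(f"observed difference = {observed:.3f}")
print(f"p-value = {p_value:.4f}; reject the null: {p_value < alpha}")
```

Shuffling the labels simulates a world in which variety has no bearing on passive rate, which is the "no difference" part of the null hypothesis in the table above. The order of subtraction (British minus American) matters because the hypothesis is directional.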
The infer package provides a simulation-based approach to NHST that is easier to understand than classical, theory-based methods.

| Component | Description |
|---|---|
| RQ | Difference in passives between genres in English |
| Population | Written English |
| Hypothesis | More formal genres use more passives than less formal genres |
| Null hypothesis | No difference in passives between genres, or less formal genres use more passives |
| Mapping | pass_rate ~ genre |
| Information types | Response: numeric; explanatory: categorical (9 levels) |
| Test statistic | ANOVA (F) |
| Significance level | 0.05 |
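
With more than two genres, the same permutation logic applies, but the statistic becomes the F ratio of between-group to within-group variance. The sketch below is a plain-Python illustration and not infer code. To keep it short it assumes three made-up genres rather than the nine in the table, and every value and function name is hypothetical.

```python
import random
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = 0.0
    ss_within = 0.0
    for vs in groups.values():
        group_mean = mean(vs)
        ss_between += len(vs) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical passive rates for three assumed genres; not data from the source.
pass_rate = [14.1, 15.3, 13.8, 16.0, 10.2, 11.5, 9.8, 10.9, 6.4, 7.9, 7.1, 6.8]
genre = ["academic"] * 4 + ["news"] * 4 + ["fiction"] * 4

rng = random.Random(7)  # fixed seed for reproducibility
f_observed = f_statistic(pass_rate, genre)

# Null distribution: shuffle genre labels so that genre carries no information.
shuffled = list(genre)
null_stats = []
for _ in range(2000):
    rng.shuffle(shuffled)
    null_stats.append(f_statistic(pass_rate, shuffled))

# Large F values are the extreme ones, so the p-value is a right-tail share.
p_value = sum(s >= f_observed for s in null_stats) / len(null_stats)

alpha = 0.05
print(f"observed F = {f_observed:.2f}")
print(f"p-value = {p_value:.4f}; reject the null: {p_value < alpha}")
```

An F statistic only indicates that at least one genre differs from the others; it does not by itself show which genres differ or in which direction, so the directional hypothesis in the table would still need follow-up comparisons.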

Infer | Quantitative Text Analysis | Wake Forest University