“People generally see what they look for, and hear what they listen for.”
— Harper Lee, To Kill a Mockingbird
Apr 10, 2024
infer
Goals
When to use
How to use
Null Hypothesis Significance Testing (NHST)
Evidence: the probability of obtaining a statistic at least as extreme as the one observed, assuming the null hypothesis is true (the p-value), is below a pre-specified threshold (the significance level).
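Stated a little more formally, for an upper-tailed test. The symbols are notation assumed here, not taken from the slides: $T$ is the test statistic, $t_{\text{obs}}$ its observed value, $H_0$ the null hypothesis, and $\alpha$ the significance level.

$$
p = P\left(T \ge t_{\text{obs}} \mid H_0\right), \qquad \text{reject } H_0 \text{ if } p < \alpha
$$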
Theory-based NHST
Simulation-based NHST
The infer package (Couch et al. 2021) takes a simulation-based approach to NHST.
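To make "simulation-based" concrete, here is a minimal sketch of a permutation test for a difference in means. It is written in plain Python rather than R, and it is not infer's implementation. The function names, the `order` argument, and the toy data are all hypothetical, chosen only for illustration. The idea is to shuffle the group labels many times, recompute the statistic each time, and count how often the shuffled statistic is at least as large as the observed one.

```python
import random
from statistics import mean


def diff_in_means(values, labels, order):
    """Mean of the first group named in `order` minus the mean of the second."""
    first = [v for v, g in zip(values, labels) if g == order[0]]
    second = [v for v, g in zip(values, labels) if g == order[1]]
    return mean(first) - mean(second)


def permutation_p_value(values, labels, order, reps=1000, seed=1):
    """Estimate a one-sided ("greater") p-value by shuffling group labels."""
    rng = random.Random(seed)
    observed = diff_in_means(values, labels, order)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(reps):
        # Shuffling breaks any link between group and response,
        # which is what the null hypothesis of no difference assumes.
        rng.shuffle(shuffled)
        if diff_in_means(values, shuffled, order) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps


# Hypothetical toy data: eight made-up rates in two groups, "b" and "a".
values = [12.1, 11.4, 13.0, 12.7, 9.8, 10.2, 9.5, 10.9]
labels = ["b"] * 4 + ["a"] * 4

observed, p_value = permutation_p_value(values, labels, order=("b", "a"))
print(f"observed difference: {observed:.2f}, estimated p-value: {p_value:.3f}")
```

The estimated p-value is the share of shuffled statistics at least as large as the observed one. It is then compared with the significance level.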
infer
A. Identify
B. Inspect
C. Interrogate
D. Interpret
Scenario | Explanatory variable(s) | Statistical test | infer |
---|---|---|---|
Univariate | - | Proportion | prop |
Bivariate | Categorical | Difference in proportions | diff in props |
Bivariate | Categorical (3+ levels) | Chi-square | Chisq |
Multivariate | Categorical or Numeric (2+ variables) | Logistic regression | fit() |
Scenario | Explanatory variable(s) | Statistical test | infer |
---|---|---|---|
Univariate | - | Mean | mean |
Bivariate | Numeric | Correlation | correlation |
Bivariate | Categorical (2 levels) | Difference in means | diff in means |
Bivariate | Categorical (3+ levels) | ANOVA | F |
Multivariate | Numeric or Categorical (2+) | Linear regression | fit() |
RQ | Difference in passives between American and British English |
Population | Written American and British English |
Hypothesis | British English uses more passives than American English |
Null hypothesis | No difference in passives between American and British English, or American English uses more passives |
Mapping | pass_rate ~ var |
Information types | Resp: num, Exp: cat (2 levels) |
Test statistic | Difference in means |
Significance level | 0.05 |
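The directional hypothesis pair above can be written out as follows. The symbols are assumed notation, not from the slides: $\mu_{\text{BrE}}$ and $\mu_{\text{AmE}}$ stand for the mean passive rate in written British and American English.

$$
H_0: \mu_{\text{BrE}} - \mu_{\text{AmE}} \le 0 \qquad H_A: \mu_{\text{BrE}} - \mu_{\text{AmE}} > 0 \qquad \alpha = 0.05
$$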
With a bivariate relationship where the explanatory variable has two levels, we can use a boxplot or density plot to compare the distribution of the response variable across the two groups.
The infer package provides a simulation-based approach to NHST that is easier to understand than classical methods.
RQ | Difference in passives between genres in English |
Population | Written English |
Hypothesis | More formal genres use more passives than less formal genres |
Null hypothesis | No difference in passives between genres, or less formal genres use more passives |
Mapping | pass_rate ~ genre |
Information types | Resp: num, Exp: cat (9 levels) |
Test statistic | ANOVA (F) |
Significance level | 0.05 |
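With more than two groups, the same shuffling logic applies, with the F statistic (between-group variance over within-group variance) in place of a difference in means. The sketch below is plain Python, not infer's implementation. The function names, the three genre labels, and the toy values are hypothetical. Note that F responds to any difference among group means, not to its direction.

```python
import random
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_f_test(values, labels, reps=1000, seed=1):
    """Estimate the p-value of F by shuffling group labels."""
    rng = random.Random(seed)
    observed_f = f_statistic(values, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # simulate "genre is unrelated to the response"
        if f_statistic(values, shuffled) >= observed_f:
            at_least_as_large += 1
    return observed_f, at_least_as_large / reps


# Hypothetical toy data: three made-up genres with three values each.
rates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
genres = ["genre_a"] * 3 + ["genre_b"] * 3 + ["genre_c"] * 3

observed_f, f_p_value = permutation_f_test(rates, genres)
print(f"observed F: {observed_f:.1f}, estimated p-value: {f_p_value:.3f}")
```

Only large values of F count as evidence against the null hypothesis, so the comparison is upper-tailed even though the research hypothesis is directional.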
Infer | Quantitative Text Analysis | Wake Forest University