
Approaching statistical thinking for text analysis.
Feb 7, 2024

Summarize the data to understand its characteristics.
A single statistic that aims to represent a variable.
| Statistic | Meaning | Value | Note |
|---|---|---|---|
| Mode | Most common value | 0.89 | Used most for categorical data |
| Mean | Average | 1.98 | |
| Median | Middle value | 1.67 | |
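
As a minimal sketch, the mode, mean, and median can be computed with Python's standard `statistics` module. The lists `values` and `categories` are hypothetical toy data chosen for illustration; they are not the variable behind the figures reported on this slide.

```python
import statistics

# Hypothetical toy data (an assumption, not the slide's variable).
values = [1, 2, 2, 3, 7]
categories = ["noun", "verb", "noun", "adjective"]

mode = statistics.mode(values)      # most common value
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value after sorting

# The mode is the only one of the three that works on categorical labels.
top_category = statistics.mode(categories)

print(mode, mean, median, top_category)
```

Note that the mean (3) sits above the median (2) here because the single large value 7 pulls the average upward.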
A single statistic to represent the variability of a variable.
| Statistic | Value | Meaning |
|---|---|---|
| Standard deviation | 1.38 | Spread around the mean |
| IQR (interquartile range) | 1.69 | 75\(^{th}\) - 25\(^{th}\) percentiles |
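
The two measures of variability can be sketched the same way. This example assumes the sample standard deviation (dividing by n - 1) and the default "exclusive" quantile method of `statistics.quantiles`; other software may use a different quantile convention and return a slightly different IQR. The list `values` is hypothetical toy data.

```python
import statistics

# Hypothetical toy data (an assumption, not the slide's variable).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation: typical distance of values from the mean.
sd = statistics.stdev(values)

# Quartiles split the sorted data into four parts; n=4 returns Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)

# IQR: width of the middle half of the data (75th minus 25th percentile).
iqr = q3 - q1

print(round(sd, 2), iqr)
```

Because the IQR ignores the lowest and highest quarters of the data, it is less sensitive to extreme values than the standard deviation.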
Normal distribution



Skewed distributions
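
One rough way to see the difference between a symmetric and a skewed variable is to compare the mean with the median. The sketch below uses two hypothetical toy samples and a hypothetical helper, `mean_minus_median`; a positive gap is only a heuristic hint of right skew, not a formal test.

```python
from statistics import mean, median

# Hypothetical toy samples (assumptions for illustration only).
symmetric = [1, 2, 3, 4, 5]       # balanced around the center
right_skewed = [1, 2, 3, 4, 50]   # one large value stretches the right tail


def mean_minus_median(values):
    """Return mean - median; positive values hint at right skew."""
    return mean(values) - median(values)


gap_symmetric = mean_minus_median(symmetric)
gap_skewed = mean_minus_median(right_skewed)

print(gap_symmetric, gap_skewed)
```

In the skewed sample the median stays at 3 while the mean is dragged toward the tail, which is why the median is often preferred as a summary for skewed variables.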



Relationship between one variable and another
| | Categorical | Ordinal | Numeric |
|---|---|---|---|
| Categorical | Contingency table | Contingency table / bar plot | Pivot table / boxplot |
| Ordinal | - | Contingency table / bar plot | Pivot table / boxplot |
| Numeric | - | - | Scatterplot |
vss_df dataset
Categorical x Categorical/Ordinal
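
A contingency table simply counts how often each combination of two categorical values occurs. The following is a minimal sketch using `collections.Counter`; the rows and the labels `genre` and `register` are hypothetical and are not columns of the `vss_df` dataset.

```python
from collections import Counter

# Hypothetical (genre, register) pairs, one per observation.
rows = [
    ("news", "formal"),
    ("news", "formal"),
    ("blog", "informal"),
    ("blog", "formal"),
    ("news", "informal"),
]

# Count each (row value, column value) combination.
table = Counter(rows)

row_levels = sorted({genre for genre, _ in rows})
col_levels = sorted({register for _, register in rows})

# Print one line per row level, with counts in column order.
print("genre", col_levels)
for genre in row_levels:
    print(genre, [table[(genre, register)] for register in col_levels])
```

Combinations that never occur are reported as 0, because a `Counter` returns 0 for missing keys.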
Categorical x Numeric (measure)
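
A pivot table summarizes a numeric measure within each level of a categorical variable. The sketch below groups values by category and reports the mean per group; the records and the idea that the measure is a word count are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (category, numeric measure) records.
records = [("news", 12), ("news", 18), ("blog", 7), ("blog", 9), ("blog", 11)]

# Collect the numeric values belonging to each category.
groups = defaultdict(list)
for category, measure in records:
    groups[category].append(measure)

# One summary statistic (here the mean) per category.
pivot = {category: mean(values) for category, values in groups.items()}

for category, group_mean in sorted(pivot.items()):
    print(category, group_mean)
```

Swapping `mean` for `median` or another summary function changes the measure reported without changing the grouping logic.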
Categorical x Numeric
Numeric x Numeric
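
Alongside a scatterplot, the strength of a linear relationship between two numeric variables is commonly summarized with Pearson's correlation coefficient. The function `pearson_r` below is a hand-written sketch of that formula, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation: co-variation scaled by each variable's spread."""
    mx, my = mean(xs), mean(ys)
    co_variation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


# Hypothetical toy data: y rises with x, z falls with x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

r_up = pearson_r(x, y)
r_down = pearson_r(x, z)
print(r_up, r_down)
```

Values near 1 or -1 indicate a strong linear association, and values near 0 indicate little linear association; the coefficient says nothing about non-linear patterns, which is one reason to inspect the scatterplot as well.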
| | Exploratory analysis |
|---|---|
| Aims | Explore: gain insight, open new avenues |
| Approach | Inductive, data-driven, and iterative |
| Methods | Descriptive, pattern detection with machine learning (unsupervised) |
| Evaluation | Associative |
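
As one small illustration of a descriptive, data-driven step in text analysis, the sketch below counts word frequencies across a few documents. The documents and the whitespace tokenization are assumptions; real projects typically need more careful tokenization.

```python
from collections import Counter

# Hypothetical toy documents.
docs = ["the cat sat on the mat", "the dog sat"]

# Naive tokenization: lowercase, then split on whitespace.
tokens = [token for doc in docs for token in doc.lower().split()]

# Descriptive summary: how often does each word occur?
freq = Counter(tokens)

print(freq.most_common(2))
```

Inspecting such counts is exploratory in the sense described here: nothing is being predicted or tested, and what turns up may suggest the next question to ask.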
| | Predictive analysis |
|---|---|
| Aims | Examine: support and validate |
| Approach | Semi-deductive, data/theory-driven, and iterative |
| Methods | Predictive modeling with machine learning (supervised) |
| Evaluation | Accuracy measures, associative |
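
Accuracy measures compare a model's predictions with the known labels. The sketch below computes accuracy, precision, and recall by hand for a binary task; the labels and predictions are hypothetical, and no particular model is implied.

```python
# Hypothetical true labels and model predictions for six test items.
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos"]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == "pos" and p == "pos")  # true positives
fp = sum(1 for t, p in pairs if t == "neg" and p == "pos")  # false positives
fn = sum(1 for t, p in pairs if t == "pos" and p == "neg")  # false negatives
tn = sum(1 for t, p in pairs if t == "neg" and p == "neg")  # true negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted "pos" that are correct
recall = tp / (tp + fn)            # share of actual "pos" that were found

print(round(accuracy, 2), round(precision, 2), round(recall, 2))
```

Reporting precision and recall alongside accuracy matters most when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.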
| | Inferential analysis |
|---|---|
| Aims | Extrapolate: generalize and explain |
| Approach | Deductive, theory-driven, and non-iterative |
| Methods | Inferential statistics (theory- or simulation-based) |
| Evaluation | Causal inference, associative |
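
A permutation test is one example of simulation-based inference: group labels are shuffled many times to see how often a difference at least as large as the observed one arises by chance. The function name `permutation_p_value`, its parameters, and the data are assumptions made for this sketch.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between value and group
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm


# Hypothetical measurements for two groups.
group_a = [10, 11, 12, 13, 14]
group_b = [1, 2, 3, 4, 5]

p_value = permutation_p_value(group_a, group_b)
print(p_value)
```

A small p-value means that shuffled labels rarely reproduce a gap as large as the observed one. The theory-based counterpart would be a test such as a t-test, which relies on distributional assumptions instead of simulation.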
Presentations, articles, and reports are the primary means of communicating results.
flowchart LR
subgraph "Motivation"
A[Literature review] --> B[Research question]
A --> C[Hypothesis]
end
subgraph "Methods"
C --> D["Data\n description"]
B --> D
D --> F["Data analysis\n description"]
end
subgraph "Analysis"
F --> G["Descriptive statistics"]
G --> H["Exploratory findings"]
G --> I["Predictive modeling"]
G --> J["Inferential estimates"]
H --> K[Results]
I --> K
J --> K
end
subgraph "Discussion"
K --> L[Interpretation]
K --> N[Limitations]
K --> M[Implications]
end

Analysis | Quantitative Text Analysis | Wake Forest University