Approaching statistical thinking for text analysis.
Feb 7, 2024
Summarize the data to understand its characteristics.
Central tendency: a single statistic that aims to represent a variable.
Mode: the most common value (0.89); used mostly for categorical data
Mean: the average (1.98)
Median: the middle value (1.67)
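As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The `values` and `genres` lists are hypothetical illustrations, not the data behind the numbers reported above.

```python
from statistics import mean, median, mode

# Hypothetical numeric measurements (NOT the data summarized on the slide).
values = [1.0, 1.5, 1.5, 2.0, 2.5, 3.5]

avg = mean(values)    # arithmetic average
mid = median(values)  # middle value; for an even count, the average of the two middle values
top = mode(values)    # most common value

print("mean:", avg)
print("median:", mid)
print("mode:", top)

# The mode also works for a categorical variable, where mean and median are undefined.
genres = ["news", "fiction", "news", "academic", "news"]  # hypothetical labels
top_genre = mode(genres)
print("mode (categorical):", top_genre)
```

Here the mean (2.0) sits above the median (1.75) because the single large value 3.5 pulls the average upward, which is the typical signature of a right-skewed variable.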
Dispersion: a single statistic that represents the variability of a variable.
Standard deviation: 1.38 around the mean
IQR (interquartile range): 1.69, the 75\(^{th}\) percentile minus the 25\(^{th}\) percentile
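Both dispersion measures can be sketched with the standard library as well. The data are again hypothetical, and the choice of the `"inclusive"` quantile method is an assumption; other percentile conventions give slightly different IQR values.

```python
from statistics import quantiles, stdev

# Hypothetical numeric measurements (NOT the data summarized on the slide).
values = [1.0, 1.5, 1.5, 2.0, 2.5, 3.5]

# Sample standard deviation (n - 1 in the denominator): typical distance from the mean.
sd = stdev(values)

# With n=4, quantiles() returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

# IQR: the 75th percentile minus the 25th percentile.
iqr = q3 - q1

print("standard deviation:", round(sd, 2))
print("IQR:", iqr)
```

Because the IQR ignores the lowest and highest quarters of the data, it is less sensitive to extreme values than the standard deviation.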
[Figures: normal distribution; skewed distributions]
Relationship between one variable and another
| | Categorical | Ordinal | Numeric |
|---|---|---|---|
| Categorical | Contingency Table | Contingency Table / Bar plot | Pivot Table / Boxplot |
| Ordinal | - | Contingency Table / Bar plot | Pivot Table / Boxplot |
| Numeric | - | - | Scatterplot |
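Two of the summaries in the table can be sketched in plain Python: a contingency table (counts for each pair of categories) and a pivot table (a numeric summary per category). All variable names and observations below are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Categorical x Categorical: hypothetical (genre, register) observations.
observations = [
    ("news", "formal"), ("news", "formal"),
    ("fiction", "informal"), ("fiction", "formal"),
    ("blog", "informal"), ("blog", "informal"),
]

# Contingency table: how often each combination of categories occurs.
counts = Counter(observations)
rows = sorted({genre for genre, _ in observations})
cols = sorted({register for _, register in observations})

print(f"{'':10}" + "".join(f"{c:>10}" for c in cols))
for genre in rows:
    print(f"{genre:10}" + "".join(f"{counts[(genre, c)]:>10}" for c in cols))

# Categorical x Numeric: hypothetical (genre, score) observations.
scores = [("news", 2.0), ("news", 3.0), ("fiction", 1.0), ("blog", 4.0), ("blog", 5.0)]

# Pivot table: group the numeric values by category, then summarize each group.
by_genre = defaultdict(list)
for genre, score in scores:
    by_genre[genre].append(score)
pivot = {genre: mean(vals) for genre, vals in by_genre.items()}
print(pivot)
```

A missing combination simply counts as zero in the contingency table, since `Counter` returns 0 for keys it has never seen.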
[Figures from the `vss_df` dataset: Categorical x Categorical/Ordinal; Categorical x Numeric (measure); Categorical x Numeric; Numeric x Numeric]
| | Explore | Examine | Extrapolate |
|---|---|---|---|
| Aims | Gain insight, open new avenues | Support and validate | Generalize and explain |
| Approach | Inductive, data-driven, and iterative | Semi-deductive, data/theory-driven, and iterative | Deductive, theory-driven, and non-iterative |
| Methods | Descriptive, pattern detection with machine learning (unsupervised) | Predictive modeling with machine learning (supervised) | Inferential statistics (theory- or simulation-based) |
| Evaluation | Associative | Accuracy measures, associative | Causal inference, associative |
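The "accuracy measures" used to evaluate a supervised predictive model can be illustrated with a small sketch. The gold labels and predictions are hypothetical, and precision and recall are included here on the assumption that they count among the intended measures.

```python
# Hypothetical gold-standard labels and model predictions for eight texts.
gold      = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "pos", "pos", "neg", "pos"]

pairs = list(zip(gold, predicted))

# Confusion-matrix cells, treating "pos" as the class of interest.
tp = sum(g == "pos" and p == "pos" for g, p in pairs)  # true positives
fp = sum(g == "neg" and p == "pos" for g, p in pairs)  # false positives
fn = sum(g == "pos" and p == "neg" for g, p in pairs)  # false negatives
tn = sum(g == "neg" and p == "neg" for g, p in pairs)  # true negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted "pos" that are truly "pos"
recall = tp / (tp + fn)            # share of true "pos" that the model found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In practice these measures are computed on held-out data that the model did not see during training, so that they estimate performance on new texts.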
Presentations, articles, and reports are the primary means of communicating results.
Analysis | Quantitative Text Analysis | Wake Forest University