Where science, data, and linguistics meet.
Jan 24, 2024
“Everything about science is changing because of the impact of information technology and the data deluge.”
- Jim Gray
Humans are inherently limited in their ability to understand the world as it is. In what ways?
A process for understanding the world as it is.
Scientific workflow
Data science workflow
Professional
Academic
Text analysis is the process of extracting information from observed language data.
It can be used as a tool for research or a method of inquiry in its own right.
We will approach text analysis as a method of inquiry.
Bychkovska and Lee (2017) investigate possible differences between L1-English and L1-Chinese undergraduate students’ use of lexical bundles, multiword sequences that are extended collocations (e.g. as the result of), in argumentative essays. The authors used the argumentative essay section of the Michigan Corpus of Upper-Level Student Papers (MICUSP) for the L1-English essays and the Corpus of Ohio Learner and Teacher English (COLTE) for the L1-Chinese English essays. They found that L1-Chinese writers used more than twice as many bundle types as their L1-English peers, which they attribute to L1-Chinese writers’ attempts to avoid uncommon expressions and/or to their lack of register awareness (conversation generally has more bundles than writing).
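To make the idea of a lexical bundle concrete, here is a minimal sketch in Python (the course itself works in R). It counts every four-word sequence in a text and keeps those that recur. The function name `bundle_types`, the toy sentence, and the cutoff of two occurrences are assumptions made for illustration; they are not the procedure or thresholds used by Bychkovska and Lee (2017), and the sentence is not drawn from MICUSP or COLTE.

```python
from collections import Counter


def bundle_types(text, n=4):
    """Count each n-word sequence in a text.

    Tokenization is deliberately naive: lowercase, then split on whitespace.
    """
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical toy text, invented for this example.
essay = (
    "as the result of the policy prices rose "
    "and as the result of the war prices fell"
)

counts = bundle_types(essay, n=4)

# Keep only sequences that occur at least twice (an illustrative cutoff).
frequent = {" ".join(seq): freq for seq, freq in counts.items() if freq >= 2}
print(frequent)
```

In corpus studies, bundles are usually identified with frequency thresholds normalized by corpus size and with a requirement that a sequence appear across several texts; this sketch leaves both out.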
Questions
Olohan (2008) investigates the extent to which translated texts differ from native texts. In particular, the author explores the notion of explicitation in translated texts (the tendency to make information in the source text explicit in the target translation). The study makes use of the Translational English Corpus (TEC) for translation samples and comparable sections of the British National Corpus (BNC) for the native samples. The results suggest that there is a tendency for syntactic explicitation in the translational corpus (TEC), which is assumed to be a subconscious process employed unwittingly by translators.
Questions
Brainstorm some ideas you may have in which text analysis could be used as a method of inquiry.
Questions to consider
Foundations
Establish a fundamental understanding of the characteristics of each of the levels in the “Data, Information, Knowledge, and Insight (DIKI) Hierarchy”.
You will be able to read, write, and manipulate text data in R including creating statistical summary tables and plots. You will also have the foundational skills to frame research questions and design studies that use text analysis.
Preparation
Implement data acquisition, curation, and transformation steps.
You will be able to acquire, curate, and transform text data in R. You will also have the skills to design and implement data collection procedures for text analysis.
Analysis
Perform analysis of datasets, the evaluation of results, and the interpretation of the findings for exploratory, predictive, and inferential purposes.
You will be able to analyze text data in R and interpret findings in context. You will also have the skills to design, implement, and critique data analysis procedures for text analysis.
Communication
Demonstrate the presentation of research either as a viable research plan (prospectus) or as an implemented research project (final project).
You will be able to communicate research findings in a reproducible manner. You will also have the skills to create and share reproducible computing environments for data analysis projects.
This course is designed to provide you with the skills to use text analysis as a method of inquiry.
Text analysis in context | Quantitative Text Analysis | Wake Forest University