Text analysis in context

Where science, data, and linguistics meet.

Dr. Jerid Francom

Jan 24, 2024

Overview

“Everything about science is changing because of the impact of information technology and the data deluge.”
- Jim Gray

  • Science
  • Data science
  • Text analysis
  • The plan

Science

Why science?

Humans are inherently limited in their ability to understand the world as it is. In what ways?

  • Our individual experiences are limited and not representative of the whole.
  • Our cognition is not free from bias. Memory and recall are not perfect.

What is science?

A process for understanding the world as it is.

  • Systematic
  • Meticulous
  • Replicable

Scientific workflow

flowchart LR
  subgraph Science
    direction TB
    B[Question] --> C>Collect data]
    C --> D>Analyze data]
    D --> E>Interpret results]
    E --> F[Report]
  end
  A((Observation)) --> Science --> G((Insight))

Data science

Emergence

  • Computing power
  • Data availability
  • Data storage

Data science workflow

flowchart LR
  subgraph two ["Data science"]
    direction TB
    C --> C1(Diverse sources) --> C2
    C2[(Data storage)] --> D
    D --> D1{Algorithms} --> E
    subgraph one [Science]
      direction TB
      B[Question] --> C>Collect data]
      C .-> D>Analyze data]
      D .-> E>Interpret results]
      E --> F[Report]
    end
  end
  A((Observation)) --> two --> G((Insight))

The data science toolbelt

  • Computing:
    executing the research process
  • Statistics:
    identifying patterns
  • Domain knowledge:
    understanding the context

Ubiquity

Professional

  • Business
  • Medicine
  • Government
  • Law
  • Journalism
  • etc.

Academic

  • Formal sciences
  • Natural sciences
  • Social sciences
  • Humanities
  • etc.

Text analysis

Language research

flowchart TB
  A[Methods] --> B[Qualitative]
  B ----> B1[Ethnolinguistics]
  B ----> B2[Discourse analysis]
  A --> C[Quantitative]
  C --> D[Experimental]
  D ----> D1[Psycholinguistics]
  D ----> D2[Phonetics]
  C --> E[Observational]
  E ----> E1[Computational linguistics]
  E ----> E2[Corpus linguistics]

Text analysis is the process of extracting information from observed language data.

It can be used as a tool for research or a method of inquiry in its own right.

We will approach text analysis as a method of inquiry.

Cases

Bychkovska and Lee (2017) investigate possible differences between L1-English and L1-Chinese undergraduate students’ use of lexical bundles, multiword sequences that are extended collocations (e.g., as the result of), in argumentative essays. The authors used the argumentative essay section of the Michigan Corpus of Upper-Level Student Papers (MICUSP) for the L1-English essays and the Corpus of Ohio Learner and Teacher English (COLTE) for the L1-Chinese essays. They found that L1-Chinese writers used more than twice as many bundle types as their L1-English peers, which they attribute to L1-Chinese writers’ attempts to avoid uncommon expressions and/or to their lack of register awareness (conversation generally has more bundles than writing).
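
To make the idea of a lexical bundle concrete, here is a minimal sketch of how recurring multiword sequences might be counted. It is written in Python purely for illustration (the course itself works in R), and everything in it is an assumption rather than the procedure used by Bychkovska and Lee: the function names, the four-word window, the frequency and spread thresholds, and the three toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-word sequence as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Keep n-grams that occur at least min_freq times in at least min_texts texts.

    The thresholds are illustrative defaults, not values taken from any study.
    """
    freq = Counter()            # total occurrences of each n-gram
    spread = defaultdict(set)   # which texts each n-gram appears in
    for doc_id, text in enumerate(texts):
        for gram in ngrams(tokenize(text), n):
            freq[gram] += 1
            spread[gram].add(doc_id)
    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }


# Toy data invented for this example (not drawn from MICUSP or COLTE).
essays = [
    "Prices rose as the result of the new policy.",
    "The strike ended as the result of long talks.",
    "As the result of the storm, classes were cancelled.",
]

bundles = lexical_bundles(essays)
for bundle, count in sorted(bundles.items()):
    print(bundle, count)
```

The number of entries in `bundles` corresponds to the number of bundle types, which is the kind of measure a comparison between two groups of writers could be based on, given a separate corpus for each group.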

Questions

  • What is the area of research?
  • What is the research question?
  • What is the data?
  • What is the method?
  • What is the finding?

Cases

Olohan (2008) investigates the extent to which translated texts differ from native texts. In particular, the author explores the notion of explicitation in translated texts (the tendency to make information in the source text explicit in the target translation). The study makes use of the Translational English Corpus (TEC) for translation samples and comparable sections of the British National Corpus (BNC) for the native samples. The results suggest that there is a tendency for syntactic explicitation in the translational corpus (TEC), which is assumed to be a subconscious process employed unwittingly by translators.

Questions

  • What is the area of research?
  • What is the research question?
  • What is the data?
  • What is the method?
  • What is the finding?

Other cases?

Brainstorm some ideas you may have in which text analysis could be used as a method of inquiry.

Questions to consider

  • What is the area of research?
  • What is the research question?
  • What is the data?
  • What is the method?
  • What finding might you expect?

The plan

The plan

Foundations
Establish a fundamental understanding of the characteristics of each level in the “Data, Information, Knowledge, and Insight (DIKI) Hierarchy”.

You will be able to read, write, and manipulate text data in R including creating statistical summary tables and plots. You will also have the foundational skills to frame research questions and design studies that use text analysis.

The plan

Preparation
Implement data acquisition, curation, and transformation steps.

You will be able to acquire, curate, and transform text data in R. You will also have the skills to design and implement data collection procedures for text analysis.

The plan

Analysis
Analyze datasets, evaluate results, and interpret findings for exploratory, predictive, and inferential purposes.

You will be able to analyze text data in R and interpret findings in context. You will also have the skills to design, implement, and critique data analysis procedures for text analysis.

The plan

Communication
Present research either as a viable research plan (prospectus) or as an implemented research project (final project).

You will be able to communicate research findings in a reproducible manner. You will also have the skills to create and share reproducible computing environments for data analysis projects.

Final thoughts

  • Science is a process for understanding the world as it is.
  • Data science enhances science using computing, statistics, and domain knowledge.
  • Text analysis employs data science to extract and analyze information from observed language usage.

This course is designed to provide you with the skills to use text analysis as a method of inquiry.

References

Bychkovska, Tetyana, and Joseph J. Lee. 2017. “At the Same Time: Lexical Bundles in L1 and L2 University Student Argumentative Writing.” Journal of English for Academic Purposes 30 (November): 38–52. https://doi.org/10.1016/j.jeap.2017.10.008.
Olohan, Maeve. 2008. “Leave It Out! Using a Comparable Corpus to Investigate Aspects of Explicitation in Translation.” Cadernos de Tradução, 153–69.