Harvesting research data

Acquiring research-aligned data

Dr. Jerid Francom

Feb 28, 2024

Overview

  • Identifying data sources
  • Acquiring data
  • Documenting data

Identifying data sources

Methods

  • Empirical research
  • Data as part of the research gap

Data sources

  • Repositories
  • APIs

Criteria

  • Data quality
  • Data availability
  • Data access/usage rights
  • Data format/documentation

Acquiring data

Methods

  • Manual download
  • Programmatic download
  • API access

R support

  • download.file()
  • API interfaces (e.g., tuber, rtweet, rtoot, gutenbergr, TBDBr, jstor)
  • Control statements (if)
  • Custom functions (function(...) { ... })
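The R tools listed above can be combined into a small download helper. This is a minimal sketch that assumes the data is a single, directly downloadable file. The function name get_data() is hypothetical and is not part of the lab materials. It wraps download.file() in a custom function and uses an if statement to skip the download when the file is already on disk, so re-rendering a Quarto document does not fetch the data again.

```r
# Hypothetical helper (not from the lab materials): download a file only if
# it is not already on disk. `url` and `destfile` are placeholders.
get_data <- function(url, destfile) {
  if (file.exists(destfile)) {
    # Control statement: avoid re-downloading on every render
    message("File already exists; skipping download: ", destfile)
  } else {
    # Make sure the target directory exists before writing into it
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps binary files (e.g., .zip archives) intact on Windows
    download.file(url, destfile, mode = "wb", quiet = TRUE)
  }
  invisible(destfile)
}
```

For example, get_data("https://example.org/corpus.zip", "data/original/corpus.zip") would download the archive on the first run and return early on later runs (the URL and path are placeholders).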

Documenting data

Methods

  • Literate programming
  • Research scaffold
  • Data origin

R support

  • (Un)archive files (unzip(), untar())
  • Data frames to disk (write_csv())
  • Template for data origin (qtalrkit::create_data_origin())
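One possible sketch of the documentation step follows. It assumes a .zip or .tar(.gz) archive has already been downloaded and that the readr package is installed. The helper names extract_archive() and write_data_origin() are hypothetical, and so are the data origin fields. qtalrkit::create_data_origin() provides its own template, and this sketch does not reproduce it.

```r
library(readr)

# Hypothetical helper: extract an archive into `exdir`, choosing
# unzip() or untar() based on the file extension
extract_archive <- function(archive, exdir) {
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  if (grepl("\\.zip$", archive, ignore.case = TRUE)) {
    unzip(archive, exdir = exdir)
  } else if (grepl("\\.(tar|tar\\.gz|tgz)$", archive, ignore.case = TRUE)) {
    untar(archive, exdir = exdir)
  } else {
    stop("Unsupported archive type: ", archive)
  }
  # Return the paths of all extracted files
  invisible(list.files(exdir, recursive = TRUE, full.names = TRUE))
}

# Hypothetical data origin record (one row per attribute), written to disk
# with write_csv() so the provenance travels with the data
write_data_origin <- function(path, resource, source_url, license) {
  origin <- data.frame(
    attribute = c("Resource name", "Source URL", "License", "Date acquired"),
    description = c(resource, source_url, license, format(Sys.Date())),
    stringsAsFactors = FALSE
  )
  write_csv(origin, path)
  invisible(origin)
}
```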

Conclusion

Summary

  • Acquiring data is a crucial part of the research process
    • Data is the foundation of empirical research
    • Data quality is essential
  • Quarto and R provide a variety of tools to support acquiring and documenting data
    • Literate programming
    • Control statements
    • Custom functions

Lab 05: Harvesting research data

Overview

  • Fork, clone, and open the lab project
  • Follow the instructions in the README.md file
  • Submit your work to GitHub
  • Notify me via Canvas with a link to your repository
