wlla/oral-history-data

 
 

Scripts for downloading oral history data

  1. Download all oral history audio and images, plus metadata as JSON and CSV files, directly from oralhistory.nypl.org
python get_metadata_and_assets.py -out "path/to/output/dir/"

This script creates the following in the output directory:

  • neighborhoods.json and neighborhoods.csv
  • interviews.json and interviews.csv
  • individual .json files for each interview which contain more metadata and annotations
  • images and audio files, written to the ./images and ./audio folders
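
As a quick check that a download completed, the files listed above can be inspected without knowing their schema. The following is a minimal sketch, not part of this repository: the helper name `summarize_output` is hypothetical, and it assumes only the file names listed above (neighborhoods.json/.csv and interviews.json/.csv). It makes no assumption about which columns or keys the files contain.

```python
import csv
import json
from pathlib import Path


def summarize_output(out_dir):
    """Summarize the top-level metadata files found in out_dir.

    Hypothetical helper: relies only on the file names documented above.
    """
    out = Path(out_dir)
    summary = {}
    for stem in ("neighborhoods", "interviews"):
        csv_path = out / f"{stem}.csv"
        json_path = out / f"{stem}.json"

        if csv_path.exists():
            with csv_path.open(newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                rows = list(reader)
                # fieldnames is None for an empty file, so fall back to [].
                columns = list(reader.fieldnames or [])
            summary[csv_path.name] = {"rows": len(rows), "columns": columns}

        if json_path.exists():
            with json_path.open(encoding="utf-8") as f:
                data = json.load(f)
            # The top-level JSON type is not documented, so just report it.
            size = len(data) if isinstance(data, (list, dict)) else None
            summary[json_path.name] = {"type": type(data).__name__, "length": size}
    return summary


if __name__ == "__main__":
    # Same placeholder path as in the command above.
    for name, info in summarize_output("path/to/output/dir/").items():
        print(name, info)
```

Files that are missing are simply left out of the summary, so an empty result suggests the output path is wrong or the download did not run.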
  2. Download all oral history transcripts as JSON, plain text, and WebVTT files directly from transcribe.oralhistory.nypl.org
python get_transcripts.py -out "path/to/output/dir/"

This script creates the following in the output directory:

  • A manifest file, transcripts.json, with links to each interview's transcripts
  • An individual folder for each interview containing the transcript in three formats (.json, .txt, .vtt)
  • The .json files contain all of the edits, while the .txt and .vtt files contain the "best guess" transcription for each line
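
The .vtt files can be read with a few lines of Python if timestamps are needed alongside the text. The sketch below is an illustration under assumptions, not code from this repository: `parse_vtt` and `to_seconds` are hypothetical names, the parser handles only basic WebVTT cues (an optional identifier line, a timing line, then text), and the sample string is invented rather than taken from a real transcript.

```python
import re

# A WebVTT cue timing line looks like "00:00:01.000 --> 00:00:04.500".
# The hours component is optional in WebVTT.
TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(timestamp):
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds as a float."""
    parts = timestamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Everything after the timing line is the cue text.
                cue_text = " ".join(l.strip() for l in lines[i + 1:])
                cues.append(
                    (to_seconds(match.group(1)), to_seconds(match.group(2)), cue_text)
                )
                break  # Blocks without a timing line (e.g. the header) are skipped.
    return cues


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = "WEBVTT\n\n00:00:01.000 --> 00:00:04.500\nHello there\n"
    for start, end, cue_text in parse_vtt(sample):
        print(f"{start:.3f} -> {end:.3f}: {cue_text}")
```

For plain reading or full-text search, the .txt files are the simpler choice; the .vtt route is only worthwhile when the text needs to be aligned with the audio.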
