Exploratory analysis of the attachments in the Enron email corpus

All image attachment media from the Enron email corpus

Usage

  1. Figure out the requirements and pip install them. Sorry 😅
  2. Download the full dataset, including attachments, from the Internet Archive.
  3. python data.py --enron-root $path_to_download --media-dir $image_df_output_path --hidden
  4. python viz.py --input $image_df_output_path --output $big_image_output_path
  5. Optional, for tagging emails with OpenAI: python tags.py -i $path_to_kaggle_text_dataframe_joblib (this costs about $50 as of September 2024)
  6. Check out results.ipynb for hints on how to work with the tagged text data
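Steps 3 and 4 can be chained in a small driver script. This is a minimal sketch, not part of the repository: the script name run_pipeline.py and the helpers build_pipeline and run_pipeline are hypothetical, and it assumes data.py and viz.py sit in the current directory and accept exactly the flags shown above. The optional OpenAI tagging step is left out on purpose because it is billed.

```python
import shlex
import subprocess
import sys


def build_pipeline(enron_root: str, media_dir: str, big_image: str) -> list[list[str]]:
    """Return the data.py and viz.py command lines from the Usage steps, in order.

    Hypothetical helper: the flags are copied from the README, nothing else is assumed
    about what the scripts do internally.
    """
    return [
        # Step 3: extract image attachments; media_dir is $image_df_output_path
        [sys.executable, "data.py", "--enron-root", enron_root,
         "--media-dir", media_dir, "--hidden"],
        # Step 4: read that output and write the big image
        [sys.executable, "viz.py", "--input", media_dir, "--output", big_image],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first one that fails."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        # check=True raises CalledProcessError if a step exits non-zero
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if len(sys.argv) == 4:
        run_pipeline(build_pipeline(*sys.argv[1:]))
    else:
        print("usage: python run_pipeline.py ENRON_ROOT MEDIA_DIR BIG_IMAGE")
```

For example: python run_pipeline.py $path_to_download $image_df_output_path $big_image_output_path. Using check=True means viz.py never runs against a missing or partial output if data.py fails.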

References