- Figure out the requirements and `pip install` them. Sorry 😅
- Download the full dataset with attachments (Internet Archive).
python data.py --enron-root $path_to_download --media-dir $image_df_output_path --hidden
python viz.py --input $image_df_output_path --output $big_image_output_path
- [for tagging emails with OpenAI]
python tags.py -i $path_to_kaggle_text_dataframe_joblib
(this costs ~$50 as of September 2024)
- Check out `results.ipynb` for some hints on how to work with the tagged text data.
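
If you'd rather not retype the `data.py` and `viz.py` commands, here is a rough sketch of a small driver that chains them. It only uses the flags shown above. The function names (`build_commands`, `run_pipeline`) and the dry-run default are my own assumptions and not part of this repo, and the script assumes it is run from the directory containing `data.py` and `viz.py`.

```python
import shlex
import subprocess
import sys


def build_commands(enron_root, media_dir, big_image_out):
    """Return the two pipeline commands as argv lists (nothing is executed).

    Hypothetical helper: the flags are copied from the commands above,
    everything else is an assumption.
    """
    return [
        # Step 1: extract attachments from the downloaded dataset.
        [sys.executable, "data.py", "--enron-root", str(enron_root),
         "--media-dir", str(media_dir), "--hidden"],
        # Step 2: build the big image from the extracted media.
        [sys.executable, "viz.py", "--input", str(media_dir),
         "--output", str(big_image_out)],
    ]


def run_pipeline(enron_root, media_dir, big_image_out, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands(enron_root, media_dir, big_image_out):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)
```

Calling `run_pipeline(download_dir, media_dir, output_path)` just prints the two commands; pass `dry_run=False` to actually run them. The `tags.py` step is left out on purpose, since it costs money.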