AuditorLLM

labels: experimental, llm, publication, public_good

Challenge task for an agentic system: detect the presence of fraud in the Enron dataset.

Would almost certainly require test-time training or some other form of fine-tuning
Strong possibility that this dataset is already included in common pre-training corpora, so a model might recall the known fraud rather than detect it