Unofficial Implementation of Microsoft SpreadsheetLLM Paper

dtung8068/spreadsheet-llm-unofficial

SpreadsheetLLM

My unofficial implementation of Microsoft's SpreadsheetLLM paper, found here: https://arxiv.org/pdf/2407.09025.

Requirements

All requirements are listed in requirements.txt. I have attached two Dockerfiles, one for the command line utility and one for the chatbot.

You will also need to download the VFUSE dataset from TableSense, found here: https://figshare.com/projects/Versioned_Spreadsheet_Corpora/20116

Environment Variables: set OPENAI_API_KEY to use GPT-3.5/4, and HUGGING_FACE_KEY to use Llama-2/3, Phi-3, or Mistral.
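Before starting a long run, it can help to confirm that the relevant key is actually set. The snippet below is a minimal sketch and not part of this repository: the variable names come from the line above, while the `REQUIRED_KEYS` mapping and the `missing_keys` helper are hypothetical names introduced for illustration.

```python
import os

# Variable names are taken from the README; the provider labels used as
# dictionary keys are illustrative assumptions, not identifiers from the repo.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",         # GPT-3.5/4
    "huggingface": "HUGGING_FACE_KEY",  # Llama-2/3, Phi-3, Mistral
}


def missing_keys(providers, environ=None):
    """Return the variable names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    return [REQUIRED_KEYS[p] for p in providers if not env.get(REQUIRED_KEYS[p])]


if __name__ == "__main__":
    missing = missing_keys(["openai", "huggingface"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Only the key for the model family you plan to use needs to be present, so in practice you would pass a single provider to the check.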

Directions

By default, running python main.py outputs the number of tables in 7b5a0a10-e241-4c0d-a896-11c7c9bf2040.xls. Use the command line arguments to compress all files in a given directory, change the model, and so on.

To run the chatbot, run streamlit run chatbot.py.

Limitations

  1. Only cell text is considered for the structural anchor-based extraction; formatting (border, color, etc.) is not
  2. NFS identification currently relies on regular expressions and may not be robust
  3. Only .xls files will work at this time
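
To illustrate why the regular-expression approach in limitation 2 can be fragile, here is a minimal sketch of regex-based type tagging. It is not the code used in this repository: the category labels are loosely modeled on the data types discussed in the paper, the patterns are illustrative assumptions, and the sketch is applied to displayed cell text, which may differ from what the repository inspects.

```python
import re

# Ordered (label, pattern) pairs; the first full match wins.
# Labels and patterns are illustrative assumptions, not the repository's rules.
PATTERNS = [
    ("Year", re.compile(r"(19|20)\d{2}")),
    ("Integer", re.compile(r"[+-]?\d+")),
    ("Float", re.compile(r"[+-]?\d*\.\d+")),
    ("Percentage", re.compile(r"[+-]?\d+(\.\d+)?%")),
    ("Scientific", re.compile(r"[+-]?\d+(\.\d+)?[eE][+-]?\d+")),
    ("Date", re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4}")),
    ("Time", re.compile(r"\d{1,2}:\d{2}(:\d{2})?")),
    ("Currency", re.compile(r"[$€£¥]\s?\d[\d,]*(\.\d+)?")),
    ("Email", re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
]


def classify(value):
    """Return a coarse type label for a cell's displayed text."""
    text = str(value).strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "Other"


if __name__ == "__main__":
    for sample in ["2024", "3.14", "12.5%", "$1,200.50", "1,000", "hello"]:
        print(f"{sample!r:>12} -> {classify(sample)}")
```

Note that a value such as "1,000" falls through to "Other" here because no pattern allows thousands separators without a currency symbol, and any four-digit number starting with 19 or 20 is tagged as a year. Gaps and ambiguities like these are the kind of robustness problem a purely regex-based approach tends to run into.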

Future Plans

  1. Running tests on the LLM
  2. Enabling compatibility with other Excel formats
