LLM Tool for effective test evaluation of ML projects with curated Checklists and LLM prompts

FixML

Python 3.12.0+ | Project Status: Active (the project has reached a stable, usable state and is being actively developed) | License: MIT, CC BY 4.0

A tool that provides context-aware, checklist-based evaluations of machine learning project codebases.

Documentation

Installation

pip install fixml

# For Unix-like systems, e.g. Linux, macOS
export OPENAI_API_KEY={your-openai-api-key}

# For Windows systems
set OPENAI_API_KEY={your-openai-api-key}
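
Before running the tool, it can help to confirm that the key is actually visible to the current process. The following is a minimal sketch, not part of fixml: the helper name `openai_key_is_set` is hypothetical, and the only thing taken from the instructions above is the `OPENAI_API_KEY` variable name.

```python
import os


def openai_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to test. This helper is hypothetical and
    is not provided by fixml.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if openai_key_is_set():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is missing; set it before running fixml.")
```

Note that a variable set with `export` or `set` only applies to the current shell session, so the check should be run from the same terminal in which fixml will be run.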

For a more detailed installation guide, visit the related page on Read the Docs.

Usage

CLI tool

FixML offers a CLI as a quick and easy way to evaluate existing tests and generate new ones.

Tip

You can also refer to our Quickstart guide for a more detailed walkthrough of the CLI tool, e.g. which flag changes the LLM model version.

Test Evaluator

Here is an example command to evaluate a local repo:

Tip

Run fixml evaluate --help for more information and all available options.

# A simple run
fixml evaluate /path/to/your/repo \
  --export_report_to=./eval_report.html --verbose

# A run that specifies the LLM model and checklist path
fixml evaluate /path/to/your/repo \
  --export_report_to=./eval_report.html \
  --verbose \
  --model=gpt-4o \
  --checklist_path=src/fixml/data/checklist/checklist.csv \
  --overwrite
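
If you want to drive the same command from a script (for example, to evaluate several repositories in a loop), the invocation above can be assembled programmatically. This is a rough sketch under stated assumptions: `build_evaluate_command` is a hypothetical helper, not part of fixml, and it only uses the flags shown in the examples above. Check `fixml evaluate --help` for the authoritative list of options.

```python
import shlex


def build_evaluate_command(repo_path, report_path, model=None,
                           checklist_path=None, verbose=True,
                           overwrite=False):
    """Assemble the argument list for `fixml evaluate`.

    Hypothetical helper: only the flags that appear in the README
    examples are used here.
    """
    cmd = ["fixml", "evaluate", repo_path,
           f"--export_report_to={report_path}"]
    if verbose:
        cmd.append("--verbose")
    if model:
        cmd.append(f"--model={model}")
    if checklist_path:
        cmd.append(f"--checklist_path={checklist_path}")
    if overwrite:
        cmd.append("--overwrite")
    return cmd


if __name__ == "__main__":
    cmd = build_evaluate_command(
        "/path/to/your/repo",
        "./eval_report.html",
        model="gpt-4o",
        overwrite=True,
    )
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
    # To actually run it (requires fixml installed and OPENAI_API_KEY set):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.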

Test Spec Generator

Here is an example command to generate test function specifications:

Tip

Run fixml generate --help for more information and all available options.

# A simple run
fixml generate test.py

# A run that specifies the LLM model and checklist path
fixml generate test.py \
  --model=gpt-4o \
  --checklist_path=src/fixml/data/checklist/checklist.csv

Package

Alternatively, you can import the package's components in Python to run the evaluation and generation workflows listed above.

Consult our documentation on using the API for more information and example calls.

Development Build

Please refer to the related page in our documentation.

Rendering Documentation

Please refer to the related page in our documentation.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

fixml was created by John Shiu, Orix Au Yeung, Tony Shum, and Yingzi Jin as a deliverable of their capstone project in the UBC-MDS program, in collaboration with Dr. Tiffany Timbers and Dr. Simon Goring. The software code is licensed under the terms of the MIT license. Reports and instructional materials are licensed under the terms of the CC-BY 4.0 license.

Citation

If you use fixml in your work, please cite:

@misc{mds2024fixml,
  author =       {John Shiu and Orix Au Yeung and Tony Shum and Yingzi Jin},
  title =        {fixml: A Comprehensive Tool for Test Evaluation and Specification Generation},
  howpublished = {\url{https://github.com/UBC-MDS/fixml}},
  year =         {2024}
}

Acknowledgements

We'd like to thank everyone who has contributed to the development of the fixml package. This is a new project aimed at enhancing the robustness and reproducibility of applied machine learning software. It is meant to be a research tool and is currently hosted on GitHub as an open source project. We welcome data scientists, machine learning engineers, educators, practitioners, and hobbyists alike to read, revise, and support it. Your contributions and feedback are invaluable in making this package a reliable resource for the community.

Special thanks to the University of British Columbia (UBC) and the University of Wisconsin-Madison for their support and resources. We extend our gratitude to Dr. Tiffany Timbers and Dr. Simon Goring for their guidance and expertise, which have been instrumental in the development of this project.