Add Performance and Accuracy Testing Script for FertiScan Pipeline #32

Endlessflow · 2024-09-12T07:06:40Z

This PR introduces a basic performance and accuracy testing framework for the FertiScan pipeline as outlined in Issue #18. The key features include the ability to measure the end-to-end execution time of the pipeline, evaluate accuracy using Levenshtein similarity, and generate structured reports in CSV format.

My apologies @k-allagbe for the long PR. I will try to summarize the most important information below.

Key Changes:

TestCase Class:
- A class responsible for individual performance and accuracy tests.
- Measures the time taken by the pipeline and compares the actual output against expected output using Levenshtein similarity.
- Saves the actual JSON output for debugging or future comparison. (I am unsure whether this is really useful; looking for feedback.)
TestRunner Class:
- A class responsible for executing a suite of test cases in a single run.
- Generates a structured CSV report after running the test suite with the following fields: Test Case, Field Name, Accuracy Score, Expected Value, Actual Value, Pass/Fail, and Pipeline Speed (seconds).
Accuracy Calculation:
- Implemented basic accuracy assessment using Levenshtein similarity between the expected and actual output fields.
- Configurable global accuracy threshold (set to 80%).
Performance Reporting:
- Measures the total end-to-end execution time of the pipeline.
Output Handling: (I used this mostly for debugging and am not sure it is worth keeping; looking for feedback.)
- Actual output is saved in test_outputs, allowing for a side-by-side comparison with expected output JSON.
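
As a rough illustration of the accuracy calculation listed above, here is a minimal sketch of a Levenshtein-based score with a pass threshold. The function names are hypothetical and are not taken from the PR's code. The normalization (one minus the edit distance divided by the length of the longer string) and the use of `>=` for the threshold are assumptions, although this normalization does reproduce scores from the example report in this description, such as 96.00 for `manufacturer_name`.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# The PR states a global threshold of 80%; the constant name is hypothetical.
ACCURACY_THRESHOLD = 80.0


def accuracy_score(expected: str, actual: str) -> float:
    """Similarity in percent, assuming 100 * (1 - distance / longer length)."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return 100.0  # two empty fields count as a perfect match
    distance = levenshtein_distance(expected, actual)
    return round(100.0 * (1 - distance / longest), 2)


def passes(expected: str, actual: str, threshold: float = ACCURACY_THRESHOLD) -> bool:
    """Assumed rule: a field passes when its score reaches the threshold."""
    return accuracy_score(expected, actual) >= threshold
```

With this definition, a single stray comma in a 25-character company name still passes comfortably, while a missing value scores 0 and fails.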

How to Test:

Prepare test data under the test_data/labels folder with images and expected_output.json files (follow the structure described in the README found in that folder).
Run the script using python performance_test.
After execution, check the reports folder for a CSV report that details test case performance and accuracy.

Example CSV Report:

Test Case ,Field Name                      ,Accuracy Score ,Expected Value                                        ,Actual Value                                         ,Pass/Fail ,Pipeline Speed (seconds)
        1 ,company_name                    ,         12.50 ,Stoller Enterprises Inc.                              ,                                                     ,Fail      ,                  7.1083
        1 ,company_address                 ,          3.92 ,"9090 Katy Freeway, suite 400, Houston, TX 77024 USA" ,                                                     ,Fail      ,                  7.1083
        1 ,company_website                 ,        100.00 ,                                                      ,                                                     ,Pass      ,                  7.1083
        1 ,company_phone_number            ,          0.00 ,1-800-539-5283 ou 713-461-1493                        ,                                                     ,Fail      ,                  7.1083
        1 ,manufacturer_name               ,         96.00 ,Stoller Enterprises Inc.                              ,"Stoller Enterprises, Inc."                          ,Pass      ,                  7.1083
        1 ,manufacturer_address            ,         96.08 ,"9090 Katy Freeway, suite 400, Houston, TX 77024 USA" ,"9090 Katy Freeway, Suite 400 Houston, TX 77024 USA" ,Pass      ,                  7.1083
        1 ,manufacturer_website            ,        100.00 ,                                                      ,                                                     ,Pass      ,                  7.1083
        1 ,manufacturer_phone_number       ,         96.67 ,1-800-539-5283 ou 713-461-1493                        ,1-800-539-5283 or 713-461-1493                       ,Pass      ,                  7.1083
        1 ,fertiliser_name                 ,        100.00 ,Balancer                                              ,Balancer                                             ,Pass      ,                  7.1083
        1 ,registration_number             ,        100.00 ,2012063B                                              ,2012063B                                             ,Pass      ,                  7.1083
        1 ,lot_number                      ,        100.00 ,                                                      ,                                                     ,Pass      ,                  7.1083
...
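
For readers who want to produce a report with the same columns, here is a minimal sketch using Python's standard `csv` module. The column names come from the PR description; the `write_report` helper and the sample row layout are hypothetical, and the sketch does not reproduce the column padding seen in the example above.

```python
import csv
import io

# Column names as listed in the PR description.
FIELDNAMES = [
    "Test Case", "Field Name", "Accuracy Score", "Expected Value",
    "Actual Value", "Pass/Fail", "Pipeline Speed (seconds)",
]


def write_report(rows: list[dict], stream) -> None:
    """Write one CSV row per evaluated field (hypothetical helper)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# One row copied from the example report above.
rows = [
    {
        "Test Case": 1,
        "Field Name": "manufacturer_name",
        "Accuracy Score": "96.00",
        "Expected Value": "Stoller Enterprises Inc.",
        "Actual Value": "Stoller Enterprises, Inc.",
        "Pass/Fail": "Pass",
        "Pipeline Speed (seconds)": "7.1083",
    },
]

# An in-memory buffer keeps the sketch free of file-system side effects;
# a real run would open a file in the reports folder instead.
buffer = io.StringIO()
write_report(rows, buffer)
report_text = buffer.getvalue()
```

The `csv` module quotes values that contain commas, which is why addresses and the `"Stoller Enterprises, Inc."` value appear in double quotes in the report.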

Suggested Next Steps

Automated CI Integration: The framework can be integrated into a CI/CD pipeline to run tests automatically after new updates.
Semantic Similarity: The framework should use a test to check the semantic similarity of fields.
Data aggregation and visualisation: The framework should eventually be expanded to aggregate results across dozens of test cases and visualise them.

closes #18

* feat: add a new input field for the form signature for the json_schema
* feat: Update max_token value for gpt-4o model
  The `max_token` value for the `gpt-4o` model in the `gpt.py` file was changed to `None`. This change allows for unlimited token length when making API calls with the `gpt-4o` model.
* feat: Update company information in expected.json
  The company information in the `expected.json` file was updated to reflect the new details of GreenGrow Inc. This change includes the company name, address, website, and phone number.
* fix: remove newline at end of file in test_inspection.py
* chore: Update test_gpt.py with translated warranty information and nutrient values
* feat: Add translated nutrient values for ingredients in expected.json
  The code changes include adding nutrient values for ingredients in the expected.json file. This enhancement improves the accuracy and completeness of the data. The commit message follows the established convention of using a "feat" prefix to indicate a new feature or enhancement.
* feat: Update nutrient values in expected.json
  The code changes involve updating the nutrient values in the expected.json file. This improves the accuracy and completeness of the data. The commit message follows the established convention of using a "feat" prefix for new features or enhancements.
* feat: Update nutrient values in expected.json
* refactor: Refactor field validation in inspection.py
  Refactor the field validation in the `GuaranteedAnalysis` and `FertilizerInspection` classes in `inspection.py`. The `replace_none_with_empty_list` methods have been updated to use the `field_validator` decorator instead of the `model_validator` decorator. This change improves the readability and maintainability of the code.

…classes

Simplified the script by replacing classes with functions to reduce complexity and improve readability. The script now:

1. Loads environment variables.
2. Loads test cases (images and expected outputs) from the `test_data` folder.
3. Iterates through the test cases to run the pipeline and assess performance.
4. Compiles the results into a CSV file.

Consolidated trivial functions into larger ones with single responsibilities to make the code more maintainable. Updated type hints to use the latest built-in types.
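
The function-based flow described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_test_case`, `fake_pipeline`, and the row keys are hypothetical names, the real pipeline call is replaced by a stub, and loading environment variables and test data from disk is left out. Timing uses `time.perf_counter`, which is one reasonable choice rather than a confirmed detail of the script.

```python
import time
from collections.abc import Callable


def run_test_case(
    pipeline: Callable[[str], dict],
    image_path: str,
    expected: dict,
) -> list[dict]:
    """Run the pipeline once, time it, and return one result row per expected field."""
    start = time.perf_counter()
    actual = pipeline(image_path)
    elapsed = time.perf_counter() - start

    rows = []
    for field, expected_value in expected.items():
        actual_value = actual.get(field)  # a field missing from the output stays None
        rows.append({
            "field": field,
            "expected": expected_value,
            "actual": actual_value,
            # Exact match is a placeholder here; the PR scores fields
            # with Levenshtein similarity instead.
            "matches": actual_value == expected_value,
            "seconds": round(elapsed, 4),
        })
    return rows


def fake_pipeline(image_path: str) -> dict:
    """Stand-in for the real pipeline so the sketch runs without external services."""
    return {"fertiliser_name": "Balancer"}


results = run_test_case(
    fake_pipeline,
    "label.png",  # hypothetical image path
    {"fertiliser_name": "Balancer", "lot_number": ""},
)
```

Every row of a test case carries the same elapsed time, which matches the example report where all fields of test case 1 share one Pipeline Speed value.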

Endlessflow and others added 6 commits August 27, 2024 04:18

feat: setup devcontainer + implement basic performance testing

20a8d14

This commit adds a basic devcontainer configuration and implements a simple naive framework for performance testing.

feat: add a new input field for the form signature for the json_schema

4745032

fix: clean up expected.json (#17)

67d8183

feat: setup devcontainer + implement basic performance testing

f8edef8

This commit adds a basic devcontainer configuration and implements a simple naive framework for performance testing.

feat: adding the feature to mesure accuracy for one field at a time a…

869497e

…nd save the result in a csv file

Merge branch '18-copy' into 18-implement-basic-performance-testing-fr…

186bc4e

…amework

Endlessflow requested a review from k-allagbe September 12, 2024 07:06

Endlessflow linked an issue Sep 12, 2024 that may be closed by this pull request

Implement Basic Performance Testing Framework #18

Closed

6 tasks

Endlessflow and others added 5 commits September 12, 2024 03:14

Merge branch 'main' into 18-implement-basic-performance-testing-frame…

ca312cd

…work

feat: setup devcontainer + implement basic performance testing

44c2f6c

This commit adds a basic devcontainer configuration and implements a simple naive framework for performance testing.

bugfix: fixed most bugs in performance_assessment.py + partial work d…

981d23e

…one on unit tests

Thinks start breaking if I use model_validator instead of field_valid…

3b90655

…ator and I have no idea why...

k-allagbe reviewed Oct 6, 2024

performance_assessment.py (outdated, resolved)

pipeline/inspection.py (outdated, resolved)

Endlessflow added 5 commits October 7, 2024 15:13

feat: Add unit tests for performance_assessment.py and improve edge…

19b15b8

… case handling for missing fields in `calculate_accuracy()`

fix: lint and markdown lint

7a1c837

fix: updated the gitignore file + fixed the conflicts with main

5078634

fix: adding EOL

37b7764

k-allagbe reviewed Oct 10, 2024

performance_assessment.py (outdated, resolved)

performance_assessment.py (resolved)

Endlessflow added 4 commits October 16, 2024 15:36

feat: adding simple progress logging.

cd29f93

Merge branch 'main' into 18-implement-basic-performance-testing-frame…

9428a27

…work

fix: fixed the model_validator missing import

4179c58

model_validator import got lost during the merge conflict resolution

fix: removed f-string in places they are not necessary

b8a4581

k-allagbe approved these changes Oct 16, 2024

Endlessflow merged commit f4fd942 into main Oct 16, 2024
3 checks passed

Endlessflow deleted the 18-implement-basic-performance-testing-framework branch October 16, 2024 20:23
