Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Testbed for Validating Data and Formulas in CalB/C #1334

Open
6 tasks
Tracked by #1237
shweta487 opened this issue Dec 24, 2024 · 0 comments
Open
6 tasks
Tracked by #1237

Create Testbed for Validating Data and Formulas in CalB/C #1334

shweta487 opened this issue Dec 24, 2024 · 0 comments
Assignees
Labels
epic Representing research requests - large segments of work and their dependencies

Comments

@shweta487
Copy link
Contributor

Summary

We are currently upgrading the Caltrans Benefit-Cost Analysis (BCA) tool. During this upgrade, we realized the importance of automating the testing process, as the current testing is completely manual. To address this, we will set up a testbed to validate numeric data in the CalB/C Excel file using the openpyxl Python library. The Python script will validate the result values, which are generated by many complex formulas, to ensure they are within expected ranges. Due to the complexity of these formulas and the numerous metrics involved, errors are inevitable, and the testbed will help identify and catch these errors early, improving accuracy and ensuring the automation of calculations as the tool evolves.

Research required:

  • Analyze the structure of CalB/C excel file

  • Use openpyxl to load Excel file and inspect its structure including the accessing the workbook, sheets, rows and columns.

  • Identify key numeric data and cells containing formulas using .is_formula to check if a cell contains a formula.

  • Develop an automated testbed using openpyxl to validate the data

  • Extract numeric data using .value and compare with expected ranges.

  • Apply data integrity checks through Python conditions and use .iter_rows() or .iter_columns() to iterate over large datasets.

  • Identify edge cases and ensure proper handling

  • Handle empty cells and invalid values (e.g., None, NaN, or invalid strings) using .value.

  • Capture formula errors (e.g., #DIV/0!, #VALUE!) and implement error handling logic.

Deliverables

  • Python script for validating CalB/C Excel data.
  • Documentation for setup and usage.
@shweta487 shweta487 added the epic Representing research requests - large segments of work and their dependencies label Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Representing research requests - large segments of work and their dependencies
Projects
None yet
Development

No branches or pull requests

2 participants