Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting Data with Tests #25

Merged
merged 7 commits into from
Jan 3, 2025

Conversation

siddharth7113
Copy link
Contributor

Pull Request

Description

This pull request addresses the functionality required for:

  • Formatting NESO solar forecast data into OCF-compatible Forecast objects.

It tackles the following issues:

Key changes include:

  • Added format_to_forecast_sql to transform fetched forecast data into ForecastSQL objects.
  • Applied code formatting (Black)

How Has This Been Tested?

Tests were added for both:

  1. Formatting function (format_to_forecast_sql).

The tests verify:

  • The correctness of data transformation and structure.

To reproduce:

  1. Run pytest tests/test_format_forecast.py to validate the formatting logic.

Checklist:

Added necessary dependencies (excluding `pandas`) to support project functionality, testing, and CI workflows.
- Added `format_to_forecast_sql` in `format_forecast.py` to convert NESO solar forecast data into a `ForecastSQL` object.
  - Fetches model metadata, location, and input data last updated timestamp.
  - Processes rows to create `ForecastValue` objects and aggregates them into a single `ForecastSQL` object.
  - Logs key steps for improved traceability and debugging.

- Added `test_format_to_forecast_sql_real` in `test_format_forecast.py` to validate the functionality of `format_to_forecast_sql`.
  - Fetches real data using `fetch_data` and ensures it is correctly formatted into a `ForecastSQL` object.
  - Performs validation to ensure the number of `ForecastValue` entries matches the filtered DataFrame.

- Added `conftest.py` to provide shared test fixtures for the test suite:
  - `postgres_container` fixture spins up a PostgreSQL container for isolated testing using `testcontainers`.
  - `db_session` fixture sets up a clean database schema and a test ML model for each test function.
  - `test_config` fixture provides shared configuration constants for tests.

These changes establish a robust framework for processing and testing solar forecast data effectively.
…tion

- Replaced hardcoded variables (`resource_id`, `limit`, `columns`, `rename_columns`, `sql_query`) with `test_config` fixture for better reusability and maintainability.
- Updated all test cases (`test_fetch_data_api`, `test_fetch_data_sql`, `test_data_consistency`) to use `test_config` for dynamic parameterization.
- Ensured SQL queries and API calls use values from `test_config`.
- Improved code readability and consistency across test cases.
Ran Black on the codebase to standardize formatting for this PR. No functional changes were made.
@siddharth7113
Copy link
Contributor Author

Hi @peterdudfield ,

I’ve made the changes and added the formatting function, but I’m encountering some strange behavior. Initially, I created this PR feeling a bit stuck because the formatting test wasn’t passing on my local setup. Specifically, it seems to create 5 objects with 1 value each, instead of 1 object with 5 values.

However, for some reason, these tests are passing in CI, and I’m not sure why. I’m concerned this discrepancy could lead to issues in the future, so I wanted to give you a heads-up and ask for your guidance. I’ve been struggling with this for a few days, and any insights you have would be greatly appreciated!

@peterdudfield
Copy link
Contributor

Thanks @siddharth7113 for all this. Ive had a quick loo at the PR and it looks good.

What do you mean creates 5 objects instead of 1? and the other way round?

@siddharth7113
Copy link
Contributor Author

Thanks @siddharth7113 for all this. Ive had a quick loo at the PR and it looks good.

What do you mean creates 5 objects instead of 1? and the other way round?

OK as in the issue you mentioned to create 1 ForecastSQL object which contains all the Forecast values and accordingly I wrote the test and code for it, but when I was running it on my local system for every df row it was getting from API , it was creating an object in DB, and it contained 1 ForecastValue for some reason I was always getting the Assertion Error which I wrote in the test to check, but when I opened the PR, the test was passing in the CI.

pyproject.toml Outdated Show resolved Hide resolved
@peterdudfield
Copy link
Contributor

Thanks @siddharth7113 for all this. Ive had a quick loo at the PR and it looks good.
What do you mean creates 5 objects instead of 1? and the other way round?

OK as in the issue you mentioned to create 1 ForecastSQL object which contains all the Forecast values and accordingly I wrote the test and code for it, but when I was running it on my local system for every df row it was getting from API , it was creating an object in DB, and it contained 1 ForecastValue for some reason I was always getting the Assertion Error which I wrote in the test to check, but when I opened the PR, the test was passing in the CI.

hmm, do you want to make sure you have no local changes, and CI and local tests should be the same.
Apart form that, the code looks ready to merge for me

@siddharth7113
Copy link
Contributor Author

Thanks @siddharth7113 for all this. Ive had a quick loo at the PR and it looks good.
What do you mean creates 5 objects instead of 1? and the other way round?

OK as in the issue you mentioned to create 1 ForecastSQL object which contains all the Forecast values and accordingly I wrote the test and code for it, but when I was running it on my local system for every df row it was getting from API , it was creating an object in DB, and it contained 1 ForecastValue for some reason I was always getting the Assertion Error which I wrote in the test to check, but when I opened the PR, the test was passing in the CI.

hmm, do you want to make sure you have no local changes, and CI and local tests should be the same. Apart form that, the code looks ready to merge for me

Yeah, can you give me few hours, to be completely sure that I am not missing anything, most probably it would some configuration issue, but I want to be 100% sure before we merge this PR..

@peterdudfield
Copy link
Contributor

sure no problem, and no rush

- Updated `pyproject.toml` to ensure `nowcasting_datamodel` is restricted to version 1.5.56.
@siddharth7113
Copy link
Contributor Author

sure no problem, and no rush

I performed a clean build locally, and the tests are now passing, which resolves the previous issue. However, the current test failures seem to be caused by an API-side error. I wanted to propose modifying the tests to focus on unit testing by mocking the API data, as this approach would help avoid similar issues in the future. This seems like a good opportunity to make that change.

For now, we could proceed with merging this and open a new issue to address transitioning the tests to use mocked data.

@peterdudfield
Copy link
Contributor

sure no problem, and no rush

I performed a clean build locally, and the tests are now passing, which resolves the previous issue. However, the current test failures seem to be caused by an API-side error. I wanted to propose modifying the tests to focus on unit testing by mocking the API data, as this approach would help avoid similar issues in the future. This seems like a good opportunity to make that change.

For now, we could proceed with merging this and open a new issue to address transitioning the tests to use mocked data.

Thats a good idea, you ok to make an issue for that?

@peterdudfield peterdudfield merged commit 49412c7 into openclimatefix:main Jan 3, 2025
2 of 4 checks passed
@siddharth7113
Copy link
Contributor Author

sure no problem, and no rush

I performed a clean build locally, and the tests are now passing, which resolves the previous issue. However, the current test failures seem to be caused by an API-side error. I wanted to propose modifying the tests to focus on unit testing by mocking the API data, as this approach would help avoid similar issues in the future. This seems like a good opportunity to make that change.
For now, we could proceed with merging this and open a new issue to address transitioning the tests to use mocked data.

Thats a good idea, you ok to make an issue for that?

Sure, will make that and start working on it.

@peterdudfield
Copy link
Contributor

@all-contributors please add @siddharth7113 for code

Copy link
Contributor

@peterdudfield

I've put up a pull request to add @siddharth7113! 🎉

@siddharth7113 siddharth7113 deleted the formatting branch January 4, 2025 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

format into Forecast object
2 participants