Formatting and Saving Forecast Data with Tests #24

siddharth7113 · 2024-12-07T13:38:15Z

Pull Request

Description

This pull request addresses the functionality required for:

Formatting NESO solar forecast data into OCF-compatible Forecast objects.
Saving formatted forecasts to the database.

It tackles the following issues:

Fixes format into Forecast object #4
Fixes save #5

Key changes include:

Added format_to_forecast_sql to transform fetched forecast data into ForecastSQL objects.
Implemented save_forecasts_to_db to persist these formatted forecasts in the database.
Applied code formatting (Black) and addressed all linting issues (Ruff).

How Has This Been Tested?

Tests were added for both:

Formatting function (format_to_forecast_sql).
Saving function (save_forecasts_to_db).

The tests verify:

The correctness of data transformation and structure.
The ability to save real forecast data into the database.

To reproduce:

Run pytest tests/test_format_forecast.py to validate the formatting logic.
Run pytest tests/test_save_forecasts.py to validate the saving logic.

Checklist:

My code follows [OCF's coding style guidelines](https://github.com/openclimatefix/.github/blob/main/coding_style.md).
I have performed a self-review of my own code.
I have made corresponding changes to the documentation.
I have added tests that prove my fix is effective or that my feature works.
I have checked my code and corrected any misspellings.

siddharth7113 · 2024-12-07T13:42:53Z

Hi @peterdudfield,

I wanted to let you know that I have tested the functionality locally by running the following commands:

pytest tests/test_format_forecast.py for the formatting logic.
pytest tests/test_save_forecasts.py for saving the forecasts to the database.

Both tests passed successfully, and I verified the database entries for accuracy. However, as this is my first time writing tests, I approached this with a lot of learning. I referenced the project's documentation, general testing guidelines, and external resources (like GPT) to ensure I followed best practices.

While I’ve made my best effort, I’m open to any feedback or suggestions for improvement. Please let me know if there are specific areas I should revisit or refine—I’d be happy to make the necessary adjustments.

Thank you for your guidance and support!

siddharth7113 · 2024-12-07T14:06:00Z

Hi @peterdudfield,
I wanted to clarify the commit history. During the development of this feature, I encountered issues with the CI configuration (pytest.yaml). I initially tried to fix it but made an invalid configuration, which required a revert. This created extra commits in the history.

I've resolved the issue now, and the CI pipeline should function as expected. Let me know if you'd like me to squash these commits to clean up the history.

peterdudfield · 2024-12-08T09:31:50Z

Hey, this looks really great. Thanks so much for doing this. Some quick points:

Could you add some requirements to the pyproject to get the CI working?

In the format stage, I think we should just make one Forecast object, which has many ForecastValues. Sorry if it does that already.
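
A minimal sketch of the shape suggested above: one forecast object holding many forecast values. The dataclasses, field names (`target_time`, `power_mw`) and the `format_rows` helper are hypothetical stand-ins for illustration; they are not the actual ForecastSQL model or the code in this PR.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ForecastValue:
    # Hypothetical stand-in for a single forecast point
    target_time: datetime
    power_mw: float


@dataclass
class Forecast:
    # Hypothetical stand-in for the single parent forecast object
    gsp_id: int
    forecast_values: list[ForecastValue] = field(default_factory=list)


def format_rows(rows: list[dict], gsp_id: int = 0) -> Forecast:
    """Build ONE Forecast and attach every row to it as a ForecastValue."""
    forecast = Forecast(gsp_id=gsp_id)
    for row in rows:
        forecast.forecast_values.append(
            ForecastValue(target_time=row["target_time"], power_mw=row["power_mw"])
        )
    return forecast


# Two illustrative rows (made-up values) end up on the same Forecast
rows = [
    {"target_time": datetime(2024, 12, 7, 12, 0, tzinfo=timezone.utc), "power_mw": 120.5},
    {"target_time": datetime(2024, 12, 7, 12, 30, tzinfo=timezone.utc), "power_mw": 98.0},
]
forecast = format_rows(rows)
```

The point of the design is that the loop only ever appends to one parent, so the number of parent objects cannot grow with the number of rows.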

siddharth7113 · 2024-12-11T20:28:06Z

Hey @peterdudfield, thanks for the feedback and suggestions! I’m currently out of town for a few days and don’t have access to everything I need to make the updates. I’ll address this and update the PR in the next few days. Thanks for your patience!

siddharth7113 · 2024-12-17T06:47:42Z

Hi @peterdudfield

I’ve been working on ensuring that the formatting stage creates a single ForecastSQL object containing multiple ForecastValue entries.

Currently, I’m encountering an issue where multiple ForecastSQL objects are being created instead of just one. The problem seems to stem from the get_location function, which sometimes results in duplicate LocationSQL entries being added to the session, even though the gsp_id is the same. Since get_location is part of the library, I’m unable to modify it directly.

Steps Taken So Far:

I ensured get_location is called only once for gsp_id=0 before creating the ForecastSQL object.

I checked that the forecast_values list is correctly populated with all the rows.

Despite these changes, when I add the ForecastSQL object to the session, I see multiple entries.

Would you have any advice or suggestions for:

Avoiding duplicate ForecastSQL objects when using get_location?
Ensuring that ForecastSQL interacts cleanly with the session to maintain uniqueness?

Any help would be greatly appreciated!
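
One common way to avoid duplicate rows for the same gsp_id is a get-or-create lookup: query first, and insert only when nothing is found. The sketch below uses the standard-library `sqlite3` module and a hypothetical `location` table purely to illustrate the pattern. It does not show how the library's get_location is implemented, and `get_or_create_location` is an illustrative name.

```python
import sqlite3


def get_or_create_location(conn: sqlite3.Connection, gsp_id: int) -> int:
    """Return the row id for gsp_id, inserting a row only if none exists."""
    row = conn.execute(
        "SELECT id FROM location WHERE gsp_id = ?", (gsp_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO location (gsp_id) VALUES (?)", (gsp_id,))
    conn.commit()
    return cursor.lastrowid


# Hypothetical schema: the UNIQUE constraint is a second line of defence
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY, gsp_id INTEGER UNIQUE)")

first_id = get_or_create_location(conn, gsp_id=0)
second_id = get_or_create_location(conn, gsp_id=0)
location_count = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
```

Calling the helper twice with the same gsp_id returns the same row id and leaves a single row in the table.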

peterdudfield · 2024-12-17T08:28:20Z

That's not a problem, just make one. I think the save method duplicates it, so you might end up with two. Feel free to push code, and I can try to help.

siddharth7113 · 2024-12-17T09:56:59Z

I've updated both the format_forecast code and the corresponding test code.

However, the test code was initially written to test locally with a hardcoded database configuration (e.g., localhost). It’s not well-suited for CI as it currently relies on a local PostgreSQL setup. I’ve been unable to figure out how to rewrite it for CI yet.

Would it be okay to push the current changes as they are, and we can address the test improvements (to make them CI-ready) in a separate PR?

Let me know what you think!
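
One way to make a test setup like this less tied to a local machine is to read the database URL from the environment and fall back to a local default. This is only a sketch: using `TEST_DB_URL` as an environment variable name, the fallback URL, and the `get_test_db_url` helper are all assumptions, not settings taken from this repository.

```python
import os

# Hypothetical fallback for local runs; a CI job would set TEST_DB_URL itself
DEFAULT_DB_URL = "postgresql://postgres:postgres@localhost:5432/test_db"


def get_test_db_url(environ=os.environ) -> str:
    """Prefer the TEST_DB_URL environment variable over the local default."""
    return environ.get("TEST_DB_URL", DEFAULT_DB_URL)


# Passing a plain dict makes the behaviour easy to check without touching os.environ
local_url = get_test_db_url({})
ci_url = get_test_db_url({"TEST_DB_URL": "postgresql://ci-user:ci-pass@db:5432/ci"})
```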

neso_solar_consumer/format_forecast.py

tests/test_format_forecast.py

peterdudfield · 2024-12-17T10:22:40Z

Thanks so much for this. I've added a comment or two that should actually get it working in CI. I reckon let's try and see if we can do it in this PR.

tests/test_fetch_data.py

peterdudfield · 2024-12-17T17:15:29Z

tests/conftest.py

+        Generator: A SQLAlchemy session object.
+    """
+    # Create database engine and tables
+    engine = create_engine(TEST_DB_URL)


I think you should still use `with PostgresContainer("postgres:15.5") as postgres:`. See other examples here: https://github.com/openclimatefix/pv-site-datamodel/blob/main/tests/conftest.py#L26
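
A rough sketch of what that suggestion could look like, assuming the `testcontainers` and `sqlalchemy` packages are installed and Docker is available. The helper name `postgres_session` is hypothetical, and the table-creation step is left as a comment because the project's model metadata is not shown here. In a conftest.py this would typically be wrapped in a pytest fixture.

```python
from contextlib import contextmanager


@contextmanager
def postgres_session(image: str = "postgres:15.5"):
    """Yield a SQLAlchemy session bound to a throwaway Postgres container."""
    # Imported lazily so the module can be loaded without Docker-related packages
    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session
    from testcontainers.postgres import PostgresContainer

    with PostgresContainer(image) as postgres:
        # The container picks a free port, so nothing is hardcoded to localhost
        engine = create_engine(postgres.get_connection_url())
        # Assumption: the project's tables would be created here,
        # e.g. via the metadata object of its SQLAlchemy models
        with Session(engine) as session:
            try:
                yield session
            finally:
                # Discard anything the test left uncommitted
                session.rollback()
        engine.dispose()
```

Because the container is started and stopped inside the context manager, each use gets a clean database and CI does not need a preinstalled PostgreSQL.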

Hi @peterdudfield, I’m still looking into this and will make the necessary changes. I’m sorry if it’s taking too long. Would it be okay to give me a few more days to finalize it? I’ll update you as soon as it’s done.

peterdudfield · 2024-12-18T11:27:52Z

Thanks for doing those changes, just one more test to fix. If I get time, I'll try to look at why it's failing.

siddharth7113 · 2024-12-18T16:03:18Z

No worries! I think I might have an idea why the tests are failing. It could be an issue both with the test I wrote and the code logic itself. I have my university exams in the next few days, so if it's okay with you, I'd like to revisit it afterward and make the necessary corrections.

Once again, thank you for your patience—I’ve learned a lot through this PR process!

peterdudfield · 2024-12-18T16:51:00Z

No problem, good luck and take your time.

siddharth7113 added 4 commits December 7, 2024 18:51

feat: add core forecast processing and saving logic

c8c292c

test: add tests for forecast data fetching and processing

c870fbf

chore: update pyproject.toml for dependencies

4b9aaa7

chore: apply Black and Ruff fixes

46d31aa

siddharth7113 added 2 commits December 7, 2024 19:21

fix: update CI workflow to install dev dependencies

8873244

Revert "fix: update CI workflow to install dev dependencies"

e5dbd02

This reverts commit 8873244.

alirashidAR mentioned this pull request Dec 10, 2024

Create dag airflow to run this every 30 mins openclimatefix/ocf-infrastructure#702

Open

fix: move sqlalchemy to default dependencies for CI

c2261f2

siddharth7113 added 3 commits December 17, 2024 15:02

Refactor format_to_forecast_sql: streamline location creation with get_location

9c933d2

Clean and document test for format_to_forecast_sql function

89ab0bf

Fix missing dependency

77e17e0