Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting and Saving Forecast Data with Tests #24

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

siddharth7113
Copy link
Contributor

Pull Request

Description

This pull request addresses the functionality required for:

  • Formatting NESO solar forecast data into OCF-compatible Forecast objects.
  • Saving formatted forecasts to the database.

It tackles the following issues:

Key changes include:

  • Added format_to_forecast_sql to transform fetched forecast data into ForecastSQL objects.

  • Implemented save_forecasts_to_db to persist these formatted forecasts in the database.

  • Applied code formatting (Black) and addressed all linting issues (Ruff).

How Has This Been Tested?

Tests were added for both:

  1. Formatting function (format_to_forecast_sql).
  2. Saving function (save_forecasts_to_db).

The tests verify:

  • The correctness of data transformation and structure.
  • The ability to save real forecast data into the database.

To reproduce:

  1. Run pytest tests/test_format_forecast.py to validate the formatting logic.
  2. Run pytest tests/test_save_forecasts.py to validate the saving logic.

Checklist:

@siddharth7113
Copy link
Contributor Author

Hi @peterdudfield ,

I wanted to let you know that I have tested the functionality locally by running the following commands:

  • pytest tests/test_format_forecast.py for the formatting logic.
  • pytest tests/test_save_forecasts.py for saving the forecasts to the database.

Both tests passed successfully, and I verified the database entries for accuracy. However, as this is my first time writing tests, I approached this with a lot of learning. I referenced the project's documentation, general testing guidelines, and external resources (like GPT) to ensure I followed best practices.

While I’ve made my best effort, I’m open to any feedback or suggestions for improvement. Please let me know if there are specific areas I should revisit or refine—I’d be happy to make the necessary adjustments.

Thank you for your guidance and support!

@siddharth7113
Copy link
Contributor Author

Hi @peterdudfield,
I wanted to clarify the commit history. During the development of this feature, I encountered issues with the CI configuration (pytest.yaml). I initially tried to fix it but made an invalid configuration, which required a revert. This created extra commits in the history.

I've resolved the issue now, and the CI pipeline should function as expected. Let me know if you'd like me to squash these commits to clean up the history.

@peterdudfield
Copy link
Contributor

peterdudfield commented Dec 8, 2024

Hey this looks really great. Thanks so much for doing this. Some quick points

Could you add some requirements to the pyproject to get the ci working

In the format stage, I think we should just make 1 forecasts object, which had many Forecast Values. Sorry if it does that already

@siddharth7113
Copy link
Contributor Author

Hey @peterdudfield, thanks for the feedback and suggestions! I’m currently out of town for a few days and don’t have access to everything I need to make the updates. I’ll address this and update the PR in the next few days. Thanks for your patience!

@siddharth7113
Copy link
Contributor Author

siddharth7113 commented Dec 17, 2024

Hi @peterdudfield

I’ve been working on ensuring that the formatting stage creates a singleForecastSQL object containing multiple ForecastValue entries.

Currently, I’m encountering an issue where multiple ForecastSQL objects are being created instead of just one. The problem seems to stem from the get_location function, which sometimes results in duplicate LocationSQL entries being added to the session, even though the gsp_id is the same. Since get_location is part of the library, I’m unable to modify it directly.

Steps Taken So Far:

I ensured get_location is called only once for gsp_id=0 before creating the ForecastSQL object.

I checked that the forecast_values list is correctly populated with all the rows.

Despite these changes, when I add the ForecastSQL object to the session, I see multiple entries.

Would you have any advice or suggestions for:

Avoiding duplicate ForecastSQL objects when using get_location?
Ensuring that ForecastSQL interacts cleanly with the session to maintain uniqueness?

Any help would be greatly appreciated!

@peterdudfield
Copy link
Contributor

Hi @peterdudfield

I’ve been working on ensuring that the formatting stage creates a singleForecastSQL object containing multiple ForecastValue entries.

Currently, I’m encountering an issue where multiple ForecastSQL objects are being created instead of just one. The problem seems to stem from the get_location function, which sometimes results in duplicate LocationSQL entries being added to the session, even though the gsp_id is the same. Since get_location is part of the library, I’m unable to modify it directly.

Steps Taken So Far:

I ensured get_location is called only once for gsp_id=0 before creating the ForecastSQL object.

I checked that the forecast_values list is correctly populated with all the rows.

Despite these changes, when I add the ForecastSQL object to the session, I see multiple entries.

Would you have any advice or suggestions for:

Avoiding duplicate ForecastSQL objects when using get_location? Ensuring that ForecastSQL interacts cleanly with the session to maintain uniqueness?

Any help would be greatly appreciated!

Thats not a problem, just make one. The save method I think duplicated the one, so you might end up with two. Feel free to push code, and I can try to help

@siddharth7113
Copy link
Contributor Author

Hi @peterdudfield
I’ve been working on ensuring that the formatting stage creates a singleForecastSQL object containing multiple ForecastValue entries.
Currently, I’m encountering an issue where multiple ForecastSQL objects are being created instead of just one. The problem seems to stem from the get_location function, which sometimes results in duplicate LocationSQL entries being added to the session, even though the gsp_id is the same. Since get_location is part of the library, I’m unable to modify it directly.
Steps Taken So Far:
I ensured get_location is called only once for gsp_id=0 before creating the ForecastSQL object.
I checked that the forecast_values list is correctly populated with all the rows.
Despite these changes, when I add the ForecastSQL object to the session, I see multiple entries.
Would you have any advice or suggestions for:
Avoiding duplicate ForecastSQL objects when using get_location? Ensuring that ForecastSQL interacts cleanly with the session to maintain uniqueness?
Any help would be greatly appreciated!

Thats not a problem, just make one. The save method I think duplicated the one, so you might end up with two. Feel free to push code, and I can try to help

I've updated both the format_forecast code and the corresponding test code.

However, the test code was initially written to test locally with a hardcoded database configuration (e.g., localhost). It’s not well-suited for CI as it currently relies on a local PostgreSQL setup. I’ve been unable to figure out how to rewrite it for CI yet.

Would it be okay to push the current changes as they are, and we can address the test improvements (to make them CI-ready) in a separate PR?

Let me know what you think!

@peterdudfield
Copy link
Contributor

Hi @peterdudfield
I’ve been working on ensuring that the formatting stage creates a singleForecastSQL object containing multiple ForecastValue entries.
Currently, I’m encountering an issue where multiple ForecastSQL objects are being created instead of just one. The problem seems to stem from the get_location function, which sometimes results in duplicate LocationSQL entries being added to the session, even though the gsp_id is the same. Since get_location is part of the library, I’m unable to modify it directly.
Steps Taken So Far:
I ensured get_location is called only once for gsp_id=0 before creating the ForecastSQL object.
I checked that the forecast_values list is correctly populated with all the rows.
Despite these changes, when I add the ForecastSQL object to the session, I see multiple entries.
Would you have any advice or suggestions for:
Avoiding duplicate ForecastSQL objects when using get_location? Ensuring that ForecastSQL interacts cleanly with the session to maintain uniqueness?
Any help would be greatly appreciated!

Thats not a problem, just make one. The save method I think duplicated the one, so you might end up with two. Feel free to push code, and I can try to help

I've updated both the format_forecast code and the corresponding test code.

However, the test code was initially written to test locally with a hardcoded database configuration (e.g., localhost). It’s not well-suited for CI as it currently relies on a local PostgreSQL setup. I’ve been unable to figure out how to rewrite it for CI yet.

Would it be okay to push the current changes as they are, and we can address the test improvements (to make them CI-ready) in a separate PR?

Let me know what you think!

Thanks so much for this, ive added a comment or two, that should actually get it working in CI. I reckon lets try and see if we can do it in this PR

Generator: A SQLAlchemy session object.
"""
# Create database engine and tables
engine = create_engine(TEST_DB_URL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should still do with PostgresContainer("postgres:15.5") as postgres: see other examples here https://github.com/openclimatefix/pv-site-datamodel/blob/main/tests/conftest.py#L26

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @peterdudfield,I’m still looking into this and will make the necessary changes. I’m sorry if it’s taking too long—would it be okay to give me a few more days to finalize it? I’ll update you as soon as it’s done.

Copy link

codecov bot commented Dec 18, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@peterdudfield
Copy link
Contributor

Thanks for doing those changes, just one more test to fix. If I get time, ill try to look why its failing.

@siddharth7113
Copy link
Contributor Author

Thanks for doing those changes, just one more test to fix. If I get time, ill try to look why its failing.

No worries! I think I might have an idea why the tests are failing. It could be an issue both with the test I wrote and the code logic itself. I have my university exams in the next few days, so if it's okay with you, I'd like to revisit it afterward and make the necessary corrections.

Once again, thank you for your patience—I’ve learned a lot through this PR process!

@peterdudfield
Copy link
Contributor

Thanks for doing those changes, just one more test to fix. If I get time, ill try to look why its failing.

No worries! I think I might have an idea why the tests are failing. It could be an issue both with the test I wrote and the code logic itself. I have my university exams in the next few days, so if it's okay with you, I'd like to revisit it afterward and make the necessary corrections.

Once again, thank you for your patience—I’ve learned a lot through this PR process!

no problem, good luck and take your time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

save format into Forecast object
2 participants