Meta Issue: Open Data PVnet #5
🌞 Open Source Solar Forecasting Project – Volunteers Needed! 🌞 We're building an open-source solar forecasting pipeline using publicly available data to predict solar generation at the national level, starting with the UK. Tasks include identifying gridded NWP datasets (preferably in Zarr format), creating pipelines for batching data, setting up APIs for PVlive solar generation and capacity data, defining training/testing splits, and benchmarking against existing OCF results. Roles range from data engineers and machine learning enthusiasts to software developers with Python expertise. If you're passionate about renewable energy and open-source collaboration, join us in advancing solar forecasting solutions for global impact! 🌍✨ #opensource #renewableenergy #solarforecasting |
@peterdudfield would this work for open source data to use? |
@jcamier that's probably one option. There are lots of other options too, like the free ECMWF variables, GFS, ICON - https://huggingface.co/datasets/openclimatefix/dwd-icon-eu |
@peterdudfield @Sukh-P what do you guys think of this readme I am creating to help with on-boarding to this volunteer group project? |
Here is a preview of it...

Solar Forecasting Volunteer Onboarding

Welcome to the Solar Forecasting project! This document will introduce you to the key concepts and knowledge needed to contribute effectively.

Table of Contents
Introduction to Solar Forecasting

Solar forecasting is the process of predicting the amount of solar energy that will be generated over a specific period. Understanding this helps optimize renewable energy systems and integrate them with the grid.

What is NWP Data?

Numerical Weather Prediction (NWP) data uses mathematical models of the atmosphere and oceans to forecast weather. It predicts various atmospheric conditions such as temperature, pressure, wind speed, humidity, precipitation type and amount, cloud cover, and sometimes even surface conditions and air quality, all of which are crucial for solar forecasting. https://en.wikipedia.org/wiki/Numerical_weather_prediction

Understanding Zarr Format

Zarr is a relatively new, cloud-based data format designed to improve access to N-dimensional arrays. It provides an effective way to store large N-dimensional data in the cloud, with access facilitated through predefined chunks. Zarr can be viewed as the cloud-based counterpart to HDF5/NetCDF files, as it follows a similar data model. However, unlike NetCDF or HDF5, which store data in a single file, Zarr organizes data as a directory containing compressed binary files for chunks of data, alongside metadata stored in external JSON files. The semantic mapping from the NetCDF Data Model to the Zarr Data Model is as follows:
A Zarr array can be stored in any storage system that supports a key/value interface. In this system: A key is an ASCII string.

Target Data: What is UK PVlive?

UK PVlive provides national solar generation data, accessible via API. This data serves as a "ground truth" for training and evaluating solar forecasting models.

Basics of Machine Learning for Solar Forecasting

Discover key ML concepts such as data splitting, feature engineering, and model evaluation, all tailored to the solar forecasting domain.

APIs and Data Retrieval

Learn how to use APIs to fetch solar generation data and capacity information, critical for building datasets.

Data Pipelines for Solar Forecasting

Explore how pipelines prepare and batch data for machine learning models, making training and testing efficient.

Benchmarks and Comparisons

Understand the importance of benchmarking and how our models compare to existing solutions.

Geographical Adaptability

This project is currently focused on the UK but will be expanded to other global regions in the future. Learn how it can be adapted to other regions and data sources.

Key Tools and Technologies

Familiarize yourself with tools like Python, pandas, and open-source libraries like

Common Terminology

Learn the meanings of key terms like Grid Supply Point (GSP), solar irradiance, and capacity factors.

Expected Knowledge and Skills

An overview of the skills contributors should have or be willing to learn, such as Python programming and data analysis.

How This Project Fits into Renewable Energy

Understand the broader impact of this work and its contribution to a sustainable future.

Thank you for joining us on this journey to advance solar forecasting and renewable energy solutions! |
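To make the key/value idea in the Zarr section concrete, here is a toy sketch using only the Python standard library. It is not the zarr library and is not guaranteed to be readable by it; it only mimics the Zarr v2 layout as I understand it (JSON metadata under a `.zarray` key, each chunk compressed separately under a key built from its chunk indices). The array name `t2m` and the helper names are made up for illustration.

```python
import json
import struct
import zlib


def write_toy_array(store, name, rows, chunk_rows):
    """Write a 2-D list of floats into `store` using Zarr-v2-style keys."""
    n_rows, n_cols = len(rows), len(rows[0])
    # Metadata is stored as JSON under its own key, separate from the data.
    store[f"{name}/.zarray"] = json.dumps({
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [chunk_rows, n_cols],
        "dtype": "<f8",
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": None,
        "order": "C",
        "filters": None,
    }).encode("ascii")
    # Each chunk is compressed on its own and stored under "<row-chunk>.<col-chunk>".
    # Simplification: n_rows is assumed to be a multiple of chunk_rows.
    for i in range(0, n_rows, chunk_rows):
        flat = [v for row in rows[i:i + chunk_rows] for v in row]
        raw = struct.pack(f"<{len(flat)}d", *flat)
        store[f"{name}/{i // chunk_rows}.0"] = zlib.compress(raw, 1)


def read_chunk(store, name, chunk_index):
    """Read back one row-chunk without touching any other chunk."""
    meta = json.loads(store[f"{name}/.zarray"])
    raw = zlib.decompress(store[f"{name}/{chunk_index}.0"])
    n_cols = meta["chunks"][1]
    values = struct.unpack(f"<{len(raw) // 8}d", raw)
    return [list(values[r:r + n_cols]) for r in range(0, len(values), n_cols)]


# A plain dict stands in for any key/value store (local directory, object storage, ...).
store = {}
rows = [[float(r * 10 + c) for c in range(3)] for r in range(4)]
write_toy_array(store, "t2m", rows, chunk_rows=2)
second_chunk = read_chunk(store, "t2m", 1)
```

The point of the layout is that a reader can fetch a single chunk key plus the small metadata key, which is what makes chunked access from cloud storage cheap.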
Something like this would be really great. We might be able to put that in as the Github project home page. Let me try to make it now and give you access @jcamier |
Yeah, it's possible to add a large readme to a project - https://github.com/orgs/openclimatefix/projects/36/views/1 |
@peterdudfield how do you want me to push a PR for the markdown for the onboarding? Can you add items to the project, one of which is this markdown file? I could create a branch and push up a PR for it then? I would like to make it as easy as possible to give volunteers context about what we are doing, to get them up to speed quickly and answer a lot of the questions they may have, so they can start contributing sooner. I am used to using Jira boards with epics, themes, stories and then using github and/or gitlab to create branches tied to the stories etc. I have not used Github projects before. Or do you want me to create a PR directly to PVNet for this? I am assuming this is a bit of an additional project at this point that we want to run in parallel to PVNet and then merge into it at a later point once it proves to improve or expand the core PVNet model? |
Hi @jcamier I've invited you to OCF github, and then I should be able to give you write access to the project. This means you can then add the markdown file for the project. I would prefer we try Github Project rather than Jira etc., as then it's very close to the github issues. Yeah, it's an interesting discussion of where we put code for this. We tried to keep PVNet mainly for ML work, so one idea could be to have a separate repo for "Open Data PVnet". I would expect that in stage 1 PVnet does not change much; it's more about collecting the right data and training the model. After that, we can definitely try new features in PVNet. |
@peterdudfield I agree. Maybe we create a separate repo which is clone/fork of PVNet which we call Also, should we start working with |
Thanks @jcamier. Yeah, that's a good one to discuss, whether we make a fork of PVNet for this project. Probably a good idea. For Github projects, yeah, we can have labels etc. I'm trying to give you write access to the project, so you can edit as appropriate |
I have had a go at revising this slightly:
|
On the task list above I feel for this one:
Finding gridded NWP data that is already in Zarr format may be tricky. I think it is more likely to be in other formats such as GRIB and would need to be converted to Zarr; we could leverage tools such as OCF's nwp-consumer for work like this |
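As a rough idea of what a one-off GRIB to Zarr conversion could look like outside of nwp-consumer, here is a minimal sketch using xarray. It assumes the optional `xarray`, `cfgrib` and `zarr` packages are installed, that the GRIB file opens cleanly as a single dataset (some files mix level types and need extra filtering), and the file name and chunk sizes in the usage note are hypothetical. It is not how nwp-consumer does it.

```python
from pathlib import Path


def zarr_path_for(grib_path):
    """Return the output store path: same location, `.zarr` suffix."""
    return Path(grib_path).with_suffix(".zarr")


def grib_to_zarr(grib_path, chunks=None):
    """Open a GRIB file with xarray and write it out as a Zarr store."""
    # Imported lazily so this module loads even without the optional dependencies.
    import xarray as xr  # also needs the cfgrib and zarr packages installed

    ds = xr.open_dataset(grib_path, engine="cfgrib")
    if chunks:
        # Dimension names depend on the file; pass e.g. {"latitude": 100}.
        ds = ds.chunk(chunks)
    out = zarr_path_for(grib_path)
    ds.to_zarr(out, mode="w")  # mode="w" overwrites an existing store
    return out
```

Usage would be something like `grib_to_zarr("data/icon_2024010100.grib2", chunks={"latitude": 100, "longitude": 100})`, with the dimension names checked against the actual file first.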
@jcamier Yes I think the key focus in the beginning is going to be the data engineering aspect: finding the appropriate open NWP data sources, then either downloading these into the preferred zarr format or having the right tools to be able to stream this data at a good pace. After that comes using ocf-data-sampler to create samples for ML models from this data, which I imagine will be less work than the first task of just getting the data in the right places in the right formats and having the right tools to work with them. |
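Once samples exist, the training/testing split from the task list could be done chronologically, so the model is never tested on periods it was trained on. A minimal sketch in plain Python, assuming half-hourly timestamps and a hypothetical cut-off date; this is not the ocf-data-sampler API, just the idea:

```python
from datetime import datetime, timedelta


def chronological_split(timestamps, test_start):
    """Split timestamps into train (before test_start) and test (on or after)."""
    train = [t for t in timestamps if t < test_start]
    test = [t for t in timestamps if t >= test_start]
    return train, test


# Four days of half-hourly timestamps (48 per day); dates are illustrative only.
start = datetime(2023, 1, 1)
stamps = [start + timedelta(minutes=30 * i) for i in range(48 * 4)]

# Hold out the final day for testing.
train, test = chronological_split(stamps, datetime(2023, 1, 4))
```

A random shuffle split would leak neighbouring half-hours between train and test, which is why a time-based cut-off is usually preferred for forecasting work.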
🌞 Open Source Solar Forecasting Project – Volunteers Needed! 🌞 We're building an open-source solar forecasting pipeline using publicly available data to predict solar generation at the national level, starting with the UK. Tasks include identifying gridded Numerical Weather Prediction datasets, downloading this NWP data and transforming it into the preferred Zarr format, acquiring solar generation target data through APIs such as PVlive's solar generation and capacity API, and creating pipelines for batching data and ML model experimentation. We want to start in the UK, in order to benchmark against OCF results, and then expand to lots of other countries. Roles range from data engineers and machine learning enthusiasts to software developers with Python expertise. If you're passionate about renewable energy and open-source collaboration, join us in advancing solar forecasting solutions for global impact! 🌍✨ #opensource #renewableenergy #solarforecasting |
@peterdudfield after thinking it over some more, I propose we don't include the architecture overview image in our readme for the time being. Maybe after we have a few more passes at it, we can include it then? However, I am including it here as an artifact we can reference later. And here is the miro board link as well: https://miro.com/app/board/uXjVL2Ugbq8=/ |
Is it worth putting this on the readme @jcamier? |
@peterdudfield this is your call. It is not the most professional looking artifact (good enough for internal purposes though 😄 ) So, up-to-you if you think this would be helpful. I was envisioning sharing this with the open source team during a weekly standup call we could have in the future... |
The idea is to make sure PVnet is accessible and usable for open-source users and contributors.
The current problem is that lots of the NWP data is private.
Other context: we are moving over from ocf-datapipes to ocf-data-sampler, so I would vote we try to use ocf-data-sampler at all points.
Here's a rough list of task lists that need
If all these steps are complete, then it will be ready to use for different countries and different geographies