PV internal ID (metadata) #691

peterdudfield · 2022-06-28T11:21:13Z

We need the following metadata for both training and prediction in the ML pipeline:

ID: a unique number (across all suppliers), starting from 0 and going up. The unique numbers from the PV data providers are too large, and might not be unique across PV data suppliers.
Capacity: the capacity of the system.

Essentially we need a map from (provider, provider_id) to (ocf_id, capacity).
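A minimal sketch of what this map could look like in Python. The function name `build_ocf_id_map`, the provider names, and the capacity values are all made up for illustration; the only things taken from the issue are the key (provider, provider_id) and the value (ocf_id, capacity), with ocf_id counting up from 0.

```python
from typing import Dict, Iterable, Tuple

PVKey = Tuple[str, str]     # (provider, provider_id)
PVMeta = Tuple[int, float]  # (ocf_id, capacity)


def build_ocf_id_map(systems: Iterable[Tuple[str, str, float]]) -> Dict[PVKey, PVMeta]:
    """Assign consecutive ocf_ids, starting from 0, to (provider, provider_id) pairs."""
    mapping: Dict[PVKey, PVMeta] = {}
    for provider, provider_id, capacity in systems:
        key = (provider, str(provider_id))
        if key in mapping:
            raise ValueError(f"duplicate PV system: {key}")
        # The next free ocf_id is simply the number of systems seen so far.
        mapping[key] = (len(mapping), capacity)
    return mapping


# Hypothetical example data: two providers reuse the same provider_id.
systems = [
    ("provider_a", "1234567890", 4.0),
    ("provider_b", "1234567890", 2.5),
    ("provider_a", "9876543210", 3.2),
]
ocf_map = build_ocf_id_map(systems)
```

Keying on the pair rather than on provider_id alone is what keeps the two systems with the same provider_id apart.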

We could do this in a number of different ways

1. Database + API

Add ocf_id as a column to the database (this could go in the pv_system table). We would need to make sure that the development and production databases hold the same values. Then we could add an endpoint to the API that maps (provider, provider_id) --> (ocf_id, capacity). This information is not sensitive, so we are OK on security here.
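As a rough sketch of the lookup such an endpoint would run, here is an in-memory SQLite stand-in. SQLite is only used so the example is self-contained; the column names other than ocf_id, the constraints, and the `lookup` helper are assumptions, not the real pv_system schema.

```python
import sqlite3

# In-memory stand-in for the real database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pv_system (
        provider TEXT NOT NULL,
        provider_id TEXT NOT NULL,
        ocf_id INTEGER NOT NULL UNIQUE,
        capacity REAL,
        UNIQUE (provider, provider_id)
    )
    """
)
conn.executemany(
    "INSERT INTO pv_system (provider, provider_id, ocf_id, capacity) VALUES (?, ?, ?, ?)",
    [
        ("provider_a", "1234567890", 0, 4.0),
        ("provider_b", "1234567890", 1, 2.5),
    ],
)


def lookup(conn: sqlite3.Connection, provider: str, provider_id: str):
    """Return (ocf_id, capacity) for one PV system, or raise KeyError."""
    row = conn.execute(
        "SELECT ocf_id, capacity FROM pv_system WHERE provider = ? AND provider_id = ?",
        (provider, str(provider_id)),
    ).fetchone()
    if row is None:
        raise KeyError((provider, provider_id))
    return row
```

The UNIQUE constraint on ocf_id is one way to make the values hard to change by accident.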

We would probably want a wrapper function around the API endpoint that is easy to use (this could live in nowcasting_dataset).

This solution is good because

it is scalable.
The values will be hard to change

This solution is bad because:

you need access to the internet.
need to think a bit about security of API

2. CSV

We could add a CSV to either nowcasting_dataset or the PV consumer, with the following 4 columns: (provider, id, ocf_id, capacity). Then we could very easily write a function to go from (provider, provider_id) --> (ocf_id, capacity). I think this all sits quite well in nowcasting_dataset.
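A minimal sketch of that function, using only the standard library. The column names follow the four listed above; the function names, the example rows, and the use of an in-memory string instead of a real file are assumptions for illustration.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def load_pv_metadata(f: TextIO) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Read the CSV into a dict: (provider, id) -> (ocf_id, capacity)."""
    reader = csv.DictReader(f)
    return {
        (row["provider"], row["id"]): (int(row["ocf_id"]), float(row["capacity"]))
        for row in reader
    }


def get_ocf_id_and_capacity(metadata, provider: str, provider_id) -> Tuple[int, float]:
    """Look up one PV system; raises KeyError if it is not in the CSV."""
    return metadata[(provider, str(provider_id))]


# Hypothetical file contents; in practice this would be open(path, newline="").
EXAMPLE_CSV = """provider,id,ocf_id,capacity
provider_a,1234567890,0,4.0
provider_b,1234567890,1,2.5
"""

metadata = load_pv_metadata(io.StringIO(EXAMPLE_CSV))
ocf_id, capacity = get_ocf_id_and_capacity(metadata, "provider_b", 1234567890)
```

Provider IDs are kept as strings so that very large IDs are never rounded or reformatted on load.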

This is good because:

simple
the CSV can be version controlled.

This solution is bad because:

needs access to wherever the CSV is saved.

3. Cloud CSV

Like 2, but the CSV could be in the cloud. This means we don't have to worry about installing any extra repos.

This is good because:

don't have to install an extra repo

This solution is bad because:

need access to the internet
need access to files

4. Hybrid

add ocf_id to the database
add an endpoint to the API
add a wrapper function around the API endpoint that also saves the result to a local file (somewhere), so that if there's no internet, it just loads the last saved copy.
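A sketch of the local-fallback behaviour described in the last step, under stated assumptions: the real API call is replaced by an injected `fetch` callable (the endpoint does not exist yet), the cache is a JSON file, and all names here are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def load_metadata_with_fallback(fetch: Callable[[], List[dict]], cache_path: Path) -> List[dict]:
    """Try the API first; on a connection error, load the last saved copy."""
    try:
        rows = fetch()
    except OSError:
        # ConnectionError and urllib's URLError are both subclasses of OSError.
        # If no cached copy exists yet, read_text() raises FileNotFoundError.
        return json.loads(cache_path.read_text())
    # Refresh the local copy after every successful call.
    cache_path.write_text(json.dumps(rows))
    return rows


def fetch_online() -> List[dict]:
    # Stand-in for a call to the (not yet existing) API endpoint.
    return [{"provider": "provider_a", "provider_id": "1234567890", "ocf_id": 0, "capacity": 4.0}]


def fetch_offline() -> List[dict]:
    raise ConnectionError("no internet")


cache_path = Path(tempfile.mkdtemp()) / "pv_metadata.json"
first = load_metadata_with_fallback(fetch_online, cache_path)    # fills the cache
second = load_metadata_with_fallback(fetch_offline, cache_path)  # served from the cache
```

One thing to decide with this design is how stale the cached copy is allowed to get before the wrapper should fail instead of silently returning it.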

links:

https://www.quora.com/When-should-you-use-a-CSV-file-over-a-database


peterdudfield · 2022-06-28T14:48:56Z

Would be interesting to get your views on this, @JackKelly and @jacobbieker.

My feeling would be to start simple, i.e. option 2. Then, when we need to, we can move this to the database (option 1).

jacobbieker · 2022-06-28T15:25:29Z

Yeah, I think easy first and then building up as we need it. I'd second your plan.

JackKelly · 2022-06-28T18:26:48Z

Agreed! A CSV sounds good to me! Thanks for thinking about the alternatives!

peterdudfield added and removed the enhancement (New feature or request) label · Jun 28, 2022

peterdudfield mentioned this issue Jun 28, 2022

Which Capacity should we use? #692

Open

peterdudfield mentioned this issue Jul 7, 2022

Issue/ml capacity openclimatefix/nowcasting_datamodel#84

Merged

7 tasks

peterdudfield mentioned this issue Aug 10, 2023

Fix PV system IDs openclimatefix/ocf_datapipes#220

Closed
