potentially remove dropna in site find_valid_t0s #80

Open
AUdaltsova opened this issue Nov 8, 2024 · 0 comments

Comments

@AUdaltsova
Contributor

Missed this on merge, but re: comments here:

    # 2. Now lets loop over each location in system id and find the valid periods
    # Should we have a different option if there are not nans
    sites = datasets_dict["site"]
    site_ids = sites.site_id.values
    site_config = config.input_data.site
    valid_t0_and_site_ids = []
    for site_id in site_ids:
        site = sites.sel(site_id=site_id)


        # drop any nan values
        # not sure this is right?
        site = site.dropna(dim='time_utc')

I don't think we should be calling dropna here. It makes any block of dates with even one missing value discontinuous, which I think is less beneficial than keeping the block and filling in the missing timestamp later on (e.g. one missing point can cost you something like 45 potential t0s with 3h history and 8h forecast). I've used data with about 3% missing, sometimes in considerable chunks, and the model seemed to do fine and not get distracted by the NaN infills.
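To make the "45 potential t0s" figure concrete, here is a rough sketch of the arithmetic. It assumes 15-minute resolution (so 3h history = 12 steps and 8h forecast = 32 steps) and treats a t0 as valid only if every timestamp in its history-plus-forecast window is present. `count_valid_t0s` is a hypothetical helper written for this illustration, not the actual `find_valid_t0s` logic.

```python
import numpy as np


def count_valid_t0s(present, history_steps, forecast_steps):
    """Count t0s whose whole window [t0 - history, t0 + forecast] is present.

    Hypothetical helper for illustration only; `present` is a boolean array
    with one entry per timestamp on a regular grid.
    """
    window = history_steps + forecast_steps + 1
    if len(present) < window:
        return 0
    # Sliding sum of the "present" flags; a window is valid when all are present
    counts = np.convolve(present.astype(int), np.ones(window, dtype=int), mode="valid")
    return int((counts == window).sum())


# Assumption: 15-minute data, so 3 h history = 12 steps, 8 h forecast = 32 steps
history_steps, forecast_steps = 12, 32

n_steps = 4 * 24 * 2  # two days of 15-minute timestamps
present = np.ones(n_steps, dtype=bool)
full = count_valid_t0s(present, history_steps, forecast_steps)

present_with_gap = present.copy()
present_with_gap[n_steps // 2] = False  # a single missing timestamp mid-series
with_gap = count_valid_t0s(present_with_gap, history_steps, forecast_steps)

lost_t0s = full - with_gap
print(lost_t0s)
```

Under these assumptions every window that overlaps the missing timestamp is discarded, which is where the cost of a single gap comes from.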

There is a wider discussion to be had about how much missing data we allow to be infilled and at what times, but I think that belongs in preprocessing anyway, not here; I'd remove the dropna for now.
