NDBC data #137

Closed
saeed-moghimi-noaa opened this issue May 8, 2024 · 12 comments · Fixed by #146
Comments

@saeed-moghimi-noaa

https://www.ndbc.noaa.gov/

See an example here:
https://github.com/saeed-moghimi-noaa/prep_obs_ca

# Line 250
#coops_ndbc_obs_collector.py

#################
@retry(stop_max_attempt_number=5, wait_fixed=3000)
def get_ndbc(start, end, bbox , sos_name='waves',datum='MSL', verbose=True):
    """
    function to read NDBC data
    ###################
    sos_name = waves    
    all_col = (['station_id', 'sensor_id', 'latitude (degree)', 'longitude (degree)',
           'date_time', 'sea_surface_wave_significant_height (m)',
           'sea_surface_wave_peak_period (s)', 'sea_surface_wave_mean_period (s)',
           'sea_surface_swell_wave_significant_height (m)',
           'sea_surface_swell_wave_period (s)',
           'sea_surface_wind_wave_significant_height (m)',
           'sea_surface_wind_wave_period (s)', 'sea_water_temperature (c)',
           'sea_surface_wave_to_direction (degree)',
           'sea_surface_swell_wave_to_direction (degree)',
           'sea_surface_wind_wave_to_direction (degree)',
           'number_of_frequencies (count)', 'center_frequencies (Hz)',
           'bandwidths (Hz)', 'spectral_energy (m**2/Hz)',
           'mean_wave_direction (degree)', 'principal_wave_direction (degree)',
           'polar_coordinate_r1 (1)', 'polar_coordinate_r2 (1)',
           'calculation_method', 'sampling_rate (Hz)', 'name'])
    
    sos_name = winds    

    all_col = (['station_id', 'sensor_id', 'latitude (degree)', 'longitude (degree)',
       'date_time', 'depth (m)', 'wind_from_direction (degree)',
       'wind_speed (m/s)', 'wind_speed_of_gust (m/s)',
       'upward_air_velocity (m/s)', 'name'])
@saeed-moghimi-noaa
Author

@AliS-Noaa @aliabdolali

What is your preferred web API for downloading NDBC data?

Thanks

@AliS-Noaa

AliS-Noaa commented May 8, 2024 via email

@SorooshMani-NOAA
Contributor

From a correspondence with one of our colleagues:

Near-real-time observations from NWS Fixed Buoys and NWS C-MAN Stations and from many ROOA operated buoys and coastal stations are available on the ndbc.noaa.gov web site.
I don't know if NDBC has an API yet, but one can obtain their obs via HTTPS or DODS/OPeNDAP https://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf
However, I found that someone has written 'ndbc-api' to "parse whitespace-delimited oceanographic and atmospheric data distributed as text files for available time ranges, on a station-by-station basis" (https://pypi.org/project/ndbc-api/). I also found ndbc.py at https://pypi.org/project/NDBC/. I imagine there are many others out there.
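As a rough illustration of what parsing "whitespace-delimited" text involves, here is a minimal sketch using pandas. The sample text is made up for illustration and only shaped like an NDBC standard meteorological file (two `#` header lines, `MM` marking missing values); the function name `parse_ndbc_text` is hypothetical and is not part of `ndbc-api` or any other package mentioned above.

```python
import io

import pandas as pd

# Illustrative sample only: the values are invented, and the layout is an
# assumption about how NDBC text files look (header line, units line, data).
SAMPLE = """\
#YY  MM DD hh mm WDIR WSPD GST  WVHT   DPD   APD MWD   PRES  ATMP  WTMP
#yr  mo dy hr mn degT m/s  m/s     m   sec   sec degT   hPa  degC  degC
2024 05 08 12 00  180  5.0  7.0   1.2   8.0   6.1 170 1015.2  18.4  17.9
2024 05 08 11 00  190  4.0   MM    MM    MM    MM  MM 1015.0  18.1  17.8
"""

DATE_COLUMNS = {"YY": "year", "MM": "month", "DD": "day", "hh": "hour", "mm": "minute"}


def parse_ndbc_text(text: str) -> pd.DataFrame:
    """Parse whitespace-delimited station text into a time-indexed DataFrame."""
    lines = text.splitlines()
    # Column names come from the first header line; the units line is ignored.
    names = lines[0].lstrip("#").split()
    body = "\n".join(line for line in lines if not line.startswith("#"))
    df = pd.read_csv(
        io.StringIO(body),
        sep=r"\s+",
        header=None,
        names=names,
        na_values=["MM"],  # assumed missing-value marker
    )
    # Combine the five date/time columns into a single DatetimeIndex.
    parts = df[list(DATE_COLUMNS)].rename(columns=DATE_COLUMNS)
    df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="time")
    return df.drop(columns=list(DATE_COLUMNS)).sort_index()


df = parse_ndbc_text(SAMPLE)
print(df)
```

In practice a maintained package would likely handle more file variants than this sketch does.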

@SorooshMani-NOAA
Contributor

SorooshMani-NOAA commented Jun 5, 2024

During our meeting on June 5th we discussed the following items/tasks related to NDBC data:

  • Use the ndbc-api package, use an alternative package, or write one from scratch
    • For now, let's continue with a third-party package and resolve the following issues
  • Possible issues with the third-party package license
    • ndbc-api is MIT if we end up using it
  • Is the third-party package already on conda-forge or is there a plan for it to be?
    • If not, explore other NDBC packages
    • If we help adding the Conda package will the original developer maintain?
    • Should we go ahead and just create a conda package and maintain it ourselves? (ideally not)
  • Is raw data available? (NDBC itself seems to apply QC)
  • The issue of fetching station data one by one.

Todo:

  • @abdu558 to start a PR when the code is ready
  • @abdu558 to contact the third-party package developer and ask about "conda"-related questions. Ans: they are open to creating and maintaining a conda package
  • @SorooshMani-NOAA to contact NDBC about raw data [email protected]
  • @abdu558 to check whether the web API can return data for multiple stations in one request. Ans: the upstream package uses a plain for loop over the stations
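Since stations have to be fetched one by one, one possible way to reduce the wall-clock cost is to issue the per-station requests concurrently. The sketch below assumes a hypothetical `fetch_station` function standing in for whatever single-station call ends up being used; here it returns dummy data and does not touch the network. The station IDs are example values only.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fetch_station(station_id: str) -> pd.DataFrame:
    """Hypothetical placeholder for a single-station request (returns dummy data)."""
    index = pd.to_datetime(["2024-05-08 00:00", "2024-05-08 01:00"])
    return pd.DataFrame({"WVHT": [1.0, 1.2]}, index=index)


def fetch_stations(station_ids, max_workers: int = 4) -> dict:
    """Fetch several stations concurrently and return {station_id: DataFrame}."""
    station_ids = list(station_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each ID with its frame.
        frames = list(pool.map(fetch_station, station_ids))
    return dict(zip(station_ids, frames))


result = fetch_stations(["41001", "41002"])
print(list(result))
```

Threads suit this case because the work is I/O-bound; `max_workers` should stay small to avoid hammering the provider.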

@SorooshMani-NOAA
Contributor

Hi @pmav99 today we discussed @abdu558's NDBC implementation. I suggested that he implements everything based on the "new" API (as in #125), but instead of using the _ndbc_api.py as the file name, just use ndbc.py. What do you think?

Also we discussed whether to combine all data into a single dataframe or not and whether to keep the missing value columns, etc. I suggested discussing those in the next group meeting next week.

@abdu558, can you please summarize your questions here as well so that we can discuss them more constructively next week?

@SorooshMani-NOAA
Contributor

@abdu558, I forgot to ask, what is the state of conda package for ndbc-api? You said they are open to creating the conda package themselves, right?

@abdu558
Contributor

abdu558 commented Jun 12, 2024

Yeah, they did create it and said it would take a few days or so for it to show up. [Screenshot: Screenshot_20240612_160040_GitHub.jpg]

@SorooshMani-NOAA
Contributor

Response from NDBC:

[...] We do not have an API though we are hopeful to develop one in the future.

Our FAQs might be a good place to start with your quality control questions: https://www.ndbc.noaa.gov/faq/

@tomsail
Contributor

tomsail commented Jun 13, 2024

Response from NDBC:

[...] We do not have an API though we are hopeful to develop one in the future.
Our FAQs might be a good place to start with your quality control questions: https://www.ndbc.noaa.gov/faq/

Thanks Soroosh. More on QC here: https://www.ndbc.noaa.gov/faq/qc.shtml
There is an exhaustive guide on the QC methodology (2009 version) and all the QC flags summarized in APPENDIX E.

@abdu558
Contributor

abdu558 commented Jun 15, 2024

Hi @pmav99 today we discussed @abdu558's NDBC implementation. I suggested that he implements everything based on the "new" API (as in #125), but instead of using the _ndbc_api.py as the file name, just use ndbc.py. What do you think?

Also we discussed whether to combine all data into a single dataframe or not and whether to keep the missing value columns, etc. I suggested discussing those in the next group meeting next week.

@abdu558, can you please summarize your questions here as well so that we can discuss them more constructively next week?

You answered most of them, but the one I'm not 100% sure of is what to return when multiple stations are requested:

  1. an extra column called station id is added and the data from the different stations is combined into a single dataframe

  2. a dictionary is returned which maps each id -> a dataframe of that station's data

This is the one that I'm not 100% sure of.

@pmav99
Member

pmav99 commented Jun 17, 2024

@abdu558 different providers return different data. For example, when you try to retrieve data from a bunch of IOC stations you will end up with dataframes with different number of columns and different column names. E.g.

https://www.ioc-sealevelmonitoring.org/bgraph.php?code=aden&output=tab&period=0.5&endtime=2018-06-07
https://www.ioc-sealevelmonitoring.org/bgraph.php?code=abed&output=tab&period=0.5&endtime=2018-06-07

Merging these will result in a bunch of columns filled with NaNs. This is problematic because NaNs are floats and consume quite a bit of RAM. If you are retrieving hundreds or thousands of stations for many years, this can quickly become a problem.

Furthermore, since you can't really know which column will have data for each station, you will end up calling .dropna() for every station id you want to process. Which can also be problematic, because the provider might return NaNs anyhow and you might want to differentiate between those.
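To make the RAM point concrete, here is a small sketch with toy data. The column names (`slevel`, `prs`, `rad`) and values are made up for illustration and are not taken from the IOC responses linked above; only the station codes come from those URLs.

```python
import pandas as pd

# Two toy stations with different (hypothetical) sensor columns.
aden = pd.DataFrame({"slevel": [1, 2, 3]})
abed = pd.DataFrame({"prs": [10, 20, 30], "rad": [5, 6, 7]})

merged = pd.concat({"aden": aden, "abed": abed}, names=["station_id", "row"])

# Memory used by the values only (index excluded), before and after merging.
separate_bytes = sum(df.memory_usage(index=False).sum() for df in (aden, abed))
merged_bytes = merged.memory_usage(index=False).sum()

print(merged.dtypes)  # integer columns were upcast to float64 to hold NaN
print(f"NaN cells: {merged.isna().sum().sum()}")
print(f"separate: {separate_bytes} bytes, merged: {merged_bytes} bytes")
```

Every station gets a cell for every column seen in any station, so the padding grows with the number of distinct columns.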

Alternatively, you can just avoid merging in the first place. If somebody wants to merge the dictionary it is trivial to do so. E.g.:

import pandas as pd

data = {
    "st1": pd.DataFrame(index=["2020", "2021"], data={"var1": [111, 222]}),
    "st2": pd.DataFrame(index=["2021", "2022", "2023"], data={"var2": [1, 2, 3], "var3": [0, float("nan"), float("nan")]}),
}
merged = pd.concat(data, names=["station_id", "time"]).reset_index(level=0)

print(data)
print(merged)
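For completeness, going the other way is also possible. The sketch below rebuilds the same toy dictionary, merges it as above, and then splits the merged frame back into per-station frames, dropping the columns that are entirely NaN for a given station. Note that this is only an approximation of the original data: dtypes stay float after the round trip, and a column that the provider returned as all-NaN would be dropped as well.

```python
import pandas as pd

data = {
    "st1": pd.DataFrame(index=["2020", "2021"], data={"var1": [111, 222]}),
    "st2": pd.DataFrame(
        index=["2021", "2022", "2023"],
        data={"var2": [1, 2, 3], "var3": [0, float("nan"), float("nan")]},
    ),
}
merged = pd.concat(data, names=["station_id", "time"]).reset_index(level=0)

# Split by station and drop columns that hold no data for that station.
split = {
    station_id: group.drop(columns="station_id").dropna(axis=1, how="all")
    for station_id, group in merged.groupby("station_id")
}

for station_id, df in split.items():
    print(station_id, list(df.columns))
```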

@SorooshMani-NOAA SorooshMani-NOAA changed the title NDBC data GSoC-2024: NDBC data Jun 18, 2024
@SorooshMani-NOAA SorooshMani-NOAA changed the title GSoC-2024: NDBC data NDBC data Jun 18, 2024
@SorooshMani-NOAA SorooshMani-NOAA linked a pull request Aug 26, 2024 that will close this issue
@pmav99 pmav99 unpinned this issue Oct 1, 2024

6 participants