WIP: Add road weather station parser #528

Closed
wants to merge 20 commits into from

meteoDaniel
Contributor

@meteoDaniel meteoDaniel commented Oct 17, 2021

Dear @gutzbenj and @amotl, this is a first draft of the road weather download and parsing for issue #518.

Here are some notes:

  1. pdbufr and libeccodes are required.
  2. The short directory names on the server for the different groups do not make sense to me. I do not know what they stand for.
  3. You can see the debug code in api.py.
  4. The main issue at the moment is the return type of the parser. I have implemented the ability to pass a list of subgroups, so I have to find a way to merge the different dataframes. The columns should be all the same differing only
  5. Only the latest file is downloaded at the moment. Unfortunately pdbufr does not work with in-memory bytes, which is why I have to use a temporarily stored file.
  6. Do you know how to link this to the issue and mark this as a draft?
  7. The ThreadPoolExecutor does not run tasks in truly parallel processes; it uses threads. You have to use ProcessPoolExecutor to spread the work over CPU cores.
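
The temporary-file workaround from note 5 can be sketched roughly as follows. This is only an illustration, not the code in this PR: `read_via_tempfile` and its `reader` parameter are hypothetical names, and the reader stands in for whatever path-based function is used (presumably a call into pdbufr).

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def read_via_tempfile(payload: bytes, reader: Callable[[str], T], suffix: str = ".bufr") -> T:
    """Write in-memory bytes to a temporary file and pass its path to a path-based reader.

    `reader` is a placeholder for any function that only accepts a file path.
    """
    # delete=False lets the reader reopen the file by name on every platform;
    # the file is removed explicitly in the finally block instead.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(payload)
        tmp.close()  # flush to disk before the reader opens the path
        return reader(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.remove(tmp.name)
```

Note that the reader has to consume the file completely before returning, because the file no longer exists afterwards.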

I am looking forward to your first thoughts and feedback.

@codecov

codecov bot commented Oct 17, 2021

Codecov Report

Merging #528 (e4bc391) into main (48fc337) will decrease coverage by 3.07%.
The diff coverage is 88.30%.

❗ Current head e4bc391 differs from pull request most recent head 08b55b0. Consider uploading reports for the commit 08b55b0 to get more accurate results

@@            Coverage Diff             @@
##             main     #528      +/-   ##
==========================================
- Coverage   90.96%   87.88%   -3.08%     
==========================================
  Files          85       99      +14     
  Lines        5255     5958     +703     
  Branches      441      484      +43     
==========================================
+ Hits         4780     5236     +456     
- Misses        368      605     +237     
- Partials      107      117      +10     
Flag Coverage Δ
unittests 87.88% <88.30%> (-3.08%) ⬇️

Impacted Files Coverage Δ
...enst/provider/dwd/forecast/metadata/field_types.py 100.00% <ø> (ø)
wetterdienst/provider/dwd/observation/fields.py 94.73% <ø> (ø)
...t/provider/dwd/observation/metadata/field_types.py 100.00% <ø> (ø)
wetterdienst/provider/dwd/radar/sites.py 100.00% <ø> (ø)
wetterdienst/provider/dwd/road_weather/__init__.py 0.00% <0.00%> (ø)
wetterdienst/provider/dwd/road_weather/api.py 0.00% <0.00%> (ø)
wetterdienst/provider/dwd/road_weather/download.py 0.00% <0.00%> (ø)
...etterdienst/provider/dwd/road_weather/fileindex.py 0.00% <0.00%> (ø)
...nst/provider/dwd/road_weather/metadata/__init__.py 0.00% <0.00%> (ø)
...enst/provider/dwd/road_weather/metadata/dataset.py 0.00% <0.00%> (ø)
... and 49 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d3fed23...08b55b0.

@amotl
Member

amotl commented Oct 17, 2021

Hi Daniel,

nice work! I specifically like that everything is contained within a single package namespace, wetterdienst.provider.dwd.road_weather.

With kind regards,
Andreas.

@gutzbenj
Member

Dear @meteoDaniel ,

thanks for working on this data integration!

Just some notes:

  • ProcessPoolExecutor was not working when I was using Windows. Unlike macOS and Linux, Windows uses a different technique to spawn worker processes. I couldn't get to the bottom of this issue, which is why I removed the multiprocessing feature.
  • pdbufr: when I tried to integrate pdbufr into the library with "Use pdbufr to read DWD radar data in bufr format" (#482), I ran into difficulties because eccodes wasn't aligned with the Python interface used via pdbufr. I don't know whether this has changed with the latest updates, but I will check it once again.
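
For reference, a minimal sketch of the difference between the two executors, not taken from the PR. `checksum` and `run_all` are made-up names, and `checksum` is only a CPU-bound stand-in for parsing one file. On Windows, worker processes are started by re-importing the main module, so the worker function has to live at module top level and the entry point has to sit behind the `__main__` guard.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def checksum(n: int) -> int:
    """CPU-bound stand-in for parsing one file.

    Defined at module top level so it can be pickled and sent to worker processes.
    """
    return sum(i * i for i in range(n))


def run_all(executor_cls: Callable[..., Executor], sizes: Iterable[int]) -> List[int]:
    """Run `checksum` for every size with the given executor class, keeping input order."""
    with executor_cls(max_workers=2) as pool:
        return list(pool.map(checksum, sizes))


if __name__ == "__main__":
    # Without this guard, process-based executors fail on platforms that
    # start workers by re-importing the main module (such as Windows).
    print(run_all(ThreadPoolExecutor, [10, 100]))   # threads: one process, shared GIL
    print(run_all(ProcessPoolExecutor, [10, 100]))  # processes: spread over CPU cores
```

Threads are usually enough for downloads, which mostly wait on the network, while the parsing step is the part that might benefit from separate processes.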

I will try out the new API in the next few days.

Thanks so far.

Cheers,
Benjamin

@meteoDaniel
Contributor Author

meteoDaniel commented Nov 4, 2021

First of all, I am still confused because I had to remove some caches before I was able to run my code again. I do not know where these changes came from.

Nevertheless I added a tiny function to generalize the dictionaries into a dataframe.

Next issue:
Some of the variables require additional information such as timePeriod and the height of the sensor.
Currently the columns look like this:

In [3]: data.columns
Out[3]: 
Index(['positionOfRoadSensors', 'roadSurfaceCondition', 'timePeriod',
       'waterFilmThickness', 'timePeriod',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform',
       'airTemperature',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform',
       'dewpointTemperature',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform',
       'relativeHumidity',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform', 'windDirection',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform', 'windSpeed',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform', 'timePeriod',
       'maximumWindGustSpeed',
       'heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform', 'timePeriod',
       'maximumWindGustDirection', 'timePeriod', 'precipitationType',
       'timePeriod', 'totalPrecipitationOrTotalWaterEquivalent', 'timePeriod',
       'intensityOfPrecipitation', 'timePeriod', 'intensityOfPhenomena',
       'horizontalVisibility'],
      dtype='object')

My idea is to add that information to the metadata, because I think this information is constant over time.

What do you think?
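
To make the idea concrete, here is a rough sketch of how the repeated qualifier columns could be attached to the variable they describe. It assumes that each qualifier (timePeriod, sensor height) precedes the value it refers to, which is how the column listing above reads; `attach_qualifiers` is a hypothetical helper, not part of the PR.

```python
from typing import Dict, List

# Column names that qualify the measurement following them (taken from the listing above).
QUALIFIERS = {
    "timePeriod",
    "heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform",
}


def attach_qualifiers(columns: List[str]) -> Dict[str, List[str]]:
    """Map every measurement column to the qualifier columns directly preceding it.

    Assumption: qualifiers always come before the value they describe.
    Qualifiers that are not followed by any measurement are dropped.
    """
    pending: List[str] = []
    result: Dict[str, List[str]] = {}
    for name in columns:
        if name in QUALIFIERS:
            pending.append(name)
        else:
            result[name] = pending
            pending = []
    return result
```

The resulting mapping could then be stored once per station in the metadata, provided the qualifier values really are constant over time.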

Best regards from holidays in Porto, by the way :P

@meteoDaniel
Contributor Author

I will try to add the meta information into the meta dataframe for the next step.

@gutzbenj gutzbenj marked this pull request as draft November 30, 2021 09:34
@gutzbenj
Member

gutzbenj commented Dec 1, 2021

Dear @meteoDaniel ,

for the correct usage of pdbufr we have to find a way to define ECCODES_DIR for the different possible installation paths. I'm not yet sure whether there's a way to do that, but probably the best option would be to ask for an ECCODES_DIR environment variable.
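
Asking for the environment variable could look roughly like this. It is a sketch under the assumption that ECCODES_DIR simply points at the installation directory; `resolve_eccodes_dir` is a hypothetical helper and not an existing wetterdienst function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_eccodes_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the directory named by ECCODES_DIR, or None if it is unset or not a directory."""
    value = env.get("ECCODES_DIR", "").strip()
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_dir() else None
```

A caller could raise a descriptive error when `None` comes back, so that users are told to set the variable instead of hitting a failure deep inside the BUFR decoding.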

Apart from that, I've been googling a bit about the road weather data and did not find much, BUT
on the general help page [1] (which, by the way, has some helpful documents) I found an xlsx document [2] which seems to contain the list of stations of the road weather dataset! So we may use this xlsx to get the station listing, and then take from it the road weather group we have to acquire to return the data.

[1] https://www.dwd.de/DE/leistungen/opendata/hilfe.html
[2] https://www.dwd.de/DE/leistungen/opendata/help/stationen/sws_stations_xls.xlsx?__blob=publicationFile&v=11

@meteoDaniel
Contributor Author

Dear @gutzbenj, I will go on with this task.

Your suggestion with the file list is very good. Thanks for the effort to check this out.

The ECCODES_DIR environment variable could be set via a specific Docker image, and we will add another part to the documentation to explain what to do to use this new API.

@meteoDaniel
Contributor Author

Dear @gutzbenj and @amotl ,

today I was able to work on that project again. As @gutzbenj suggested, I have implemented the download of the station lists to create a metaindex within the class.

Please tell me what's next ;)

Actually I would like to use the ScalarRequestCore. Do you think this is doable? And if yes, could one of you give me a short list of things I have to take care of and check?

Other to-dos:

  • Adding a mapping from the BUFR column names to the human-readable wetterdienst column names
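
Such a mapping could be as simple as a dictionary plus a rename step. In the sketch below the BUFR names on the left come from the column listing earlier in this thread, while the names on the right are invented placeholders and not the actual wetterdienst column names.

```python
from typing import Dict, Mapping

# Left: BUFR keys as seen in the parsed data. Right: placeholder names (assumptions).
BUFR_TO_READABLE: Dict[str, str] = {
    "airTemperature": "temperature_air",
    "dewpointTemperature": "temperature_dew_point",
    "relativeHumidity": "humidity",
    "windSpeed": "wind_speed",
}


def rename_columns(
    record: Mapping[str, object],
    mapping: Mapping[str, str] = BUFR_TO_READABLE,
) -> Dict[str, object]:
    """Rename the keys of one parsed record; keys without a mapping are kept unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}
```

For example, `rename_columns({"airTemperature": 2.5, "windDirection": 180})` keeps `windDirection` as it is and renames only the temperature key.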

@meteoDaniel
Contributor Author

meteoDaniel commented Jan 22, 2022

Here is the Dockerfile I used to create an image with eccodes:

FROM python:3.8.6-slim

ENV DEBIAN_FRONTEND=noninteractive
ENV TERM=linux

# Base tools and the packaged ecCodes runtime library
RUN set -ex \
    && apt-get update \
    && apt-get install -y apt-transport-https curl git libeccodes0 wget

# Refresh the package index before upgrading
RUN apt-get update && apt-get -y dist-upgrade

# Build CMake 3.20.1 from source (ecCodes is built with CMake below)
RUN wget https://cmake.org/files/v3.20/cmake-3.20.1.tar.gz
RUN tar -xzvf cmake-3.20.1.tar.gz
WORKDIR /cmake-3.20.1
RUN apt-get update && apt-get install --fix-missing -yqq libssl-dev build-essential
RUN ./bootstrap
RUN make -j$(nproc) && make install

# Build ecCodes 2.21.0 from source and install it under /usr/local/share/eccodes
WORKDIR /
RUN apt-get install -y gfortran protobuf-compiler
RUN curl "https://confluence.ecmwf.int/download/attachments/45757960/eccodes-2.21.0-Source.tar.gz?api=v2" -o /tmp/eccodes-2.21.0-Source.tar.gz
RUN tar -xzf /tmp/eccodes-2.21.0-Source.tar.gz
# Debug listing of the extracted sources
RUN ls -lrt /
RUN mkdir build \
    && cd build \
    && cmake /eccodes-2.21.0-Source -DCMAKE_INSTALL_PREFIX=/usr/local/share/eccodes \
    && make install

RUN apt-get -yqq install autoconf libtool musl-dev
RUN ln -s /usr/lib/x86_64-linux-musl/libc.so /lib/libc.musl-x86_64.so.1

# DWD's local BUFR tables for ecCodes
RUN curl https://opendata.dwd.de/weather/lib/bufr/bufrtables_ecCodes-local-dwd.tar.bz2 -o /tmp/bufrtables_ecCodes-local-dwd.tar.bz2
RUN tar -xjf /tmp/bufrtables_ecCodes-local-dwd.tar.bz2

COPY ./requirements.txt /opt/requirements.txt
RUN pip3 install -r /opt/requirements.txt

# Paths matching the install prefix used for ecCodes above
ENV PYTHONPATH=/app
ENV ECCODES_DEFINITION_PATH=/usr/local/share/eccodes/share/eccodes/definitions
ENV ECCODES_DIR=/usr/local/share/eccodes
ENV BUFR_DUMP_PATH=/usr/local/share/eccodes/bin/bufr_dump

WORKDIR /app

@gutzbenj
Member

gutzbenj commented Jan 28, 2022

Dear @meteoDaniel ,

you're totally fine using the ScalarRequestCore.

  • Ideally you should only have to set certain properties in the beginning and implement the ._all() method, which returns a cached list of stations. Obviously the headers of this station listing should be aligned with the set of Columns we have defined. Besides that, the most important step is to set up Enumerations for Parameters and their Units.

  • Besides, if the Parameters are nested, e.g. there are different resolutions with different parameters, then you'd have to create another mapping between the "flattened" parameters and their origin in the "deeper" enumeration.

    e.g.

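    from enum import Enum

    class DwdRoadWeatherTree:
        class Daily:
            # Origin names as used in the deeper (dataset) enumeration
            class XYZ(Enum):
                Precipitation = "precp"

    class DwdRoadWeatherParameter:
        # Flattened view of the same parameters
        class Daily(Enum):
            Precipitation = DwdRoadWeatherTree.Daily.XYZ.Precipitation.value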

    At the moment, you then have to create another mapping like:

    mapping = {
        DwdRoadWeatherParameter.Daily.Precipitation: DwdRoadWeatherTree.Daily.XYZ
    }
  • Unit has to follow the detailed parameter tree

Is there a certain problem with implementing the class? I hope that I have annotated everything with enough text?

Cheers
Benjamin

@gutzbenj
Member

gutzbenj commented Feb 2, 2022

Dear @meteoDaniel ,

you can now simply embed ECCODES_DIR into the wetterdienst Settings. Should we walk through the creation of a new API together? This would be a good opportunity for me to write down some documentation on how to approach that.

This pull request was closed.