adding git paths to inputs #9

Merged
merged 12 commits into from
Jun 14, 2024

Conversation

PaoloBonettiPolimi
Collaborator

Overview

This PR fixes the prototype's access to its inputs, which are now all zipped and retrieved through git.

@PaoloBonettiPolimi
Collaborator Author

@cehbrecht let me know if this fixes the issue in the demo when you deploy it.

@cehbrecht
Contributor

cehbrecht commented May 16, 2024

@PaoloBonettiPolimi the fix might work ... but there is another issue:

2024-05-16 11:36:33.462963: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/usr/local/anaconda/envs/shearwater/lib/python3.11/site-packages/pywps/app/Process.py", line 260, in _run_process
    self.handler(wps_request, wps_response)  # the user must update the wps_response.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda/envs/shearwater/lib/python3.11/site-packages/shearwater/processes/wps_cyclone.py", line 156, in _handler
    model_trained = models.load_model("./Unet_sevenAreas_fullStd_0lag_model.keras")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda/envs/shearwater/lib/python3.11/site-packages/keras/src/saving/saving_api.py", line 187, in load_model
    raise ValueError(
ValueError: File not found: filepath=./Unet_sevenAreas_fullStd_0lag_model.keras. Please ensure the file is an accessible `.keras` zip file.

You need to use self.workdir instead of . to access the files in the temporary folder created for the process.

See:

https://github.com/bird-house/emu/blob/c3b50bbf54b1a4536f7faa234daffc6f7edab115/emu/processes/wps_ncml.py#L49
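A minimal sketch of the suggestion above, assuming the handler can pass its work directory (the value of self.workdir) as a string. The helper name `resolve_in_workdir` is hypothetical and is not part of PyWPS or shearwater:

```python
import os


def resolve_in_workdir(workdir: str, filename: str) -> str:
    """Return the absolute path of `filename` inside the process work directory.

    Hypothetical helper: it raises FileNotFoundError early, so a missing
    file is reported before the model loader is ever called.
    """
    path = os.path.abspath(os.path.join(workdir, filename))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{filename} not found in workdir {workdir}")
    return path
```

Inside `_handler` this would be called as `resolve_in_workdir(self.workdir, "Unet_sevenAreas_fullStd_0lag_model.keras")`, with the result passed to `models.load_model`.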

@PaoloBonettiPolimi
Collaborator Author

@cehbrecht I've now used self.workdir, let me know if it is ok now.

@cehbrecht
Contributor

@PaoloBonettiPolimi I have run it again ... I still get an error ... but I don't yet understand why.

Here is the error:

2024-05-16 15:46:19.483225: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/usr/local/anaconda/envs/shearwater/lib/python3.11/site-packages/pywps/app/Process.py", line 260, in _run_process
    self.handler(wps_request, wps_response)  # the user must update the wps_response.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda/envs/shearwater/lib/python3.11/site-packages/shearwater/processes/wps_cyclone.py", line 158, in _handler
    model_trained = models.load_model(model_path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda/envs/shearwater/lib/python3.11/site-packages/keras/src/saving/saving_api.py", line 187, in load_model
    raise ValueError(
ValueError: File not found: filepath=/var/lib/pywps/tmp/shearwater/pywps_process_tcf4_m0h/Unet_sevenAreas_fullStd_0lag_model.keras. Please ensure the file is an accessible `.keras` zip file.

... but the file is available:

[root@shearwater1 pywps_process_tcf4_m0h]# ls -l /var/lib/pywps/tmp/shearwater/pywps_process_tcf4_m0h/Unet_sevenAreas_fullStd_0lag_model.keras
-rw-r--r--. 1 wps wps 23534480 May 16 15:46 /var/lib/pywps/tmp/shearwater/pywps_process_tcf4_m0h/Unet_sevenAreas_fullStd_0lag_model.keras

@PaoloBonettiPolimi
Collaborator Author

PaoloBonettiPolimi commented May 16, 2024

@cehbrecht uhm.. it works if I start the bird locally and use birdy in a notebook:

[Screenshot, 2024-05-16 16:34:08]

Do you have any idea about this issue? A possible (not very elegant) workaround may be to put the model in a static directory on the virtual machine you use to deploy the bird, provided the deployed process can access it, and to insert the local path directly.

@cehbrecht
Contributor

@PaoloBonettiPolimi I have looked at the issue again. load_model is expecting a "keras zip file" ... but the file seems to be an HDF file ... same as the one on github:
https://github.com/climateintelligence/shearwater/blob/main/data/Unet_sevenAreas_fullStd_0lag_model.keras
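One way to check which of the two formats a file really is, regardless of its extension, is to look at its content. This is only a sketch: the function name `detect_model_format` is hypothetical, and it assumes the HDF5 signature sits at offset 0 of the file (the HDF5 format also allows it at later offsets, which this check ignores).

```python
import zipfile

# The 8-byte signature that starts an HDF5 file (superblock at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_model_format(path: str) -> str:
    """Classify a saved-model file as 'hdf5', 'zip', or 'unknown' by content."""
    with open(path, "rb") as f:
        header = f.read(len(HDF5_SIGNATURE))
    if header == HDF5_SIGNATURE:
        return "hdf5"
    # zipfile.is_zipfile inspects the file for a valid zip archive structure.
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

If this returns "hdf5" for a file named `.keras`, the file is in the legacy HDF5 layout rather than the zip-based layout the error message asks for.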

@PaoloBonettiPolimi
Collaborator Author

@cehbrecht the file on github is exactly the file that I'm loading in the last commit, after retrieving it from github to the local directory of the prototype (which works if I test it on Levante). Do you have any idea why it looks like an HDF file to it?

models.load_model should also work with h5 files, so only two changes come to my mind. One is to put the keras file directly on the virtual machine you are using, so that we can call load_model without first copying the .keras file from git. The other is to change the file extension from .keras to .h5, although I've never tried that.

@cehbrecht
Contributor

cehbrecht commented Jun 6, 2024

Hi @PaoloBonettiPolimi ... I have looked a bit more at it. It seems that the version of tensorflow on macos (in my case 2.15) behaves differently than the one on Linux (2.16).

The current HDF5 format of the keras file is probably a legacy format. The new keras format is different ... a zip file.

I can load the current HDF5 format on macos with tensorflow 2.15. But I get an issue when I try it with version 2.16.

I tried to install version 2.15 on Linux ... but installation fails ... complaining about cuda.

I also tried to rename the file to .h5 but that also did not help.

So ... the issue is not the keras file itself :)

Not sure how to fix .... but I would guess the following:

  • use the new keras zip format
  • pin tensorflow in the conda environment to use at least version 2.16?

There is a conda spec file (Linux only) to make the environment reproducible:
https://github.com/climateintelligence/shearwater/blob/main/spec-list.txt

You could try the installation on Linux and update the spec-list.txt.

@PaoloBonettiPolimi
Collaborator Author

PaoloBonettiPolimi commented Jun 13, 2024

@cehbrecht I am double-checking the issue on Levante, which is Linux-based. When I install the process in the conda environment produced by the environment.yml file, I get tensorflow 2.15, and the load works as it is currently implemented. Do you know why it becomes tensorflow 2.16 in the demo? Do you think we can try to force tensorflow=2.15.0 in that file, and check if it works?
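To confirm which tensorflow version an environment actually resolved to, a small check like the following could be run in both places (Levante and the demo). It is a sketch only: the helper names `matches_series` and `installed_version` are hypothetical, and it reads the installed package metadata rather than importing tensorflow.

```python
from importlib import metadata


def matches_series(installed: str, series: str) -> bool:
    """True if `installed` (e.g. '2.15.0') belongs to `series` (e.g. '2.15')."""
    wanted = series.split(".")
    # Compare component by component so that '2.150.0' does not match '2.15'.
    return installed.split(".")[: len(wanted)] == wanted


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `matches_series(installed_version("tensorflow") or "", "2.15")` is true only when a 2.15.x release is installed.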

@cehbrecht
Contributor

@PaoloBonettiPolimi well, we can try :) You could in addition generate a conda spec file ... on linux. Would be good to have the necessary dependencies in the conda environment ... in case something is missing. That way it should be reproducible.

@cehbrecht
Contributor

@PaoloBonettiPolimi ... if you run tests with a docker image you could pick AlmaLinux 9.4. That is the one we use for deployment.

fixing tensorflow 2.15.0
@PaoloBonettiPolimi
Collaborator Author

@cehbrecht uhm.. for the moment I'm just using the terminal or a notebook on Levante, which is Linux-based. I've fixed the tensorflow version and it works correctly there. Let me know how it goes when deployed; otherwise I'll try to save the model as an .h5 file directly from keras.

@cehbrecht
Contributor

cehbrecht commented Jun 14, 2024

@PaoloBonettiPolimi it works now :) I have also updated the conda spec file.

Would you like to merge?

@PaoloBonettiPolimi PaoloBonettiPolimi merged commit 8443af2 into main Jun 14, 2024
6 checks passed