Moving gdalinfo to the executors requires that the Scala code makes the gdalinfo call, and also that we have a way to pass the resulting metadata back to the driver.
This could perhaps be achieved by assembling the STAC JSON files on the executors already.
Currently, gdalinfo is called on output assets in the driver. For GTiff output on S3, the assets were written on an executor and need to be downloaded again in the driver.
With a fuse mount this download happens implicitly; with direct S3 access it happens explicitly here:
openeo-geopyspark-driver/openeogeotrellis/integrations/gdal.py
Lines 177 to 182 in 88ab283
Moving gdalinfo to the executor and passing the info on would avoid this extra download.
This might avoid OOM errors like this one: #809
And would have avoided this log deadlock: #906
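A rough sketch of the idea, to make the proposal concrete: gdalinfo runs where the asset was written, and only a small metadata dict travels back to the driver. This is illustrative Python, not the actual implementation (the executor side would live in the Scala code). The helper names `asset_metadata_on_executor` and `collect_on_driver` are hypothetical, and it assumes the `gdalinfo` CLI is available on the executor.

```python
import json
import subprocess
from typing import Callable, Dict, Iterable


def run_gdalinfo(path: str) -> str:
    """Run the `gdalinfo` CLI with JSON output and return its stdout."""
    result = subprocess.run(
        ["gdalinfo", "-json", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def asset_metadata_on_executor(
    path: str, runner: Callable[[str], str] = run_gdalinfo
) -> Dict:
    """Hypothetical helper: call this on the executor, while the file is still local."""
    info = json.loads(runner(path))
    # Keep only a small, serializable subset to send back to the driver.
    return {
        "path": path,
        "size": info.get("size"),
        "bands": [
            {"band": b.get("band"), "type": b.get("type")}
            for b in info.get("bands", [])
        ],
    }


def collect_on_driver(per_asset: Iterable[Dict]) -> Dict[str, Dict]:
    """Hypothetical helper: index the metadata returned by the executors by path."""
    return {m["path"]: m for m in per_asset}
```

With something like this, the driver would only receive a few hundred bytes per asset (for example via a Spark `collect()` on the per-asset results) instead of downloading each GTiff from S3 again.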
cc @jdries