
GTC-2683 Upgrade to GDAL 3.8.3 and Miniconda3 #242

Merged (1 commit) on Jul 29, 2024

danscales (Collaborator)

GTC-2683 Upgrade to GDAL 3.8.3.

This allows upgrading to Miniconda3 as well (GTC-2774), which is much more recent than the old Miniconda we've been using and seems to run quite a bit faster. GDAL 3.8.3 is also compatible with EMR Serverless, if we want to use it for certain jobs.

  • Created a new ci/Dockerfile for the Docker image that runs the GitHub CI tests, since there is no quay.io image (which we were using previously) for GDAL 3.8.3. Changed .github/workflows/ci.yaml to use this new Docker image (which I built and uploaded separately).

  • Removed the top-level Dockerfile and entrypoint.sh, which are very old versions of what is needed to run the analyses in batch jobs. The current versions of these are in gfwpro-scheduler:src/docker.

  • Added some info in README.md about various files, including sbt, ci/Dockerfile, and scripts/gdal.sh.

  • Added new geotrellisGdalWarp dependency, needed for the upgraded GDAL.

  • Includes the new scripts/gdal.sh, which is the bootstrap script needed for EMR runs with GDAL 3.8.3. It uses Miniconda3 and avoids using the default Anaconda repository. It seems to run in about 1 minute, whereas the old script took roughly 4 minutes.

  • Prints out all environment variables when GeoTrellis starts up, to help debug various startup and configuration problems.
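The gdal.sh bullet describes a bootstrap that installs Miniconda3 and takes GDAL from conda-forge instead of the default Anaconda repository. The following is a hypothetical sketch of that approach. It is not the contents of scripts/gdal.sh. The install prefix, the `/tmp` download path, the `run` and `install_gdal` helper names, and the `DRY_RUN` switch are all assumptions made for illustration. The sketch installs Miniconda3 in batch mode, then uses `--override-channels -c conda-forge` so that no package is resolved from the default channel.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: NOT the contents of scripts/gdal.sh.
# The install prefix, /tmp download path, helper names and DRY_RUN are assumptions.
set -euo pipefail

MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"

# Run a command, or only print it when DRY_RUN=1 (handy for reviewing the steps).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install Miniconda3 into a prefix, then GDAL 3.8.3 from conda-forge only.
install_gdal() {
  local prefix="${1:-$HOME/miniconda3}"
  # -f fails on HTTP errors, -sS is quiet but still shows errors, -L follows redirects.
  run curl -fsSL -o /tmp/miniconda3.sh "$MINICONDA_URL"
  # -b: batch mode (no prompts); -p: installation prefix.
  run bash /tmp/miniconda3.sh -b -p "$prefix"
  # --override-channels ignores the default Anaconda channel and any .condarc
  # channels, so every package is resolved from conda-forge.
  run "$prefix/bin/conda" install -y --override-channels -c conda-forge gdal=3.8.3
}
```

Calling `DRY_RUN=1 install_gdal /opt/miniconda3` only prints the three commands it would run. This gives a cheap way to review the steps before wiring them into an EMR bootstrap action.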

danscales requested review from jterry64 and dmannarino on July 15, 2024 16:48
dmannarino (Member) left a comment

Sweet!

solomon-negusse (Member) commented Jul 18, 2024

@danscales: I'd again recommend testing the libmamba solver, which may give you more of a speedup. I got an order-of-magnitude speedup on a personal project (https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community).

jterry64 (Member) left a comment

Looks good to me, thanks for putting all the effort in to get this out!

danscales (Collaborator, Author)

@solomon-negusse I think the much newer conda (installed by the much newer Miniconda3) may already be using libmamba. The solver is much faster and takes only about 15 seconds. There is only one conda command now, since we load everything from the same conda-forge repo (we don't use the default Anaconda repo). The whole update now takes only about a minute: the 45 seconds that are not spent solving go to downloading and unpacking the 2 GB of packages.
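One way to confirm the guess in this reply is to ask conda which solver it is configured to use. Recent conda releases (23.10 onward) make libmamba the default solver, so a current Miniconda3 installer very likely includes it already. The helper below is a hypothetical sketch and not part of this PR. `current_solver` and `ensure_libmamba` are illustrative names. The helper assumes a conda release recent enough to have the `solver` configuration key.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): report which solver conda uses
# and switch to libmamba if it is not already selected.
set -euo pipefail

current_solver() {
  # `conda config --show solver` prints a YAML line such as "solver: libmamba";
  # keep only the value.
  conda config --show solver | awk '{print $2}'
}

ensure_libmamba() {
  local solver
  solver="$(current_solver)"
  if [ "$solver" = "libmamba" ]; then
    echo "solver already libmamba"
  else
    # Persists the setting in the user's .condarc.
    conda config --set solver libmamba
    echo "solver changed: $solver -> libmamba"
  fi
}
```

On a fresh Miniconda3 install, `ensure_libmamba` would be expected to report that libmamba is already in use. On older installs, setting `solver` to libmamba only takes effect if the conda-libmamba-solver package is present in the base environment.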

danscales merged commit 47860b2 into master on Jul 29, 2024
2 of 3 checks passed
danscales deleted the upgrade-to-gdal3.8.3 branch on July 29, 2024 19:15