[POC] [STORY] Makefile #564

Open · 8 tasks
adebardo opened this issue Aug 22, 2024 · 2 comments
Labels
[POC] Conception (To review: ticket needs approval of its conception)

Comments

@adebardo

Context

As part of their work, the CNES and sub-contractor developers work from a basic Python environment. For ease of implementation, some tools use a Makefile, and we therefore suggest adopting this method for xdem.

This requires:

  • that the setup.cfg and setup.py files are organized as in demcompare
  • reflect on the use of requirements.txt
  • some opencv dependencies may be missing
  • copy the demcompare Makefile (a minimal sketch is given after this list): https://gitlab.cnes.fr/3d/demcompare/-/blob/master/Makefile
  • `PYTHON_VERSION_MIN` fixed at 3.9
  • replace "demcompare" with "xdem"
  • remove the dependencies related to notebooks, coverage, pylint and docs
  • add xdem's pre-commit dependencies
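
As a rough illustration of what such a port could look like, here is a minimal, hedged sketch of a possible xdem Makefile. The target names, virtualenv layout and the `[dev]` extra are assumptions for illustration only, not the actual demcompare content:

```makefile
# Minimal sketch of a possible xdem Makefile (illustration only, not the demcompare file).
# The venv path, target names and the "[dev]" extra are assumptions to be refined.
PYTHON_VERSION_MIN = 3.9
VENV = venv

.PHONY: install lint test clean

install:   ## create a virtualenv, install xdem in editable mode and set up pre-commit
	python3 -m venv $(VENV)
	$(VENV)/bin/pip install --upgrade pip
	$(VENV)/bin/pip install -e .[dev]
	$(VENV)/bin/pre-commit install

lint:      ## run the pre-commit hooks (ruff, etc.) on all files
	$(VENV)/bin/pre-commit run --all-files

test:      ## run the test suite
	$(VENV)/bin/pytest

clean:     ## remove the virtualenv and build/test artifacts (should mirror .gitignore)
	rm -rf $(VENV) build/ dist/ *.egg-info .pytest_cache/ .ruff_cache/
```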


@adebardo added the [POC] Conception label on Aug 22, 2024
@duboise-cnes
Member

Here are some remarks:

  • The Makefile depends on several sub-topics and is just a helpful wrapper for developers:
    • Python, pip and packaging, if it is OK to use `pip install` as the basis for developers (and not conda/mamba).
    • linting: [POC] linting with ruff #550
    • testing with pytest (coverage is linked to it as well, and why remove it? It can be useful as information)
    • documentation generation (I would not remove it from the list, only reuse the current workflow to generate the docs)
    • OK to remove the notebooks for now
    • pre-commit to be decided (run ruff on commit? A strategy has to be chosen)
    • a clean target linked to .gitignore is also very important: `make clean` must remove all dev artifacts so that one can start over from scratch (very useful when in an unstable state, especially when supporting other developers and contributions)

Maybe add or link to the adequate issues for these sub-topics?
A first Makefile can be done without all the sub-topics, and each one can then be added little by little to reach a full dev framework.

I think that before the Makefile, the pip install and packaging setup must be converged on, especially the mix with mamba/conda. I find the link between requirements.txt and environment.yaml unclear and have questions about its maintenance. OK to open another issue on that (I have not found one, if it exists)?
For a clean and "pure" Python project, it is also possible nowadays to drop setup.py and setup.cfg and keep only a pyproject.toml.
To be discussed, to be sure of the dev ways to go.
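
For illustration only, a pyproject.toml along those lines could look like the sketch below. The build backend, metadata and the `dev` extra are assumptions (the dependency pins are simply copied from the requirements excerpt further down):

```toml
# Hypothetical single-file packaging setup replacing setup.py/setup.cfg (sketch only).
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "xdem"
description = "Analysis of digital elevation models (DEMs)"
requires-python = ">=3.9"
version = "0.0.0"  # placeholder; versioning scheme to be decided (e.g. setuptools_scm)
dependencies = [
    # excerpt of the current runtime dependencies, unchanged
    "geopandas>=0.12.0",
    "numpy==1.*",
    "rasterio>=1.3,<2",
]

[project.optional-dependencies]
dev = ["pytest", "pre-commit", "ruff"]  # assumed dev tooling, to be aligned with dev-environment.yml
```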

I would separate the pure Python packaging with its (unpinned) dependencies from the conda/mamba dependencies (pinned if needed to keep a stable installation for users); see the sketch after the list below.
Pinned dependencies can be really problematic in a mixed environment alongside other programs sharing the same dependencies.
My 2 cents: this last part would belong more to a "packaging" issue... but for now, I put it here:

requirements.txt (generated from environment.yaml; can we change that?)

geopandas>=0.12.0 -> OK
numba==0.* -> why ==?
numpy==1.* -> can be complex with NumPy 2 now
matplotlib==3.* -> idem
pyproj>=3.4,<4 -> less than 4 can be problematic in a complex mixed environment
rasterio>=1.3,<2 -> idem for rasterio
scipy>=1.0,<1.13 -> idem
tqdm
scikit-image==0.* -> idem
scikit-gstat>=1.0,<1.1 -> idem
geoutils==0.1.8 -> idem
pip -> why pip as a dependency? Not useful, I think
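
To make the proposed split concrete, a hedged illustration (file layout and exact pin values are placeholders only): the packaging metadata keeps only the compatibility-driven ranges, while the conda environment used for stable user installs carries the exact pins:

```
# pyproject.toml / pip side: loose, compatibility-driven ranges only
geopandas>=0.12.0
rasterio>=1.3,<2

# environment.yml (conda/mamba) side: exact pins chosen at release time (placeholder versions)
dependencies:
  - geopandas=0.X.Y
  - rasterio=1.X.Y
```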

@rhugonnet
Member

rhugonnet commented Aug 30, 2024

Some additional points:

Regarding your questions on requirements.txt and the content of the environment file, here is the logic behind it:

  • For the generation of requirements.txt: it is inspired by the way Pandas deals with Pip+Conda: https://github.com/pandas-dev/pandas/blob/main/scripts/generate_pip_deps_from_conda.py (a simplified sketch of the idea is given after this list). It allows keeping a consistent environment for both package managers based on a single file (environment.yml). We additionally have checks that verify that dev-environment.yml matches environment.yml. See Make environment files management consistent between pip and conda #429 for details. We did a lot of digging to ensure things were up-to-date (especially on the setuptools side), but are not experts in this. Happy to modify this if you know of other ways to keep the environment management consistent between Pip and Conda! 🙂
  • For the pinning of major versions: we pin the least amount possible (there is usually a specific reason, an incompatibility or a temporary bug fix in one of those packages, such as Rasterio pinned to >=1.3 due to an incompatibility, or SciPy pinned to <1.13 because of a breaking bug fix in SciKit-GStat not yet released: spatialstats.sample_empirical_variogram issue with scipy >= 1.13 #509). We learned recently that it is good practice to always pin a maximum at the next major version of the main dependencies, for backwards-compatibility. The reason is that major versions (to be clear, only X.0.0 releases, not 0.X.0 or 0.0.X ones) are synonymous with breaking changes for nearly all packages. Pinning them is not a big constraint on the environment because major releases are not frequent (NumPy or Pandas major releases happen every 5-10 years, other packages every couple of years), but it is crucial for the long-term working of older package versions. For example, the recent NumPy 2.0 release (and Pandas 2.0 a couple of years ago) broke a large part of the packages in the Python ecosystem for exactly this reason (no pinning of the upper version in most packages). The first thought for most packages is "we can easily update the code or pin the max version, and push a new release which supports it". That works for the latest release, but not for the older versions that also all depended on NumPy 1.X and break with NumPy 2.0: their environments carried no upper constraint, which might make those older versions break forever, a big problem for reproducibility (say a study used v1.5 of our package, then NumPy 2.0 came out and we updated for it only in v1.6; v1.5 and all earlier versions will then break for users if we did not pin NumPy).
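
For reference, a stripped-down sketch of what such a generation script does. This is not the actual pandas or xdem script, just an illustration of the principle (package-name mapping and edge cases are omitted):

```python
# Simplified sketch of generating requirements.txt from environment.yml
# (in the spirit of pandas' generate_pip_deps_from_conda.py, not the real script).
import re
import yaml


def conda_env_to_pip(env_file: str = "environment.yml") -> list[str]:
    """Translate the conda dependency list of an environment file into pip requirement strings."""
    with open(env_file) as f:
        env = yaml.safe_load(f)

    pip_deps: list[str] = []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # Nested "pip:" section: already pip syntax, pass through unchanged
            pip_deps.extend(dep.get("pip", []))
            continue
        name = re.split(r"[<>=!]", dep)[0].strip()
        if name in ("python", "pip"):
            continue  # not meaningful as pip requirements
        # Conda pins with a single "=" ("pkg=1.2"); pip needs "=="
        if "=" in dep and not any(op in dep for op in (">=", "<=", "==", "!=")):
            dep = dep.replace("=", "==", 1)
        pip_deps.append(dep)
    return pip_deps


if __name__ == "__main__":
    with open("requirements.txt", "w") as f:
        f.write("\n".join(conda_env_to_pip()) + "\n")
```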

We made this choice based in great part on this nice article about pinning max versions: https://iscinumpy.dev/post/bound-version-constraints/, and on another blog post explaining this in relation to NumPy 2.0 (and how to upgrade code with Ruff): https://pythonspeed.com/articles/numpy-2/. The former especially insists on the negative effects of pinning max versions in general, but makes an exception for major releases, or more generally for releases likely to trigger breaking changes: "Only add a cap if a dependency is known to be incompatible or there is a high (>75%) chance of it being incompatible in its next release".

We're not experts in this either, so we're curious about your experience, and happy to adapt things!
The main point of worry for us is to not only have the latest release working, but to keep older versions working as long as possible (for older operational workflows, reproducing published studies, etc.).

And a final note to answer another question: we use the syntax ==X.* for several dependencies as it is equivalent to >=X.0,<X+1: it pins below the next major version and also sets the minimum major version (required for compatibility). This syntax is therefore used when the minimum and maximum major versions are just one apart, as in the example below.
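
Concretely, for one of the dependencies listed above, the two forms express the same constraint (illustration only):

```
numpy==1.*      # any 1.x release
numpy>=1.0,<2   # the same constraint written explicitly
```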

@adebardo changed the title from "[POC] Create a makefile" to "[POC] [STORY] Makefile" on Sep 5, 2024