Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for environment.yml files #457

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jherland
Copy link
Member

@jherland jherland commented Sep 16, 2024

This adds support for Conda projects that use environment.yml for declaring dependencies.

NOTE: The same Conda caveat as for Pixi still applies: We do not currently differentiate between Conda dependencies and PyPI dependencies, meaning that we assume that a Conda dependency named FOO will map one-to-one to a Python package named FOO. This is certainly not true for Conda dependencies that are not Python packages, and it probably isn't even true for all Conda dependencies that do indeed include Python packages.

Commits:

  • Restructure code for extracting declared dependencies
  • Add PyYAML dependency
  • Initial support for parsing environment.yml files
  • Improve error handling when deps file is passed to mismatched parser
  • Add environment.yml support to the rest of FawltyDeps
  • sample_projects: Add example Conda project
  • test_traverse_project: Add tests using the Conda sample project

@jherland jherland self-assigned this Sep 16, 2024
@jherland jherland force-pushed the jherland/environment-yml-support branch 2 times, most recently from c6cca06 to 5e5455e Compare September 18, 2024 12:49
@jherland jherland force-pushed the jherland/pixi-toml-support branch from 2de6d56 to 88c9ba0 Compare September 19, 2024 08:32
@jherland jherland force-pushed the jherland/environment-yml-support branch 3 times, most recently from c5325d9 to 7c2b9a2 Compare September 19, 2024 09:12
@jherland jherland force-pushed the jherland/pixi-toml-support branch from a72e423 to 02c28ba Compare September 19, 2024 10:16
@jherland jherland force-pushed the jherland/environment-yml-support branch 2 times, most recently from 9f73f59 to c70fc36 Compare September 19, 2024 11:11
@jherland jherland force-pushed the jherland/pixi-toml-support branch from 4ede02f to de059c7 Compare September 19, 2024 11:28
@jherland jherland force-pushed the jherland/environment-yml-support branch from c70fc36 to f5460cb Compare September 19, 2024 11:28
@jherland jherland force-pushed the jherland/pixi-toml-support branch from de059c7 to 838231f Compare September 19, 2024 11:55
@jherland jherland force-pushed the jherland/environment-yml-support branch from f5460cb to 63dd427 Compare September 19, 2024 11:55
@jherland jherland force-pushed the jherland/pixi-toml-support branch from 838231f to fc39cf4 Compare September 19, 2024 12:59
@jherland jherland force-pushed the jherland/environment-yml-support branch from 63dd427 to 2ed9c81 Compare September 19, 2024 12:59
@jherland jherland force-pushed the jherland/pixi-toml-support branch from fc39cf4 to 4da7004 Compare September 19, 2024 13:13
@jherland jherland force-pushed the jherland/environment-yml-support branch from 2ed9c81 to aa5ca07 Compare September 19, 2024 13:14
@jherland jherland force-pushed the jherland/pixi-toml-support branch from 4da7004 to 09e43d8 Compare September 19, 2024 13:57
@jherland jherland force-pushed the jherland/environment-yml-support branch from aa5ca07 to d89f6bd Compare September 19, 2024 13:57
@jherland jherland force-pushed the jherland/pixi-toml-support branch from 09e43d8 to d76d45e Compare September 19, 2024 15:10
@jherland jherland force-pushed the jherland/environment-yml-support branch from d89f6bd to ea9e35d Compare September 19, 2024 15:10
@jherland jherland force-pushed the jherland/pixi-toml-support branch from d76d45e to 8198e99 Compare September 20, 2024 08:32
@jherland jherland force-pushed the jherland/environment-yml-support branch from ea9e35d to 466baa9 Compare September 20, 2024 08:33
@jherland jherland force-pushed the jherland/pixi-toml-support branch from 8198e99 to f773397 Compare October 7, 2024 07:26
@jherland jherland force-pushed the jherland/environment-yml-support branch from 466baa9 to 8dc8c89 Compare October 7, 2024 07:26
@jherland jherland marked this pull request as ready for review October 7, 2024 07:38
@jherland jherland requested a review from zz1874 October 7, 2024 07:39
Base automatically changed from jherland/pixi-toml-support to main October 8, 2024 07:50
@jherland jherland force-pushed the jherland/environment-yml-support branch from 8dc8c89 to 3265f9c Compare October 8, 2024 08:02
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks lighter that way!
What do you think about the following code organizing pattern:

└── extract_deps
    ├── __init__.py
    ├── pixi_parser.py
    ├── pyproject_parser.py
    ├── requirements_parser.py
    └── setup_parser.py

The content of the current extract_deps.py would be in __init__.py.
We would improve the logical structure of the package with purpose segregation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this idea. And finally implemented in the latest rebase. 😄

fawltydeps/extract_declared_dependencies.py has been growing longer, and
is about to grow even longer with the introduction of parsing
environment.yml files. Now is the time to split this up into a
subpackage containing multiple modules that each focus on one particular
(group of) dependency declaration formats.

Also, take the opportunity to rename extract_declared_depenedencies into
extract_deps: We use the term "deps" everywhere (including in our
project name), so there is no need to spell out "dependencies" in all of
these filenames.

No functional changes in this commit. It's all just moving code between
files, and fixing up associated references accordingly.
This is needed to start parsing environment.yml files from the Conda
ecosystem. v6.0.1 is the last version to support Python v3.7, so this
is our minmum version for now.

Adding a yaml dependency to FawltyDeps makes it available inside the
Python environment that is used by the packages.SysPathPackageResolver.
This changes the behavior of a couple of real_projects tests, where
we previously expected yaml/PyYAML to be reported as undeclared/unused
when no Python environment with PyYAML installed was present. Now, we
can assume that PyYAML -> yaml is always correctly mapped via sys.path.

Also, a potential future issue to be solved is made apparent by our need
to declare types-PyYAML in ignore_unused: I suspect the "types-$PACKAGE"
form is quite common, and should be handled in the same way as "-stubs"
suffixes inside Package.has_type_stubs().
Include support for parsing Pip dependencies embedded within
environment.yml files, like this:

    ```yaml
    name: example
    dependencies:
      - jupyterlab=1.0
      - pip=19.1
      - pip:
        - kaggle==1.5
        - yellowbrick==0.9
    ```

Also add test cases, mirroring the structure we already use to test our
pyproject.toml and pixi.toml parsers.
In tests/test_deps_parser_determination.py,
test_explicit_parse_strategy__mismatch_yields_appropriate_logging()
tests what happens when the --deps-parser-choice option is (ab)used to
parse a deps file - e.g. setup.cfg - with the "wrong" parser - e.g.
parse_setup_py().

The test verifies that we print a warning message when such mismatch
occurs. However, the test also _assumes_ that when no mismatch is
present, the empty file being passed to the appropriate parser would
yield zero warning messages.

Although this happens to be true for our existing parsers, we are about
to add a new parser for environment.yml files, where an empty file is
_not_ silently accepted, but rather produces this error message:

  Failed to parse path/to/environment.yml: No top-level mapping found!

I believe the implicit assumption in this test is wrong: an empty file
could very well produce warnings/errors, and we should instead make sure
that the files being passed in the test are actually formatted according
to their filenames (e.g. a setup.cfg file should be formatted as a valid
setup.cfg file).

Fortunately, we already have the fake_project() fixture which is able to
produce valid deps file in all our supported formats, so this is fairly
straightforward to fix.

After fixing this, however, I discover a new issue: When passing e.g. a
valid setup.cfg file to the parse_setup_py() function, it (obviously,
in hindsight) fails with a SyntaxError: parse_setup_py() expects to find
valid Python code, and the setup.cfg format is certainly not that. (The
same goes for other file formats passed to parse_setup_py().) This
SyntaxError is NOT caught by the parser, and causes FawltyDeps to abort.

The fix is straightforward - catch the exception and print an error
mesage - and the end result is that we now subject our parsers to a bit
more scrutiny:

- When there is a mismatch between the chosen parser and the given file,
  we verify that a warning about this mismatch is printed (along with
  any other warnings/errors that result from applying the "wrong" parser
  to this file).
- When there is no mismatch between the chosen parser and the given
  file, we verify that a correctly formatted file (courtesy of
  fake_project()) generates no warnings/errors at all.
Now that we can parse dependency declarations from environment.yml
files, we need to expose this functionality in our docs + CLI, as
well as automatically find environment.yml files while traversing
projects.
This was created with the following commands:

  - conda create --name conda_example python=3.12
  - conda activate conda_example
  - conda install requests
  - conda env export --from-history > environment.yml
  - echo "import requests" > main.py

followed by copying the corresponding Conda environment from
~/.conda/envs/conda_example into this sample project directory,
and then removing/truncating files that I do not currently consider
to be important for FawltyDeps to make sense of this project.

The result is a simple Conda project with two declared Conda
dependencies (Python itself + requests), along with a single source
file that imports requests.

The expected result of running FawltyDeps here is to report no
undeclared or unused dependencies.

Please note that since Conda puts environments under ~/.conda, and NOT
inside the project directory, this example is somewhat contrived: We can
not expect the Conda environment to be automatically discovered by
FawltyDeps (unless FD itself is installed and running from there),
but we must instead rely on the use of --pyenv to point to
~/.conda/envs/conda_example (in the general case, or
tests/sample_project/conda_example/.conda/envs/conda_example in this
test case)
@jherland jherland force-pushed the jherland/environment-yml-support branch from 3265f9c to 31ec342 Compare October 22, 2024 21:31
Copy link
Collaborator

@mknorps mknorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jherland , this is a great improvement on both, the code readability side and on the features side.
environment.yml was a declaration file asked by the users to be supported ❤️

✔️ I checked the changes, looked if there are no places left with the old code structure references.

🛑 Then I ran the newest version on a randomly chosen project that uses environment.yaml files to declare dependencies. Parsed names are trailed with > ehn there was some version specification of a package. To recreate, run in Cartopy project:

fawltydeps environment.yml --list-deps

The output I got is:

cython>
matplotlib-base>
numpy>
owslib>
packaging>
pillow>
pre-commit
pydata-sphinx-theme
pykdtree
pyproj>
pyshp>
pytest
pytest-mpl
pytest-xdist
ruff
scipy>
setuptools_scm
shapely>
sphinx
sphinx-gallery

Could you check if this is something we will not support for now or a parsing bug?

The code for extracting the package name from a Conda dependency
specifier was overly simplistic: it assumed that any version/build
info in the string always started with a "=" character. This is
obviously incorrect (e.g. "numpy>=1.23" is a simple counter-example).

Fix the parser to look for the same characters that the upstream looks
for, when separating package name from version + build info (see
https://github.com/conda/conda/blob/aa0fb6f3ae669a5ade575d340555aa6a9f71ef5e/conda/models/match_spec.py#L716
for details).

Add a test case copied from the cartopy project to illustrate the
correct use of such specifiers, and also add a couple of tests to verify
that we properly deal with invalid package names in dependency
specifiers (both Conda and Pip) inside an environment.yml file.
@jherland jherland force-pushed the jherland/environment-yml-support branch from af03b40 to 4e75de3 Compare December 3, 2024 18:06
@jherland
Copy link
Member Author

jherland commented Dec 3, 2024

🛑 Then I ran the newest version on a randomly chosen project that uses environment.yaml files to declare dependencies. Parsed names are trailed with > when there was some version specification of a package.

Thanks for finding this!

I had overlooked the proper parsing of the package name from the version + build info in a Conda dependency specifier, and I obviously also had no test case to catch this.

Commit 4e75de3 should fix it: I added the environment.yaml file from cartopy as a new test case, and I fixed the parser with a relatively straightforward regex, stolen from Conda's own dependency parsing code.

With this added, I also needed to properly handle the case when a package name is missing or malformed. I added a couple of test cases for this, and made sure that we raise and handle the appropriate exceptions, so that we can print an error message about the faulty dependency and then keep on parsing the rest of the dependencies.

@jherland jherland requested a review from mknorps December 4, 2024 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[conda] Support dependency declarations from Conda's environment.yml files?
2 participants