SHA256 does not match when using multiple extra index urls containing same package #2631

Open
MajerMartin opened this issue Dec 17, 2024 · 10 comments

@MajerMartin

Hi, we are using the Pants build system with two private repositories: the first one is for release versions, the second one is for snapshots. Each of these repositories can contain a package with the same name and the same version.

Pants setup:

[python-repos]
indexes.add = [
    "https://myindex/nexus/repository/pypi-hosted/simple/",
    "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/",
]

Example package in our repos:

https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl
https://myindex/nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl

When I try to create a lock file using pants generate-lockfiles, Pants generates a pex lock create command which includes:

--no-pypi --index=https://pypi.org/simple/ --index=https://myindex/nexus/repository/pypi-hosted/simple/ --index=https://myindex/nexus/repository/pypi-hosted-snapshots/simple/

which is then translated into these options for Pip:

--index-url https://pypi.org/simple/ --extra-index-url https://myindex/nexus/repository/pypi-hosted/simple/ --extra-index-url https://myindex/nexus/repository/pypi-hosted-snapshots/simple/
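The mapping between the two option sets above can be pictured with a small sketch. This is only an illustration of the translation observed in this report (first index becomes --index-url, the rest become --extra-index-url); the function name pip_index_args is hypothetical and this is not Pex's actual code.

```python
def pip_index_args(indexes):
    """Map an ordered list of index URLs to Pip index options.

    Illustrative only: mirrors the translation observed in this issue,
    where the first index becomes --index-url and every later one
    becomes --extra-index-url.
    """
    if not indexes:
        raise ValueError("at least one index URL is required")
    first, *rest = indexes
    args = ["--index-url", first]
    for url in rest:
        args += ["--extra-index-url", url]
    return args


args = pip_index_args(
    [
        "https://pypi.org/simple/",
        "https://myindex/nexus/repository/pypi-hosted/simple/",
        "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/",
    ]
)
print(" ".join(args))
```

Note that the order survives the translation, but nothing in these options tells Pip to prefer one index over another.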

This command crashes with:

Expected sha256 hash of 3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3 when downloading mypackage but hashed to 4ca541b60a956f99ccc5fda61a58df774cc801a3260c2dea0d1245313b0b18f5.

The problem is that PEX takes the hash of the package from the first index where the (resolvable) package is found, but Pip downloads the package from the last index where it is found.
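A toy model of that mismatch, assuming only what is described here (hash recorded from the first matching index, file fetched from the last matching index). The Artifact class, the candidates helper and the byte strings are made up for illustration; neither Pex nor Pip is implemented this way as far as this sketch knows.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    index: str      # name of the index the file was found on
    filename: str
    content: bytes  # stand-in for the wheel bytes

    @property
    def sha256(self) -> str:
        return hashlib.sha256(self.content).hexdigest()


def candidates(indexes, filename):
    """Return every artifact called `filename`, in index order."""
    return [a for index in indexes for a in index if a.filename == filename]


WHEEL = "mypackage-1.2.3-py3-none-any.whl"
release = [Artifact("pypi-hosted", WHEEL, b"release build")]
snapshots = [Artifact("pypi-hosted-snapshots", WHEEL, b"snapshot build")]

found = candidates([release, snapshots], WHEEL)
locked = found[0]       # hash recorded from the first index with a match
downloaded = found[-1]  # file fetched from the last index with a match

mismatch = locked.sha256 != downloaded.sha256
print(
    f"expected hash from {locked.index}, "
    f"downloaded from {downloaded.index}: mismatch={mismatch}"
)
```

Swapping the order of the two lists swaps which hash is expected and which one is computed, which matches the reversed error reported further down.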

PEX log:

pex: Indexing downloads :: Downloading FileArtifact(url=ArtifactURL(raw_url='https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl#sha256=3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3', download_url='https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl', normalized_url='https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl', scheme='https', path='/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl', fragment_parameters={}, fingerprints=(Fingerprint(algorithm='sha256', hash='3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3'),)), fingerprint=Fingerprint(algorithm='sha256', hash='3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3'), verified=False, filename='mypackage-1.2.3-py3-none-any.whl')
pex: Indexing downloads: 17.1ms
pex:   Downloading FileArtifact(url=ArtifactURL(raw_url='https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl#sha256=3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3', download_url='https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl', normalized_url='https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl', scheme='https', path='/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl', fragment_parameters={}, fingerprints=(Fingerprint(algorithm='sha256', hash='3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3'),)), fingerprint=Fingerprint(algorithm='sha256', hash='3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3'), verified=False, filename='mypackage-1.2.3-py3-none-any.whl'): 0.4ms

Pip log:

2024-12-17T13:52:37,423 Looking in indexes: https://pypi.org/simple/, https://myindex/nexus/repository/pypi-hosted/simple/, https://myindex/nexus/repository/pypi-hosted-snapshots/simple/
...
2024-12-17T13:53:27,376 3 location(s) to search for versions of mypackage:
2024-12-17T13:53:27,376 * https://pypi.org/simple/mypackage/
2024-12-17T13:53:27,376 * https://myindex/nexus/repository/pypi-hosted/simple/mypackage/
2024-12-17T13:53:27,376 * https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/
2024-12-17T13:53:27,376 Fetching project page and analyzing links: https://pypi.org/simple/mypackage/
2024-12-17T13:53:27,376 Getting page https://pypi.org/simple/mypackage/
2024-12-17T13:53:27,376 Found index url https://pypi.org/simple/
2024-12-17T13:53:27,377 Looking up "https://pypi.org/simple/mypackage/" in the cache
2024-12-17T13:53:27,377 Request header has "max_age" as 0, cache bypassed
2024-12-17T13:53:27,377 No cache entry available
2024-12-17T13:53:27,487 https://pypi.org:443 "GET /simple/mypackage/ HTTP/1.1" 404 13
2024-12-17T13:53:27,492 Status code 404 not in (200, 203, 300, 301, 308)
2024-12-17T13:53:27,494 Could not fetch URL https://pypi.org/simple/mypackage/: 404 Client Error: Not Found for url: https://pypi.org/simple/mypackage/ - skipping
2024-12-17T13:53:27,494 Fetching project page and analyzing links: https://myindex/nexus/repository/pypi-hosted/simple/mypackage/
2024-12-17T13:53:27,495 Getting page https://myindex/nexus/repository/pypi-hosted/simple/mypackage/
2024-12-17T13:53:27,496 Found index url https://myindex/nexus/repository/pypi-hosted/simple/
2024-12-17T13:53:27,497 Looking up "https://myindex/nexus/repository/pypi-hosted/simple/mypackage/" in the cache
2024-12-17T13:53:27,497 Request header has "max_age" as 0, cache bypassed
2024-12-17T13:53:27,498 No cache entry available
2024-12-17T13:53:27,861 https://myindex:443 "GET /nexus/repository/pypi-hosted/simple/mypackage/ HTTP/1.1" 200 49361
2024-12-17T13:53:28,046 Updating cache with response from "https://myindex/nexus/repository/pypi-hosted/simple/mypackage/"
2024-12-17T13:53:28,048 Fetched page https://myindex/nexus/repository/pypi-hosted/simple/mypackage/ as text/html
...
2024-12-17T13:53:28,091   Found link https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl#sha256=3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3 (from https://myindex/nexus/repository/pypi-hosted/simple/mypackage/) (requires-python:>=3.8,<3.12), version: 1.2.3
...
2024-12-17T13:53:28,129 Fetching project page and analyzing links: https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/
2024-12-17T13:53:28,129 Getting page https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/
2024-12-17T13:53:28,129 Found index url https://myindex/nexus/repository/pypi-hosted-snapshots/simple/
2024-12-17T13:53:28,130 Looking up "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/" in the cache
2024-12-17T13:53:28,130 Request header has "max_age" as 0, cache bypassed
2024-12-17T13:53:28,130 No cache entry available
2024-12-17T13:53:28,317 https://myindex:443 "GET /nexus/repository/pypi-hosted-snapshots/simple/mypackage/ HTTP/1.1" 200 23324
2024-12-17T13:53:28,321 Updating cache with response from "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/"
2024-12-17T13:53:28,323 Fetched page https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/ as text/html
...
2024-12-17T13:53:28,342   Found link https://myindex/nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl#sha256=4ca541b60a956f99ccc5fda61a58df774cc801a3260c2dea0d1245313b0b18f5 (from https://myindex/nexus/repository/pypi-hosted-snapshots/simple/mypackage/) (requires-python:>=3.8,<3.12), version: 1.2.3
...
2024-12-17T13:53:28,384 Collecting mypackage
2024-12-17T13:53:28,384   Created temporary directory: /Users/myuser/.cache/pants/named_caches/pex_root/pip/1/24.2/pip_cache/.tmp/pip-unpack-yg18r5en
2024-12-17T13:53:28,384   Found index url https://myindex/nexus/repository/pypi-hosted-snapshots/simple/
2024-12-17T13:53:28,385   Looking up "https://myindex/nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl" in the cache
2024-12-17T13:53:28,385   No cache entry available
2024-12-17T13:53:28,385   No cache entry available
2024-12-17T13:53:28,570   https://myindex:443 "GET /nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl HTTP/1.1" 200 151289
2024-12-17T13:53:28,571   Downloading https://myindex/nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl (151 kB)
2024-12-17T13:53:28,764   Updating cache with response from "https://myindex/nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl"
2024-12-17T13:53:28,764   etag object cached for 1209600 seconds
2024-12-17T13:53:28,764   Caching due to etag
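A log like the one above can be scanned for files that appear with more than one hash. The following is a minimal sketch that assumes the "Found link <url>#sha256=<hex>" line shape seen in this log; the function name conflicting_hashes is hypothetical, and the SAMPLE text is an excerpt of the two "Found link" lines above with their trailing details omitted.

```python
import re
from collections import defaultdict

# Matches the "Found link <url>#sha256=<hex>" shape seen in the Pip log.
FOUND_LINK = re.compile(r"Found link (?P<url>\S+?)#sha256=(?P<sha>[0-9a-f]{64})")


def conflicting_hashes(log_text):
    """Return {filename: {sha256: [urls]}} for files seen with several hashes."""
    seen = defaultdict(lambda: defaultdict(list))
    for match in FOUND_LINK.finditer(log_text):
        url, sha = match.group("url"), match.group("sha")
        filename = url.rsplit("/", 1)[-1]
        seen[filename][sha].append(url)
    return {name: dict(hashes) for name, hashes in seen.items() if len(hashes) > 1}


# Excerpt of the two "Found link" lines from the log (trailing details omitted).
SAMPLE = """\
Found link https://myindex/nexus/repository/pypi-hosted/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl#sha256=3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3
Found link https://myindex/nexus/repository/pypi-hosted-snapshots/packages/mypackage/1.2.3/mypackage-1.2.3-py3-none-any.whl#sha256=4ca541b60a956f99ccc5fda61a58df774cc801a3260c2dea0d1245313b0b18f5
"""

conflicts = conflicting_hashes(SAMPLE)
for filename, hashes in conflicts.items():
    print(filename)
    for sha, urls in hashes.items():
        print(f"  {sha} <- {', '.join(urls)}")
```

Running it over a full log lists every wheel that two indexes serve under the same filename with different contents.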

When I reverse the order of the indexes in Pants, the expected and actual hashes swap:

[python-repos]
indexes.add = [
    "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/",
    "https://myindex/nexus/repository/pypi-hosted/simple/",
]

Expected sha256 hash of 4ca541b60a956f99ccc5fda61a58df774cc801a3260c2dea0d1245313b0b18f5 when downloading mypackage but hashed to 3f009f055f42aba6b251e5bde1c1fae49d8b4805478002aeb417fa257a6babf3.

I know this is a well-known Pip limitation (all indexes have the same priority), but is there something that can be done on the Pants/PEX side? For example, could PEX use the hash of the artifact from the last index where the package is found, to match Pip's selection process?

@jsirois
Member

jsirois commented Dec 17, 2024

Is mypackage figurative or real? If it truly is your own package, are the multiple representatives on different indexes supposed to be identical (and just the wheel build process is not reproducible) or are they supposed to be different (different snapshots with same declared version)? Fundamentally, having the same nominal package on different indexes with different contents is a very bad situation to be in that should be ideally solved 1st.

I'll dig deeper on the details you provide on 12/19.

@MajerMartin
Author

MajerMartin commented Dec 17, 2024

mypackage represents a real package. We have multiple internal packages and this issue applies to all of them.

Individual representatives may eventually be identical but do not necessarily have to be. The snapshot index is for development: we may publish a snapshot, which is mutable, and use this snapshot in other projects. When the feature is finalized, we publish a release to a different index, which is immutable. Usually the last snapshot has identical content to the release version. But sometimes one may add some last changes and release the package. Snapshot and release would then diverge.

To make it clear, we use only https://myindex/nexus/repository/pypi-hosted/simple/ and PyPI in production. The case below is a development setup where we need to include some internal WIP library.

[python-repos]
indexes.add = [
    # for example
    # contains release of A==1.0.0 and B==0.0.1
    "https://myindex/nexus/repository/pypi-hosted/simple/",
    # contains old snapshot of A==1.0.0 and WIP snapshot B==0.0.2
    "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/",
]

I want A==1.0.0 from the release repo and B==0.0.2 from the snapshot repo. However, I get both from the snapshot repo because A was found in both repos: PEX hashed it from the release repo, Pip downloaded it from the snapshot repo (both purely based on order).

So my idea was to change the order to make sure A==1.0.0 is taken from the release repo. But that would require PEX to hash it from the last repo.

[python-repos]
indexes.add = [
    # for example
    # contains old snapshot of A==1.0.0 and WIP snapshot B==0.0.2
    "https://myindex/nexus/repository/pypi-hosted-snapshots/simple/",
    # contains release of A==1.0.0 and B==0.0.1
    "https://myindex/nexus/repository/pypi-hosted/simple/",
]

Thank you a lot for looking into this.

@jsirois
Member

jsirois commented Dec 17, 2024

> Snapshot index is for development - we may publish snapshot which is mutable and use this snapshot in another projects.

I want to focus here. This is not OK. Each "snapshot" should have its own version; that's what a local version identifier suffix tends to be used for: https://peps.python.org/pep-0440/#local-version-identifiers

Re-using the same version number for mutating content is the root issue here. This is Maven-style -SNAPSHOT shenanigans applied to the Python ecosystem.

Are there practical issues preventing use of a local version identifier?

As I said, I will look at the secondary index / lock / download hash alignment issues on the 19th, but I'd like to have full understanding of this primary issue 1st.
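The local version identifier idea mentioned above can be sketched as follows. This is a minimal illustration, not a complete PEP 440 implementation: the function name snapshot_version and the example build label are hypothetical, and it assumes the private index accepts local versions (PEP 440 discourages them on public indexes).

```python
import re

# PEP 440: a local version label consists of ASCII letters, digits and
# periods, and must start and end with a letter or digit.
_LOCAL_LABEL = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)*$")


def snapshot_version(public_version, build_label):
    """Append a local version label (e.g. a CI build id) to a public version."""
    # PEP 440 normalizes "-" and "_" inside a local label to "." and
    # treats letters case-insensitively, so normalize up front.
    label = re.sub(r"[-_]", ".", build_label.strip()).lower()
    if not _LOCAL_LABEL.match(label):
        raise ValueError(f"not a valid local version label: {build_label!r}")
    return f"{public_version}+{label}"


version = snapshot_version("1.2.3", "ci_42-abc1234")
print(version)
```

Because every snapshot build gets its own label, two different builds never share a version string, so a lock file hash can only ever refer to one artifact.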

@MajerMartin
Author

Yes, you are right. We actually use Python and Scala, so I can confirm this approach is definitely Maven-style. I shared PEP 440 with the team after researching what the "pythonic" approach is, a few minutes before your response confirmed it 😄

> Are there practical issues preventing use of a local version identifier?

Not really. This approach was set up a few years ago and adopted by most of our Python teams (including ours). It is not a problem for the Poetry teams, but it is a problem for the Pants team. We will try to migrate.

@jsirois
Member

jsirois commented Dec 17, 2024

Ok, thanks for the confirmation. I still would like to leave this issue open to investigate the indexes / hashes mismatch. There may be something better Pex could be doing.

@MajerMartin
Author

If there were some improvement that could help us in the meantime (before we migrate), it would be awesome. But as we are doing something non-standard, I am prepared for a "can't be resolved" outcome as well 😄

Thank you once again and especially for the quick response time. I really appreciate it.

@jsirois jsirois self-assigned this Dec 19, 2024
@jsirois
Member

jsirois commented Dec 19, 2024

@MajerMartin one missing piece of information here is what --pip-version Pants is passing Pex, if any (as well as the Pex version being used). Pip has changed how it handles index ordering over the course of the Pip releases Pex supports (20.3.4 and 22.2.2 through 24.3.1). IIUC it has never guaranteed ordering in any way whatsoever; thus they have felt the freedom to change how they pick from amongst indexes over this course of time since they deem it an internal detail. Regardless, knowing which --pip-version you're selecting will help in creating an integration test that is faithful to the OP issue.

Actually, to be complete, I'll need:

  1. The Pex version being used (if you don't know, then just let me know the Pants version).
  2. The full command line that Pants uses to invoke Pex. This should be available by running pants with -ldebug.
  3. Do any of your internal indexes implement https://peps.python.org/pep-0691/? ... or just provide the total pip log and confirm you gathered this by passing --pip-log ... to Pex or else describe how you got that complete log.
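For item 3, one rough way to check whether an index answers with the PEP 691 JSON format is to request a project page with a JSON-preferring Accept header and inspect the Content-Type of the response. The sketch below uses only the standard library; the helper names are hypothetical, and authentication and custom CA certificates (which a private Nexus instance may need) are not handled.

```python
import urllib.request

# Media type defined by PEP 691 for the JSON form of the Simple API.
PEP691_JSON = "application/vnd.pypi.simple.v1+json"

# Prefer JSON, but still accept the HTML forms as a fallback.
ACCEPT = (
    f"{PEP691_JSON}, "
    "application/vnd.pypi.simple.v1+html;q=0.1, "
    "text/html;q=0.01"
)


def is_pep691_json(content_type):
    """True when a Content-Type header names the PEP 691 JSON media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == PEP691_JSON


def index_serves_json(project_url, timeout=10.0):
    """Fetch a project page, preferring JSON, and report whether JSON came back."""
    request = urllib.request.Request(project_url, headers={"Accept": ACCEPT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_pep691_json(response.headers.get("Content-Type", ""))
```

For example, index_serves_json("https://myindex/nexus/repository/pypi-hosted/simple/mypackage/") would be the call for the release index in this report. The Pip log earlier in the thread reports both internal project pages as fetched "as text/html", which suggests HTML responses for that run.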

@jsirois
Member

jsirois commented Dec 19, 2024

One other note: I referenced using local version identifiers when creating snapshots, but that's just one of several schemes you could use. Since these are pre-releases, you could also use 4.0.0.aN with N strictly increasing, etc. There are many approaches. The only thing to avoid is ever publishing anything, in any language ecosystem, with the same version as something already published but with different contents. This was always a bad idea. I worked in the Java ecosystem for the 1st half of my career and -SNAPSHOT was always broken and causing issues for this reason. It is just not sane at all without a lot of special tooling to special-case -SNAPSHOT as never cacheable / sortable implicitly by date published ... just a mess.
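The strictly increasing pre-release scheme can be sketched in a few lines. This is an illustration under assumptions: the function name next_alpha is hypothetical, the list of already published versions is supplied by the caller, and only alpha versions with an explicit number are considered.

```python
import re


def next_alpha(base, published):
    """Return the next unused alpha pre-release of `base`, e.g. 4.0.0a3."""
    # Accept both "4.0.0a2" and the "4.0.0.a2" spelling; PEP 440 normalizes
    # the separated form to the one without a separator.
    pattern = re.compile(rf"^{re.escape(base)}[._-]?a(\d+)$")
    taken = [int(m.group(1)) for v in published if (m := pattern.match(v))]
    return f"{base}a{max(taken, default=0) + 1}"


upcoming = next_alpha("4.0.0", ["4.0.0a1", "4.0.0.a2", "3.9.0a7"])
print(upcoming)
```

A CI job could call something like this before each snapshot upload so that no version string is ever reused for different contents.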

@MajerMartin
Author

I tried with Pants 2.24.0 (Pex 2.20.3 and Pip 24.2) and 2.25.0.dev1 (Pex 2.27.1 and Pip 24.2).

The attached Pip log was gathered using --pip-log (I added this and -vvv while debugging).
pip.log

Output after -ldebug:

07:39:02.94 [DEBUG] spawned local process as Some(10898) for Process { argv: ["/Users/martinmajer/.cache/pants/pants_dev_deps/fe2d7dc6846a5cbc52083d73263bec9cf29e69e3.venv/bin/python", "./pex", "lock", "create", "--tmpdir", ".tmp", "--no-emit-warnings", "--cert", "ca-certificates.crt", "--python-path", "/Users/martinmajer/.pyenv/versions/3.10.14/bin:/Users/martinmajer/.pyenv/versions/3.11.9/bin:/Users/martinmajer/.pyenv/versions/3.12.4/bin:/Users/martinmajer/.pyenv/versions/3.8.19/bin:/Users/martinmajer/.pyenv/versions/3.9.19/bin:/Users/martinmajer/.cache/pants/pants_dev_deps/fe2d7dc6846a5cbc52083d73263bec9cf29e69e3.venv/bin:/Users/martinmajer/Projects/mleng/milky-way/dist/export/python/virtualenvs/asset_indexing/3.9.6/bin:/Users/martinmajer/micromamba/condabin:/Users/martinmajer/.sdkman/candidates/java/current/bin:/Users/martinmajer/.pyenv/shims:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/podman/bin:/Users/martinmajer/.cargo/bin:/Users/martinmajer/.local/bin", "-vvv", "--pip-log=/Users/martinmajer/Projects/mleng/milky-way/pip.log", "--output=lock.json", "--style=universal", "--pip-version", "24.2", "--resolver-version", "pip-2020-resolver", "--target-system", "linux", "--target-system", "mac", "--indent=2", "--python-path=/Users/martinmajer/.pyenv/versions/3.10.14/bin/python3.10", "--no-pypi", "--index=https://pypi.org/simple/", "--index=https://nexus.ccl/nexus/repository/pypi-hosted/simple/", "--index=https://nexus.ccl/nexus/repository/pypi-hosted-snapshots/simple/", "--manylinux", "manylinux2014", "--interpreter-constraint", "CPython==3.10.*", "--interpreter-constraint", "CPython==3.11.*", "boto3", "chispa", "enrichment-flow", "mlflow-library~=5.0", "mlflow-skinny", 
"monster-config==1.0.0", "moto[server]", "mypy-boto3-s3", "numpy", "pandas", "pip", "pyspark", "pytest", "sbspark", "setuptools"], env: {"CPPFLAGS": "", "LC_CTYPE": "UTF-8", "LDFLAGS": "", "PATH": "/Users/martinmajer/.cache/pants/pants_dev_deps/fe2d7dc6846a5cbc52083d73263bec9cf29e69e3.venv/bin:/Users/martinmajer/Projects/mleng/milky-way/dist/export/python/virtualenvs/asset_indexing/3.9.6/bin:/Users/martinmajer/micromamba/condabin:/Users/martinmajer/.sdkman/candidates/java/current/bin:/Users/martinmajer/.pyenv/shims:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/podman/bin:/Users/martinmajer/.cargo/bin:/Users/martinmajer/.local/bin", "PEX_IGNORE_RCFILES": "true", "PEX_PYTHON": "/Users/martinmajer/.cache/pants/pants_dev_deps/fe2d7dc6846a5cbc52083d73263bec9cf29e69e3.venv/bin/python", "PEX_ROOT": ".cache/pex_root", "PEX_SCRIPT": "pex3"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<1ecb470504cbd9b12670d6f077a5788365fb2f68dcaa295c77706748026726fe>, size_bytes: 253 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, inputs: DirectoryDigest { digest: Digest { hash: Fingerprint<1ecb470504cbd9b12670d6f077a5788365fb2f68dcaa295c77706748026726fe>, size_bytes: 253 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {RelativePath("lock.json")}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Generate lockfile for milky_way", level: Info, append_only_caches: {CacheName("pex_root"): 
RelativePath(".cache/pex_root"), CacheName("python_build_standalone"): RelativePath(".python-build-standalone")}, jdk_home: None, cache_scope: PerSession, execution_environment: ProcessExecutionEnvironment { name: None, platform: Macos_arm64, strategy: Local }, remote_cache_speculation_delay: 0ns, attempt: 0 }

@MajerMartin
Author

MajerMartin commented Dec 19, 2024

Hm, it seems the fastest workaround for us could be to just re-publish the snapshot from the "tag" CI so the snapshot is identical to the release and gets the same SHA256. Then it wouldn't matter which one is picked. Sorry for raising this issue as a result of our non-standard approach. There are workarounds for us, so if there is nothing you can/want to improve, feel free to close this anytime 🙂
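A CI step for that workaround could verify that the snapshot and release wheels really are byte-for-byte identical before both are published. The sketch below is one possible check, assuming both files are available locally; the function names and the temporary demo files are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(release_wheel, snapshot_wheel):
    """True when both files have the same SHA256 digest."""
    return sha256_of(release_wheel) == sha256_of(snapshot_wheel)


# Demo with throwaway files standing in for real wheels.
with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp, "release.whl")
    snapshot = Path(tmp, "snapshot.whl")
    stale = Path(tmp, "stale-snapshot.whl")
    release.write_bytes(b"same bytes")
    snapshot.write_bytes(b"same bytes")
    stale.write_bytes(b"older bytes")
    identical = same_artifact(release, snapshot)
    diverged = not same_artifact(release, stale)

print(identical, diverged)
```

Uploading the very same file to both indexes sidesteps the question of whether the wheel build is reproducible.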
