Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Poetry for local development. Add local development guide. #193

Merged
merged 31 commits into from
Sep 29, 2023

Conversation

zackproser
Copy link
Contributor

@zackproser zackproser commented Aug 15, 2023

Problem

Closes #192

These changes add support for managing the Pinecone Python client via Poetry, a virtualenv and dependency manager.

The intent is to provide a more consistent development experience for maintainers and contributors and to support the common workflow of needing to develop against the pinecone client as a dependency of a Jupyter notebook, or Python script or program that also imports other common libraries such as LangChain.

These changes also add a new CONTRIBUTING guide (link to rendered version for easier review) which explains how to develop the Pinecone Python client locally even across multiple shells using Poetry.

The intent of this guide is to significantly ease local development of Pinecone in scenario like Jupyter Notebooks and Python applications where pinecone and supporting libraries such as langchain are often imported after being pip installed.

Solution

  • Defined REST + gRPC dependencies under the [tool.poetry.dependencies] section
  • Defined test dependencies under [tool.poetry.dev-dependencies]
  • Included a WIP GitHub action to help sanity check that the poetry commands work properly / run without error as part of the CI/CD setup for the repository
  • Add docs
  • Additional local testing to ensure desired workflow of running the pinecone client as an immediately editable import with all changes tracked in git is working as expected
  • figure out the official author. I plunked my name down for now to get things moving but this should probably be an official Pinecone email / distribution list.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

  • Test poetry install locally
  • Test poetry shell locally
  • Test poetry build locally
  • Successfully run make test within the Poetry virtualenv
  • Update / test nightly and release workflows

@jhamon
Copy link
Collaborator

jhamon commented Aug 16, 2023

Thanks for taking a crack at this. Is there anyway we can do this without duplicating the list of dependencies from requirements.txt?

@zackproser
Copy link
Contributor Author

zackproser commented Aug 16, 2023

Thanks for taking a crack at this. Is there anyway we can do this without duplicating the list of dependencies from requirements.txt?

@jhamon I've read up more on the subject and I think it should be okay to remove the requirements*.txt files - as you mention, the dependencies themselves will now be managed in pyproject.toml.

As far as I'm aware, it's most common to pip install pinecone in all our example Jupyter Notebooks and other folks' Notebooks - and the same is true for libraries like LangChain that import pinecone - so I don't think this change should cause those use cases any trouble (please double-check my thinking here).

With the new contributor's guide (feedback also very welcome there) then current and future maintainers should be able to use poetry for local development, tests, packaging and even release.

The same should be true for our Makefile and for the CI/CD workflows, as I'm updating them to use poetry as well.

Please let me know if I've missed anything, as you're more familiar with this project than I am. Thank you 🙇

@zackproser zackproser changed the title [WIP] Support poetry Support Poetry for local development. Add local development guide. Aug 17, 2023
@StankoKuveljic
Copy link

@zackproser I followed the contribution guide and it works for me. Thanks!

It can also work with poetry run python test.py

@zackproser
Copy link
Contributor Author

@zackproser I followed the contribution guide and it works for me. Thanks!

It can also work with poetry run python test.py

Awesome - thanks for confirming @StankoKuveljic !

Copy link
Collaborator

@jhamon jhamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holy smokes. This is a great contribution!

I pointed out a few things about CI and publishing I would like to see addressed but this seems 95% there.

Also, since we're done with tox (good riddance) you can also delete the tox.ini file.

.github/workflows/poetry-smoke-tests.yaml Outdated Show resolved Hide resolved
.github/workflows/poetry-smoke-tests.yaml Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
pyproject.toml Outdated

[tool.poetry]
name = "pinecone"
version = "2.2.2"
Copy link
Collaborator

@jhamon jhamon Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the pinecone/__version__ file was the one source of truth on the package version and the value stored there gets updated automatically as part of the publish release CI job. Since poetry needs it in this toml file, we have to maintain and update both version numbers as part of our release process.

Fortunately there is a poetry version command that can handle the file manipulations in toml for us. I think you need to:

  1. Bump the version stored in pyproject.toml during the publish workflow. Do it by adjusting this publish job. I would add a step before building and publishing happens to update the toml version using a command like poetry version ${{ inputs.releaseLevel }} (this variable already exists and should contain a value like major, minor, patch that aligns with the bump types poetry knows about according to poetry version -h).

  2. Leave the existing bump step alone, since I think we still need to update __version__. Unless poetry makes a change to it during the publish. But after a few minutes of googling I don't think it does.

  3. And then also remember to add and commit the change as part of this step. If you forget this step then version will be out of date and there could be a version conflict on a future release.

I would make your best attempt at making these changes, but don't try running the job to test them for now (since publishing a release is not something we want to do off of a dev branch). I'll work out any kinks when I try to publish the next release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks so much for the detailed explanation and links. I'll take a crack at this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a stab in 69d7fe5 LMK what you think!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the conversation here, I think the changes to publish-to-pypi.yaml make sense. Thanks, Zack!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we've bumped versions in the current client so you'll need to rebase / resolve the version number here, possibly.

@jhamon
Copy link
Collaborator

jhamon commented Aug 21, 2023

I suppose one other thing that comes to mind is the setup.py file still exists and has references to requirements.txt and requirements-grpc.txt. It looks like you merged all the dependencies together into one list but in the past they were broken up intentionally so that only people who wanted to use the grpc functionality would need to pull in those grpc deps by importing pinecone[grpc]. This seems clunky to me, but it was done before my time so I don't have deep context on it.

I'm not sure whether setup.py is still needed in a poetry world. Definitely needs some googling / research. And if we could keep these optional grpc dependencies isolated I think that would be good.

@zackproser
Copy link
Contributor Author

zackproser commented Aug 22, 2023

I suppose one other thing that comes to mind is the setup.py file still exists and has references to requirements.txt and requirements-grpc.txt. It looks like you merged all the dependencies together into one list but in the past they were broken up intentionally so that only people who wanted to use the grpc functionality would need to pull in those grpc deps by importing pinecone[grpc]. This seems clunky to me, but it was done before my time so I don't have deep context on it.

Thanks for mentioning this - I was wondering about it, too. Poetry supports this concept of --extra I think it's called, and then you can have a separate toml header that specifies just the grpc dependencies.

FWIW, I had the same experience as you when encountering the gprc configuration in Notebooks and elsewhere. Which is another thing to consider - we've got a bunch of live example Notebooks in github.com/pinecone-io/examples which both do and do not use the pinecone[grpc] flavor - so I'm wary of breaking anything there - I guess this Poetry cutover would break any Notebooks that weren't pinning their versions of the pinecone client 🤔

I'm not sure whether setup.py is still needed in a poetry world. Definitely needs some googling / research. And if we could keep these optional grpc dependencies isolated I think that would be good.

Roger - makes sense to me. Do you share the opinion that Poetry is generally the preferred / sanest way to manage Python projects these days? I've heard that from two Python gurus that I trust so far.

I'll spend some time looking into separating the grpc dependencies again. Thanks for your help!

@zackproser
Copy link
Contributor Author

I believe I've separated out our gRPC dependencies in the Poetry fashion in 6406198

Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work here, @zackproser. I feel like this is a huge improvement to our workflow within this repository, and I believe it makes sense to go ahead and incorporate Poetry.

I've left some minor comments and suggestions, but overall I feel like this is good to go.

I checked out your fork & branch locally and ran all the associated poetry commands successfully. Feels like it works really well out of the box:
Screenshot 2023-09-28 at 11 01 45 AM

As for the CI changes, I think things look good. We're maintaining the previous __version__ management behavior and leveraging poetry version properly to align things. Like Jen mentioned, if we run into issues when publishing we can resolve them then.

Overall big net win I think, thanks so much!

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated

[tool.poetry]
name = "pinecone"
version = "2.2.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the conversation here, I think the changes to publish-to-pypi.yaml make sense. Thanks, Zack!

pyproject.toml Outdated Show resolved Hide resolved
@@ -0,0 +1,127 @@
# Contributing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Thanks a ton for adding a detailed CONTRIBUTING file! We need something similar for the other clients, and it's great you took the initiative here with the Poetry migration. 🥳

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much! It is my pleasure. I want to make this as straightforward as possible for all of us and our customers and community members.

CONTRIBUTING.md Outdated Show resolved Hide resolved
pyproject.toml Outdated

[tool.poetry]
name = "pinecone"
version = "2.2.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we've bumped versions in the current client so you'll need to rebase / resolve the version number here, possibly.

pyproject.toml Outdated
version = "2.2.2"
description = "The official Python client for Pinecone's vector database"
authors = ["Pinecone Systems, Inc. <[email protected]>"]
license = "https://github.com/pinecone-io/pinecone-python-client/blob/main/LICENSE.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LICENSE file was updated yesterday to Apache 2.0, this may need an update as well: #204

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for calling out - looking at main it appears the link to the license file didn't change, even if the license type and content did - so I think we're okay, so long as this is truly the right way to reference the license in a pyproject.toml (via URL).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently it's also fair to use the string Apache-2.0 for the license property, so I did that in 165608e

@zackproser
Copy link
Contributor Author

@austin-denoble Thanks so much for taking the time to do a detailed review and local testing! 🙇 I'll address all your feedback and tackle rebasing first thing tomorrow AM.

@zackproser zackproser merged commit c9f5a8d into pinecone-io:main Sep 29, 2023
4 checks passed
@zackproser zackproser deleted the support-poetry branch September 29, 2023 18:18
austin-denoble added a commit that referenced this pull request Oct 3, 2023
## Problem
After the poetry migration in
#193 the `PyPI
Release: Nightly (pinecone-client-nightly)` has been failing:
https://github.com/pinecone-io/pinecone-python-client/actions/runs/6374232183/job/17298658421

The reason is we removed the `setup.py` file as a part of the migration
and `nightly-release.yaml` used to adjust the module name via `sed -i
's/pinecone-client/pinecone-client-nightly/g' setup.py`. To achieve
something similar now that we've changed our build / packaging process
to use Poetry, we need to look at updating the `pyproject.toml` file.

While looking at correcting this, I realized the resulting build
artifacts from `make package` (which now calls `poetry build`) now have
different package names which may cause issues with PyPI:

- (Old) `pinecone-client-2.2.4.tar.gz`
- (New) `pinecone-2.2.4.tar.gz`
- (Old) `pinecone_client-2.2.4-py3-none-any.whl`
- (New) `pinecone-2.2.4-py3-none-any.whl`

## Solution

- I updated `nightly-release.yaml` to call `sed -i ...` on the
`pyproject.toml` file instead of `setup.py`. This should swap out `name:
pinecone-client` for `name: pinecone-client-nightly` under
`[tools.poetry]` which should fix up the package naming.
- Added a `packages` section to `pyproject.toml` to define the top-level
pinecone package entrypoint. Poetry uses the `name` field by default
unless this is specified which is why we were ending up with artifacts
called `pinecone-...` rather than `pinecone-client-...`.

**Note:**
Poetry swaps the hyphen in `pinecone-client` for an underscore in the
resulting build output artifacts.

New packages:
- `pinecone_client-2.2.4.tar.gz`
- `pinecone_client-2.2.4-py3-none-any.whl`

The `.whl` naming is equivalent, and the `tar.gz` has an underscore
rather than a hyphen. I think ultimately this shouldn't matter given the
guidance around `name` here:

https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#name

> Comparison of project names is case insensitive and treats
arbitrarily-long runs of underscores, hyphens, and/or periods as equal.
For example, if you register a project named cool-stuff, users will be
able to download it or declare a dependency on it using any of the
following spellings:
Cool-Stuff
cool.stuff
COOL_STUFF
CoOl__-.-__sTuFF


## Type of Change

- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update
- [X] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan

Describe specific steps for validating this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Poetry
4 participants