Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python wrapper, API #672

Merged
merged 43 commits into from
Oct 2, 2024
Merged

Python wrapper, API #672

merged 43 commits into from
Oct 2, 2024

Conversation

SKalt
Copy link
Contributor

@SKalt SKalt commented Jul 27, 2024

When finished, this closes #634. To most effectively review this PR, skip right to reading the integration test (wrappers/python/src/tests/integration.py).

There are still a number of FIXMEs and TODOs in the code, most of which relate to creating a consistent build of an python wheel for the appropriate OS and arch. There's a viable how-to at https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/, so stay tuned.

@bglw
Copy link
Contributor

bglw commented Jul 30, 2024

Thanks! I'll get into reviewing this in the coming few days 🙂

@SKalt
Copy link
Contributor Author

SKalt commented Aug 4, 2024

Update: I got python wheel construction working!

here's how to run the integration tests

Python development environment setup:

  • python3 >= 3.9
  • poetry
# prepare the binaries
cargo build --release --verbose
cargo build --release --verbose --features "extended"
bin="$PWD/target/release/pagefind"
ext="$PWD/target/release/pagefind_extended"

cd wrappers/python

# set up the python virtual environment
poetry install --no-root # for dev dependencies
export VIRTUAL_ENV="${PWD}/.venv"
export PATH="$VIRTUAL_ENV/bin:$PATH"

# build and install the binary-only wheels

python3 ./build_binary_only_wheel.py \
  --llvm-triple="x86_64-unknown-linux-musl" \
  --bin-path=$bin \
  --version=1.1.0

python3 ./build_binary_only_wheel.py \
  --llvm-triple="x86_64-unknown-linux-musl" \
  --bin-path=$ext \
  --version=1.1.0

poetry build # build the source-only distribution for the python API
# install all the wheels
pip install ./dist/*.whl
pip show --verbose experimental_pagefind_python_bin
pip show --verbose experimental_pagefind_python_bin_extended
pip show --verbose pagefind_python

LOG_LEVEL="DEBUG" python3 ./src/tests/integration.py 2>&1 | tee /tmp/integration_test.log

After this, the python code is 90% ready for prime time. If you're OK with the API, all that's left is add python testing and publication to GitHub Actions.

@SKalt SKalt closed this Aug 4, 2024
@SKalt SKalt reopened this Aug 4, 2024
@bglw
Copy link
Contributor

bglw commented Aug 18, 2024

This all looks good to me so far! :)

@bglw
Copy link
Contributor

bglw commented Aug 18, 2024

Regarding testing:

The main branch has been updated to have much better and more reliable tests, and it would be nice to include at least some of the Python testing in that integration suite.

For example, here is one of the Node API integration test files: https://github.com/CloudCannon/pagefind/blob/9a1341642fa3e61891f50603af8d3217652059cd/pagefind/integration_tests/node_api/node_base/build-a-synthetic-index-to-disk-via-the-api.toolproof.yml

Since the APIs are pretty comparable, duplicating that node_api directory and rewriting the scripts to python shouldn't be too big of a lift. I'm also happy to go through and add that testing myself.

@SKalt
Copy link
Contributor Author

SKalt commented Aug 18, 2024

I'll definitely take you up on the offer of translating the integration tests to python! FYI: I started working on publishing the python API separately in https://github.com/SKalt/pagefind_python in order to keep progress going without pestering you. I'll have to backport some of the work into this repo, which won't be way too much work, but it will introduce my more interesting automation choices here:

  • I use cog, a python inlining utility, to sync version numbers across packaging files rather than overwriting the packaging files during CI.
  • I moved all my shell scripts out of workflows *.yaml and into *.sh files, which I call from the workflow *.yaml

If you're cool with those decisions and taking on the work of maintaining the python API here, I'll pull in my work from the other repo.

@bglw
Copy link
Contributor

bglw commented Aug 18, 2024

I'll definitely take you up on the offer of translating the integration tests to python!

Grand! I'll tack that onto this branch (I just pushed a merge in from main to get the new suite) (no more erroneous failing tests 😍)

I use cog, a python inlining utility, to sync version numbers across packaging files rather than overwriting the packaging files during CI.

Looks fine to me 👍

I moved all my shell scripts out of workflows *.yaml and into *.sh files, which I call from the workflow *.yaml

Also all good — I also tend to use cjs scripts for anything complex. This repo already has a .backstage directory for this kind of workflow script 🙂

If you're cool with those decisions and taking on the work of maintaining the python API here, I'll pull in my work from the other repo.

Yeah let's do it! Hopefully it won't need much faffing with over time since the API is pretty locked in.

@SKalt
Copy link
Contributor Author

SKalt commented Aug 18, 2024

Yeah let's do it!

Alright! I'm currently winding down for the day, but I should be able to start the backport tomorrow. I'll @ you when this branch is up-to-date.

Yeah let's do it! Hopefully it won't need much faffing with over time since the API is pretty locked in.

The only remaining bikeshed-ish question I can think of is whether to add an entry point to expose a command-line interface that doesn't need to be prefixed with python -m.

There was a good discussion of the tradeoffs in the zig-pypi repo:

The reason I was wondering is I was thinking it might be helpful to give the ziglang package a zig entrypoint

Also something I was thinking about. Consider, however, that it would be unfortunate if the ziglang package, when installed by a dependency of some Python project in your $HOME/.local/bin (perhaps you ran pip install --local), would override your system-wide zig binary.

Of course, the entry point installed by the ziglang PyPI package could be namespaced. But python -m is already a form of namespacing--one that is perfectly in sync with the way you already run Python.

But that decision doesn't really impact shipping this API.

@bglw
Copy link
Contributor

bglw commented Aug 19, 2024

I think I agree with the logic from Zig there — I think having the python -m prefix is fine, since Pagefind normally goes into build scripts anyway.

It's been a while since I've been in the Python ecosystem — does that flow allow you to run Pagefind without an explicit install? I'd love an alternative fast-path to npx pagefind — i.e. pipx pagefind or something akin.

@SKalt
Copy link
Contributor Author

SKalt commented Aug 20, 2024

does that flow allow you to run Pagefind without an explicit install? I'd love an alternative fast-path to npx pagefind — i.e. pipx pagefind or something akin.

There's pipx run, but that requires installing pipx.

@SKalt
Copy link
Contributor Author

SKalt commented Aug 24, 2024

Hey, this PR should be pretty much ready for your updated tests!

As an aside: scripting builds that rely on GitHub actions was a bit painful since I had to keep guessing at what filesystem side-effects each action takes. If you're interested in investing in a build process that can operate fully locally, I have a Makefile I could share.

@SKalt SKalt force-pushed the python-bindings branch 3 times, most recently from 02b36be to 2521585 Compare August 25, 2024 17:12
@bglw
Copy link
Contributor

bglw commented Sep 29, 2024

Alright, I went through and added the full suite of Node API Toolproof tests for this PR!

It served as a good exercise to learn asyncio and get familiar with the Python API before taking on maintenance 🙂

Some changes that came out of it:

  • To match the Node API, I made the get_files function decode the file content from base64 and return it as bytes
    • Added InternalDecodedFile type and used it as the return type there
  • To match the Node API, have added a check for a PAGEFIND_BINARY_PATH environment variable to use
  • To match the Node API, have added an optional output_path argument to the write_files function to override the index config
  • Fixed one response type for DeleteIndex (though tbf it probably should have originally been a DeletedIndex response 😅)
  • Change __aexit__ to not assert self._service and self._index_id, to allow index.delete_index() from within the context
    • Also, changed the docs, as the code sample there didn't work for me. await index = PagefindIndex() errored, so I could only ever construct it as async with PagefindIndex....
  • Added error handling for the messages that come back with no message_id due to parsing failure

Most of those changes are paired with a Toolproof test that triggered the error 🙂

Let me know if you see anything untoward! I'll look at tacking a publish action onto this branch to go to TestPyPi soon.

@SKalt
Copy link
Contributor Author

SKalt commented Sep 29, 2024

Looks great! I've been keeping up with your storm of commits and couldn't find anything to add except code golf opportunities.

I'm going to be available for the next 30 minutes, so let me know if there's anything I can do to help with the publication process.

@bglw
Copy link
Contributor

bglw commented Sep 29, 2024

Excellent!

I need to hop off for the afternoon, but will hopefully get a crack at the publishing tonight (NZ).

Feel free to tack any golf tweaks on while you’re here 🙂

@SKalt
Copy link
Contributor Author

SKalt commented Sep 29, 2024

Oh, one last thing: python -O strips assert statements. I've been treating asserts like debug_assert!(), but I want to make extra sure that's not a surprise.

@SKalt SKalt changed the title [WIP] Python wrapper, API Python wrapper, API Sep 29, 2024
@bglw
Copy link
Contributor

bglw commented Sep 30, 2024

Oh, one last thing: python -O strips assert statements. I've been treating asserts like debug_assert!(), but I want to make extra sure that's not a surprise.

Good to know! Wasn't aware 🙂

@bglw
Copy link
Contributor

bglw commented Sep 30, 2024

Ticking away at a few test release runs on a non-fork branch. Won't crack it tonight, but shouldn't be too big of a lift!

First annoyance seems to be configuring the trusted publisher for each package. I can add a "pending publisher" in PyPi for a single package, but it errors if I add multiple pending packages with the same publisher. From what I can read though there is a many<>many relationship here, so this might only be a limitation while the publisher is pending.

I was hoping to avoid setting it up to publish from my machine, so it might just take a few workflow runs back to back to get each successive package out, we'll see. Let me know if you have any ideas there.

@bglw
Copy link
Contributor

bglw commented Sep 30, 2024

Also, what would be the best way to publish to the pagefind-bin and pagefind-bin-extended packages you have on the test pypi? Either deleting them or giving my account (bglw) access.

@SKalt
Copy link
Contributor Author

SKalt commented Sep 30, 2024

I did just add you as owner of the test.pypi packages. I think the easiest way to debug the trusted publisher issue is in a separate repo with a bare-bones python package, but I'm not going to get to work on that tonight so ¯\_(ツ)_/¯

@bglw
Copy link
Contributor

bglw commented Sep 30, 2024

Hah no worries, have kicked off a test publish of all the packages.

Once the projects exist pypi is happy having the same trusted publisher configured for them all — it just seems I can't configure it so that a trusted publisher is allowed to publish multiple new projects from an action.

No big deal, I'll do the first publish to real pypi from a branch anyway and publish a v1.2.0-alpha.0 release. I'll just run that first release action thrice, once for each bin package and then once for the main package. After that it'll be set.

@bglw
Copy link
Contributor

bglw commented Oct 1, 2024

Latest:

 all_binary_only_wheels.py: error: unrecognized arguments: --git-tag v1.2.0-alpha.1

Looks like that should be --tag, but also it likely needs a matching process_tag implementation to the api_package.py logic? (To turn the git tag into a python tag). Will wait for your thoughts

@SKalt
Copy link
Contributor Author

SKalt commented Oct 1, 2024

Yep, you're right. You can import process_tag into build/all_binary_only_wheels.py directly from api_package.py.

@bglw
Copy link
Contributor

bglw commented Oct 2, 2024

And with that, we're publishedonpypi as 1.2.0a5! 🎉

Anything lingering you want to address before I merge this into main? 🙂

@SKalt
Copy link
Contributor Author

SKalt commented Oct 2, 2024

Awesome!! The only thing I wish I could fix is the asyncio subprocess pipe potentially getting clogged when get_files returns too much data. As it stands, pipe's buffer size prevents the problem and I did leave notes about that in the docs, so we're good to merge. If I come across a solution that doesn't involve allocating an absolutely giant buffer, I'll PR a fix.

@bglw
Copy link
Contributor

bglw commented Oct 2, 2024

Nice!

I actually want to match esbuild's subprocess logic here, which is that messages over a certain size get saved to a temp file by the Pagefind service, and this file is then loaded by the wrapper. I know in their case using the filesystem benchmarked faster than stdio over a certain message size, and also has the benefit for us of preventing anything getting clogged up by keeping messages smaller. So at some point getting that added to the service and both wrappers should solve the issue 😄

Thanks a tonne for all of this, I appreciate it! I also feel like I have caught up on a good decade of progress in the Python ecosystem 😆 Very impressed with pypi for this purpose, many rough edges of the npm wrapper that don't exist here.

And thanks for the patience on my side being slow!

I'll merge this into main now. I have a few other small issues I'd like to pick off before the next release, but I'll set a hard deadline of two weeks to put the v1.2.0 release out 🙂

@bglw bglw merged commit 5bcab17 into CloudCannon:main Oct 2, 2024
4 checks passed
@SKalt
Copy link
Contributor Author

SKalt commented Oct 2, 2024

I actually want to match esbuild's subprocess logic here, which is that messages over a certain size get saved to a temp file by the Pagefind service, and this file is then loaded by the wrapper.

That's a clever trick; thanks for telling me about it.

Over the last couple of weeks/months the rust/python tool uv grew an npx analog, so you can now uvx pagefind-bin --help and it should work! I'm not sure if it supports pagefind[bin] extras semantics yet, though.

Thanks a tonne for all of this

Hey, it was great working with you! I expect we'll be in touch once I get around to writing a mkdocs plugin for pagefind.

@bglw
Copy link
Contributor

bglw commented Oct 2, 2024

Over the last couple of weeks/months the rust/python tool uv grew an npx analog, so you can now uvx pagefind-bin --help and it should work! I'm not sure if it supports pagefind[bin] extras semantics yet, though.

Cool! I'll take that for a spin

I expect we'll be in touch once I get around to writing a mkdocs plugin for pagefind.

Looking forward to it 😄

@bglw
Copy link
Contributor

bglw commented Nov 6, 2024

A little delayed again, but v1.2.0 is out 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature request: Python wrapper package, Python API
2 participants