docs(wrappers/python): add docstrings, check links

Note that the Python docstrings are written using reStructuredText (see https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#rst-primer, https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html). This has some notable differences from markdown:

```rst
links: `link text <https://example.com>`_
inline code: ``code``
```

As a drive-by fix, I made `PagefindIndex.config` private (renamed to `_config`) instead of noting that it should be immutable; I think this sends a clearer message.

Finally, I checked that all the documentation site links were correct:

```sh
cd docs
npm i
hugo # build the docs
lychee --include-fragments public/ # check the links
```

This validated that the links in ./docs/content/docs/py-api.md work, but it turned up another interesting finding: there's a broken link to https://github.com/CloudCannon/pagefind/blob/main/pagefind/features/compound_filtering.feature.
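
For reference, here is a minimal sketch of the reStructuredText docstring style described above. The function `add_record` is hypothetical and exists only to show the `:param:` field list, inline code, and link syntax; it is not part of the Pagefind wrapper.

```python
import inspect


def add_record(content: str, url: str) -> dict:
    """Build a record for a search index (hypothetical example).

    See the `Sphinx reST primer <https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html>`_.

    :param content: the raw content of the record, e.g. ``"hello"``.
    :param url: the output URL of the record.
    :returns: a dictionary with ``content`` and ``url`` keys.
    """
    return {"content": content, "url": url}


# inspect.getdoc strips the common indentation from the docstring
print(inspect.getdoc(add_record))
```

Sphinx renders the `:param:` and `:returns:` fields as a parameter list, which is why the diff below swaps the free-form `ARGS:` blocks for this style.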
CloudCannon · Sep 28, 2024 · 89c795c
1 parent 8f58c1d
commit 89c795c

Showing 2 changed files with 110 additions and 35 deletions.
diff --git a/docs/content/docs/py-api.md b/docs/content/docs/py-api.md
@@ -2,20 +2,20 @@
 title: "Indexing content using the Python API"
 nav_title: "Using the Python API"
 nav_section: References
-weight: 54
+weight: 54 # slightly less weight than the node API
 ---
 
 Pagefind provides an interface to the indexing binary as a Python package you can install and import.
 
 There are situations where using this Python package is beneficial:
 - Integrating Pagefind into an existing Python project, e.g. writing a plugin for a static site generator that can pass in-memory HTML files to Pagefind.
   Pagefind can also return the search index in-memory, to be hosted via the dev mode alongside the files.
-- Users looking to index their site and augment that index with extra non-HTML pages can run a standard Pagefind crawl with [`add_directory`](#indexadddirectory) and augment it with [`add_custom_record`](#indexaddcustomrecord).
-- Users looking to use Pagefind's engine for searching miscellaneous content such as PDFs or subtitles, where [`add_custom_record`](#indexaddcustomrecord) can be used to build the entire index from scratch.
+- Users looking to index their site and augment that index with extra non-HTML pages can run a standard Pagefind crawl with [`add_directory`](#indexadd_directory) and augment it with [`add_custom_record`](#indexadd_custom_record).
+- Users looking to use Pagefind's engine for searching miscellaneous content such as PDFs or subtitles, where [`add_custom_record`](#indexadd_custom_record) can be used to build the entire index from scratch.
 
 ## Example Usage
 
-<!-- this is copied verbatim from wrappers/python/src/tests/integration.py -->
+<!-- this example is copied verbatim from wrappers/python/src/tests/integration.py -->
 
 ```py
 import asyncio
@@ -90,10 +90,21 @@ from pagefind.index import PagefindIndex
 
 async def main():
     async with PagefindIndex() as index: # open the index
-        ... # write to the index
+        ... # update the index
     # the index is closed here and files are written to disk.
 ```
 
+Each method of `PagefindIndex` that talks to the backing Pagefind service can raise errors.
+If an error is raised inside `PagefindIndex`'s context, the context closes without writing the index files to disk.
+
+```py
+async def main():
+    async with PagefindIndex() as index: # open the index
+        await index.add_directory("./public")
+        raise Exception("not today")
+    # the index closes without writing anything to disk
+```
+
 `PagefindIndex` optionally takes a configuration dictionary that can apply parts of the [Pagefind CLI config](/docs/config-options/). The options available at this level are:
 
 ```py
@@ -135,8 +146,6 @@ indexed_dir = await index.add_directory("./public", glob="**.{html}")
 Optionally, a custom `glob` can be supplied which controls which files Pagefind will consume within the directory. The default is shown, and the `glob` option can be omitted entirely.  
 See [Wax patterns documentation](https://github.com/olson-sean-k/wax#patterns) for more details.
 
-<!-- FIXME: don't discard errors list -->
-
 ## index.add_html_file
 
 Adds a virtual HTML file to the Pagefind index. Useful for files that don't exist on disk, for example a static site generator that is serving files from memory.
@@ -168,7 +177,6 @@ Instead of `source_path`, a `url` may be supplied to explicitly set the URL of t
 
 The `content` should be the full HTML source, including the outer `<html> </html>` tags. This will be run through Pagefind's standard HTML indexing process, and should contain any required Pagefind attributes to control behaviour.
 
-<!-- FIXME: error array? -->
 If successful, the `file` object is returned containing metadata about the completed indexing.
 
 ## index.add_custom_record
@@ -208,8 +216,6 @@ See the [Filters documentation](https://pagefind.app/docs/filtering/) for semant
 See the [Sort documentation](https://pagefind.app/docs/sorts/) for semantics.  
 *When Pagefind is processing an index, number-like strings will be sorted numerically rather than alphabetically. As such, the value passed in should be `"20"` and not `20`*
 
-<!-- FIXME: errors? -->
-
 If successful, the `file` object is returned containing metadata about the completed indexing.
 
 ## index.get_files
@@ -233,7 +239,12 @@ Closing the `PagefindIndex`'s context automatically calls `index.write_files`.
 If you aren't using `PagefindIndex` as a context manager, calling `index.write_files()` writes the index files to disk, as they would be written when running the standard Pagefind binary directly.
 
 ```py
-await index.write_files("./public/pagefind")
+index = PagefindIndex(
+    IndexConfig(
+        output_path="./public/pagefind",
+    ),
+)
+await index.write_files()
 ```
 
 The `output_path` option should contain the path to the desired Pagefind bundle directory. If relative, it is relative to the current working directory of your Python process.
@@ -244,7 +255,7 @@ Deletes the data for the given index from its backing Pagefind service.
 Doesn't affect any written files or data returned by `get_files()`.
 
 ```python
-await index.delete_index();
+await index.delete_index()
 ```
 
 Calling `index.get_files()` or `index.write_files()` doesn't consume the index, and further modifications can be made. In situations where many indexes are being created, the `delete_index` call helps clear out memory from a shared Pagefind binary service.
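
The documented behaviour of `PagefindIndex`'s context (write on a clean exit, skip the write when an exception is raised) can be illustrated with a small stand-alone sketch. `SketchIndex` and `run` are hypothetical names, and this is not the wrapper's actual implementation; it only assumes the standard async context manager protocol.

```python
import asyncio
from typing import List


class SketchIndex:
    """Hypothetical stand-in for an index; not the real PagefindIndex."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.written = False

    async def __aenter__(self) -> "SketchIndex":
        return self

    async def __aexit__(self, exc_type, exc_value, traceback) -> None:
        # Only "write" when the body finished without raising.
        if exc_type is None:
            self.written = True
        # Returning None lets any exception propagate to the caller.


async def run(fail: bool) -> bool:
    index = SketchIndex()
    try:
        async with index:
            index.records.append("page")
            if fail:
                raise RuntimeError("not today")
    except RuntimeError:
        pass  # the exception still reaches the caller of the context
    return index.written


print(asyncio.run(run(fail=False)))  # True
print(asyncio.run(run(fail=True)))  # False
```

Checking `exc_type` in `__aexit__` is what lets a context manager tell a clean exit from a failed one without swallowing the exception.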

diff --git a/wrappers/python/src/pagefind/index/__init__.py b/wrappers/python/src/pagefind/index/__init__.py
@@ -20,37 +20,79 @@
 
 class IndexConfig(TypedDict, total=False):
     root_selector: Optional[str]
+    """
+    The root selector to use for the index.
+    If not supplied, Pagefind will use the ``<html>`` tag.
+    """
     exclude_selectors: Optional[Sequence[str]]
+    """Extra element selectors that Pagefind should ignore when indexing."""
     force_language: Optional[str]
+    """
+    Ignores any detected languages and creates a single index for the entire site as the
+    provided language. Expects an ISO 639-1 code, such as ``en`` or ``pt``.
+    """
     verbose: Optional[bool]
+    """
+    Prints extra logging while indexing the site. Only affects the CLI, does not impact
+    web-facing search.
+    """
     logfile: Optional[str]
+    """
+    A path to a file to log indexing output to in addition to stdout.
+    The file will be created if it doesn't exist and overwritten on each run.
+    """
     keep_index_url: Optional[bool]
+    """Whether to keep ``index.html`` at the end of search result paths.
+
+    By default, a file at ``animals/cat/index.html`` will be given the URL
+    ``/animals/cat/``. Setting this option to ``True`` will result in the URL
+    ``/animals/cat/index.html``.
+    """
     output_path: Optional[str]
+    """
+    The folder to output the search bundle into, relative to the processed site.
+    Defaults to ``pagefind``.
+    """
 
 
 class PagefindIndex:
+    """Manages a Pagefind index.
+
+    ``PagefindIndex`` operates as an async contextmanager.
+    Entering the context starts a backing Pagefind service and creates an in-memory index in the backing service.
+    Exiting the context writes the in-memory index to disk and then shuts down the backing Pagefind service.
+
+    Each method of ``PagefindIndex`` that talks to the backing Pagefind service can raise errors.
+    If an exception is raised inside ``PagefindIndex``'s context, the context closes without writing the index files to disk.
+
+    ``PagefindIndex`` optionally takes a configuration dictionary (an ``IndexConfig``) that can apply parts of the Pagefind CLI config.
+
+    See the relevant documentation for these configuration options in the
+    `Configuring the Pagefind CLI <https://pagefind.app/docs/config-options/>`_ documentation.
+    """
+
     _service: Optional["PagefindService"] = None
     _index_id: Optional[int] = None
-    config: Optional[IndexConfig] = None
-    """Note that config is immutable after initialization."""
+    _config: Optional[IndexConfig] = None
+    """Note that config should be immutable."""
 
     def __init__(
         self,
         config: Optional[IndexConfig] = None,
         *,
         _service: Optional["PagefindService"] = None,
         _index_id: Optional[int] = None,
-        # TODO: cache config
     ):
         self._service = _service
         self._index_id = _index_id
-        self.config = config
+        self._config = config
 
     async def _start(self) -> "PagefindIndex":
+        """Start the backing Pagefind service and create an in-memory index."""
         assert self._index_id is None
         assert self._service is None
         self._service = await PagefindService().launch()
-        _index = await self._service.create_index(self.config)
+        _index = await self._service.create_index(self._config)
         self._index_id = _index._index_id
         return self
 
@@ -61,14 +103,14 @@ async def add_html_file(
         source_path: Optional[str] = None,
         url: Optional[str] = None,
     ) -> InternalIndexedFileResponse:
-        """
-        ARGS:
-        content: The source HTML content of the file to be parsed.
-        source_path: The source path of the HTML file if it were to exist on disk. \
+        """Add an HTML file to the index.
+
+        :param content: The source HTML content of the file to be parsed.
+        :param source_path: The source path the HTML file would have on disk. \
             Must be a relative path, or an absolute path within the current working directory. \
             Pagefind will compute the result URL from this path.
-        url: an explicit URL to use, instead of having Pagefind compute the URL \
-            based on the source_path. If not supplied, source_path must be supplied.
+        :param url: an explicit URL to use, instead of having Pagefind compute the \
+            URL based on the source_path. If not supplied, source_path must be supplied.
         """
         assert self._service is not None
         assert self._index_id is not None
@@ -87,6 +129,16 @@ async def add_html_file(
     async def add_directory(
         self, path: str, *, glob: Optional[str] = None
     ) -> InternalIndexedDirResponse:
+        """Indexes a directory from disk using the standard Pagefind indexing behaviour.
+
+        This is equivalent to running the Pagefind binary with ``--site <dir>``.
+
+        :param path: the path to the directory to index. If the ``path`` provided is relative, \
+                it will be relative to the current working directory of your Python process.
+        :param glob: a glob pattern to filter files in the directory. If not provided, all \
+            files matching ``**.{html}`` are indexed. For more information on glob patterns, \
+            see the `Wax patterns documentation <https://github.com/olson-sean-k/wax#patterns>`_.
+        """
         assert self._service is not None
         assert self._index_id is not None
         result = await self._service.send(
@@ -101,11 +153,12 @@ async def add_directory(
         return cast(InternalIndexedDirResponse, result)
 
     async def get_files(self) -> List[InternalSyntheticFile]:
-        """
+        """Get raw data of all files in the Pagefind index.
+
         WATCH OUT: this method emits all files. This can be a lot of data, and
         this amount of data can cause reading from the subprocess pipes to deadlock.
 
-        STRICTLY PREFER calling `self.write_files()`.
+        STRICTLY PREFER calling ``self.write_files()``.
         """
         assert self._service is not None
         assert self._index_id is not None
@@ -118,6 +171,10 @@ async def get_files(self) -> List[InternalSyntheticFile]:
         return result
 
     async def delete_index(self) -> None:
+        """
+        Deletes the data for the given index from its backing Pagefind service.
+        Doesn't affect any written files or data returned by ``get_files()``.
+        """
         assert self._service is not None
         assert self._index_id is not None
         result = await self._service.send(
@@ -137,14 +194,16 @@ async def add_custom_record(
         filters: Optional[Dict[str, List[str]]] = None,
         sort: Optional[Dict[str, str]] = None,
     ) -> InternalIndexedFileResponse:
-        """
-        ARGS:
-        content: the raw content of this record.
-        url: the output URL of this record. Pagefind will not alter this.
-        language: ISO 639-1 code of the language this record is written in.
-        meta: the metadata to attach to this record. Supplying a `title` is highly recommended.
-        filters: the filters to attach to this record. Filters are used to group records together.
-        sort: the sort keys to attach to this record.
+        """Add a direct record to the Pagefind index.
+
+        This method is useful for adding non-HTML content to the search results.
+
+        :param content: the raw content of this record.
+        :param url: the output URL of this record. Pagefind will not alter this.
+        :param language: ISO 639-1 code of the language this record is written in.
+        :param meta: the metadata to attach to this record. Supplying a ``title`` is highly recommended.
+        :param filters: the filters to attach to this record. Filters are used to group records together.
+        :param sort: the sort keys to attach to this record.
         """
         assert self._service is not None
         assert self._index_id is not None
@@ -164,12 +223,17 @@ async def add_custom_record(
         return cast(InternalIndexedFileResponse, result)
 
     async def write_files(self) -> None:
+        """Write the index files to disk.
+
+        If you're using PagefindIndex as a context manager, there's no need to call this method:
+        if no error occurred, closing the context automatically writes the index files to disk.
+        """
         assert self._service is not None
         assert self._index_id is not None
-        if not self.config:
+        if not self._config:
             output_path = None
         else:
-            output_path = self.config.get("output_path")
+            output_path = self._config.get("output_path")
 
         result = await self._service.send(
             InternalWriteFilesRequest(