Initial doc: installation, quickstart, API doc
Trimmed the readme, and added a bit on re2.

Had to update a ton of docstrings to have a decent API doc.

Also removed a bunch of leftover references to parsers, and removed
the completely useless `Parse` bit from `ParseResult`,
`PartialParseResult`, and `DefaultedParseResult`, which further
contributed to the churn and every file in the project being touched,
as it required editing files without docstrings.

Fixes ua-parser#182
masklinn committed Mar 26, 2024
1 parent 48000c5 commit c3f2533
Showing 25 changed files with 724 additions and 184 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ dist/
 tmp/
 regexes.yaml
 _regexes.py
+doc/_build
86 changes: 29 additions & 57 deletions README.rst
@@ -30,8 +30,20 @@ Just add ``ua-parser`` to your project's dependencies, or run
 to install in the current environment.

-Getting Started
----------------
+Installing `google-re2 <https://pypi.org/project/google-re2/>`_ is
+*strongly* recommended as it leads to *significantly* better
+performances. This can be done directly via the ``re2`` optional
+dependency:
+
+.. code-block:: sh
+
+    $ pip install 'ua_parser[re2]'
+
+If ``re2`` is available, ``ua-parser`` will simply use it by default
+instead of the pure-python resolver.
+
+Quick Start
+-----------

 Retrieve all data on a user-agent string
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -41,25 +53,25 @@ Retrieve all data on a user-agent string
     >>> from ua_parser import parse
     >>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
     >>> parse(ua_string) # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
-    ParseResult(user_agent=UserAgent(family='Chrome',
-                                     major='41',
-                                     minor='0',
-                                     patch='2272',
-                                     patch_minor='104'),
-                os=OS(family='Mac OS X',
-                      major='10',
-                      minor='9',
-                      patch='4',
-                      patch_minor=None),
-                device=Device(family='Mac',
-                              brand='Apple',
-                              model='Mac'),
-                string='Mozilla/5.0 (Macintosh; Intel Mac OS...
+    Result(user_agent=UserAgent(family='Chrome',
+                                major='41',
+                                minor='0',
+                                patch='2272',
+                                patch_minor='104'),
+           os=OS(family='Mac OS X',
+                 major='10',
+                 minor='9',
+                 patch='4',
+                 patch_minor=None),
+           device=Device(family='Mac',
+                         brand='Apple',
+                         model='Mac'),
+           string='Mozilla/5.0 (Macintosh; Intel Mac OS...

 Any datum not found in the user agent string is set to ``None``::

     >>> parse("")
-    ParseResult(user_agent=None, os=None, device=None, string='')
+    Result(user_agent=None, os=None, device=None, string='')

 Extract only browser data from user-agent string
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -94,43 +106,3 @@ Extract device information from user-agent string
     >>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
     >>> parse_device(ua_string)
     Device(family='Mac', brand='Apple', model='Mac')
-
-Parser
-~~~~~~
-
-Parsers expose the same functions (``parse``, ``parse_user_agent``,
-``parse_os``, and ``parse_device``) as the top-level of the package,
-however these are all *utility* methods.
-
-The actual protocol of parsers, and the one method which must be
-implemented / overridden is::
-
-    def __call__(self, str, Components, /) -> ParseResult:
-
-It's similar to but more flexible than ``parse``:
-
-- The ``str`` is the user agent string.
-- The ``Components`` is a hint, through which the caller requests the
-  domain (component) they are looking for, any combination of
-  ``Components.USER_AGENT``, ``Components.OS``, and
-  ``Components.DEVICE``. ``Domains.ALL`` exists as a convenience alias
-  for the combination of all three.
-
-  The parser *must* return at least the requested information, but if
-  that's more convenient or no more expensive it *can* return more.
-- The ``ParseResult`` is similar to ``CompleteParseResult``, except
-  all the attributes are ``Optional`` and it has a ``components:
-  Components`` attribute which specifies whether a component was never
-  requested (its value for the user agent string is unknown) or it has
-  been requested but could not be resolved (no match was found for the
-  user agent).
-
-  ``ParseResult.complete()`` convert to a ``CompleteParseResult`` if
-  all the components are set, and raise an exception otherwise. If
-  some of the components are set to ``None``, they'll be swapped for a
-  default value.
-
-Calling the parser directly is part of the public API. One of the
-advantage is that it does not return default values, as such it allows
-more easily differentiating between a non-match (= ``None``) and a
-default fallback (``family = "Other"``).
20 changes: 20 additions & 0 deletions doc/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
10 changes: 10 additions & 0 deletions doc/_templates/navigation.html
@@ -0,0 +1,10 @@
<h3>{{ _('Navigation') }}</h3>
{{ toctree(includehidden=theme_sidebar_includehidden, collapse=theme_sidebar_collapse, maxdepth=3) }}
{% if theme_extra_nav_links %}
<hr />
<ul>
{% for text, uri in theme_extra_nav_links.items() %}
<li class="toctree-l1"><a href="{{ uri }}">{{ text }}</a></li>
{% endfor %}
</ul>
{% endif %}
169 changes: 169 additions & 0 deletions doc/api.rst
@@ -0,0 +1,169 @@
===
API
===

Global Helpers
--------------

.. module:: ua_parser

.. autofunction:: parse

.. autofunction:: parse_user_agent

.. autofunction:: parse_os

.. autofunction:: parse_device

.. autodata:: parser

Core Types
----------

.. autoclass:: Resolver
    :members:
    :special-members: __call__

.. autoclass:: Domain
    :members:
    :member-order: bysource

.. autoclass:: Parser
    :members:

Data Types
----------

These are the various types produced by successfully resolving a user
agent string. They are guaranteed to be `dataclasses
<https://docs.python.org/3/library/dataclasses.html>`_, and using
dataclass utility functions is officially supported.
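
As a minimal sketch of what that support allows, the snippet below applies the standard ``dataclasses`` helpers to a stand-in ``Device`` class. The stand-in is hypothetical: it only mirrors the ``family``, ``brand`` and ``model`` fields shown in the README output and is not imported from ``ua_parser``.

```python
import dataclasses
from typing import Optional


# Stand-in for illustration only: mirrors the fields of the documented
# ``Device`` type, but is NOT the class shipped by ua_parser.
@dataclasses.dataclass
class Device:
    family: str
    brand: Optional[str] = None
    model: Optional[str] = None


device = Device(family="Mac", brand="Apple", model="Mac")

# asdict() converts the instance to a plain dict, e.g. for JSON output.
as_dict = dataclasses.asdict(device)

# replace() builds a modified copy and leaves the original untouched.
anonymised = dataclasses.replace(device, model=None)

# fields() lists the declared attributes in definition order.
field_names = [f.name for f in dataclasses.fields(device)]
```

The same calls should apply to the real result types, given the guarantee stated above that they are dataclasses.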

.. autoclass:: PartialResult
    :members:

.. autoclass:: Result
    :members:

.. autoclass:: UserAgent

.. autoclass:: OS

.. autoclass:: Device

.. autoclass:: DefaultedResult
    :members:


Base Resolvers
--------------

Base resolvers take sets of :class:`~ua_parser.core.Matchers`
generated by :ref:`loaders <Loading>`, and use them to extract data
from user agent strings.

.. autoclass:: ua_parser.basic.Resolver(Matchers)

.. class:: ua_parser.re2.Resolver(Matchers)

    An advanced resolver based around |re2|_'s ``FilteredRE2`` feature,
    which efficiently prunes the number of possibly matching matchers
    before actually running them.

    Sufficiently fast that a cache may not be necessary, and may even
    be detrimental at smaller cache sizes.

    .. warning:: Only available if |re2|_ is installed.

Eager Matchers
''''''''''''''

.. automodule:: ua_parser.matchers
    :members:
    :member-order: bysource
    :show-inheritance:

Lazy Matchers
'''''''''''''

These matchers will lazily compile their
:attr:`~ua_parser.core.Matcher.pattern` to an :class:`re.Pattern`.


While this saves CPU upfront, this is most useful with resolvers which
likely will *not* need to apply most of them, like
:class:`ua_parser.re2.Resolver`. If the resolver will very likely need
to apply (and thus compile) every pattern like
:class:`ua_parser.basic.Resolver`, then lazy compilation has a higher
overhead.
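
The trade-off can be illustrated with a small, self-contained sketch. ``EagerMatcher`` and ``LazyMatcher`` below are hypothetical stand-ins written for this example, not the classes from ``ua_parser.matchers`` or ``ua_parser.lazy``; the lazy variant assumes ``functools.cached_property`` as the deferral mechanism, which may differ from the library's own approach.

```python
import re
from functools import cached_property


class EagerMatcher:
    """Hypothetical: compiles its regex immediately, at construction."""

    def __init__(self, pattern: str) -> None:
        self.regex = re.compile(pattern)


class LazyMatcher:
    """Hypothetical: keeps the pattern text, compiles on first use."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern

    @cached_property
    def regex(self) -> re.Pattern:
        # Runs once, on first access; the compiled pattern is then
        # stored on the instance and reused by later accesses.
        return re.compile(self.pattern)


lazy = LazyMatcher(r"Chrome/(\d+)")
compiled_before_use = "regex" in vars(lazy)  # nothing compiled yet
match = lazy.regex.search("Mozilla/5.0 Chrome/41.0.2272.104")
compiled_after_use = "regex" in vars(lazy)  # compiled and cached
```

A resolver that prefilters candidates never touches ``regex`` on most matchers, so most compilations are skipped entirely; a resolver that scans every matcher pays for each compilation anyway, plus the indirection.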

.. automodule:: ua_parser.lazy
    :members:
    :member-order: bysource
    :show-inheritance:

Caching
-------

Web clients commonly have multiple interactions with a given system,
leading to significant repetition in user agents encountered. A cache
allows making use of that to avoid redundant parses, at the cost of
memory. This is most useful for slow base resolvers like :class:`the
basic resolver <ua_parser.basic.Resolver>`.
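
To make the idea concrete, here is a rough sketch of a least-recently-used cache wrapped around a slow resolution function. ``LruCachingResolver`` and ``slow_resolve`` are hypothetical names invented for this example; they do not reproduce the interfaces of ``ua_parser.CachingResolver`` or ``ua_parser.caching.Lru``.

```python
from collections import OrderedDict
from typing import Callable, Dict


class LruCachingResolver:
    """Hypothetical wrapper memoising ``resolve(ua)`` with LRU eviction."""

    def __init__(
        self, resolve: Callable[[str], Dict[str, str]], maxsize: int
    ) -> None:
        self.resolve = resolve
        self.maxsize = maxsize
        self.cache: "OrderedDict[str, Dict[str, str]]" = OrderedDict()
        self.misses = 0

    def __call__(self, ua: str) -> Dict[str, str]:
        if ua in self.cache:
            # Hit: mark the entry as most recently used.
            self.cache.move_to_end(ua)
            return self.cache[ua]
        self.misses += 1
        result = self.resolve(ua)
        self.cache[ua] = result
        if len(self.cache) > self.maxsize:
            # Over capacity: evict the least recently used entry.
            self.cache.popitem(last=False)
        return result


def slow_resolve(ua: str) -> Dict[str, str]:
    # Placeholder for an expensive regex-based resolution.
    return {"family": "Chrome" if "Chrome" in ua else "Other"}


resolver = LruCachingResolver(slow_resolve, maxsize=2)
for ua in ["Chrome/41", "Chrome/41", "Firefox/100", "Chrome/41"]:
    resolver(ua)
```

Repeated user agents are served from memory, so only distinct strings reach the underlying resolver; the memory cost is bounded by ``maxsize``.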

.. autoclass:: ua_parser.caching.Cache
    :members: __getitem__, __setitem__

.. autoclass:: ua_parser.CachingResolver
    :members:

.. autoclass:: ua_parser.Cache

.. module:: ua_parser.caching

.. autoclass:: S3Fifo

.. autoclass:: Sieve

.. autoclass:: Lru

.. autoclass:: Local

.. _loading:

Loading
-------

.. autoclass:: ua_parser.core.Matchers

.. autoclass:: ua_parser.core.Matcher
    :members:
    :special-members: __call__

.. autofunction:: ua_parser.load_builtins() -> Matchers

.. autofunction:: ua_parser.load_lazy_builtins() -> Matchers

Custom `regexes.yaml`_ data
'''''''''''''''''''''''''''

.. module:: ua_parser.loaders

.. autofunction:: load_data(MatchersData) -> Matchers

.. autofunction:: load_lazy(MatchersData) -> Matchers

.. autofunction:: load_json

.. function:: load_yaml(f: PathOrFile, loader: DataLoader = load_data) -> Matchers

    Loads YAML data following the ``regexes.yaml`` structure.

    The ``loader`` parameter customises which matcher variant is
    generated: by default :func:`load_data` is used to generate eager
    matchers; :func:`load_lazy` can be used to generate lazy matchers
    instead.

    .. warning:: Only available if |pyyaml|_ is installed.

.. _regexes.yaml: https://github.com/ua-parser/uap-core/blob/master/docs/specification.md
44 changes: 44 additions & 0 deletions doc/conf.py
@@ -0,0 +1,44 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import os
import sys

sys.path.insert(0, os.path.normpath(os.path.join(__file__, "../..", "src")))
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "UA Parser"
copyright = "2024, UA Parser Project"
author = "UA Parser Project"

version = "1.0"
release = "1.0"

rst_epilog = """
.. |pyyaml| replace:: ``PyYaml``
.. |re2| replace:: ``google-re2``
.. _pyyaml: https://pyyaml.org
.. _re2: https://pypi.org/project/google-re2
"""

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.todo",
    "sphinx.ext.viewcode",
    "sphinx.ext.intersphinx",
]

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

language = "en"

html_theme = "alabaster"

intersphinx_mapping = {"python": ("https://docs.python.org/3", None)}