The pynder repo is a showcase of how to implement hundreds, if not thousands, of custom spaCy pipeline components in production. It is useful when analyzing large volumes of documents from which many specific fields need to be extracted.
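Scaling to that many components usually means deriving each field extractor from a shared base class and building instances from configuration instead of writing each one by hand. The sketch below illustrates that pattern in plain Python. `FieldMatcher`, `RegexFieldMatcher`, `build_matchers` and `extract_fields` are hypothetical names, not pynder's actual API. In spaCy v3 each matcher would typically be wrapped as a component registered with `@Language.factory` and added with `nlp.add_pipe`; that wiring is left out so the example runs without spaCy installed.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod


class FieldMatcher(ABC):
    """Hypothetical base class: one instance per field to mine."""

    def __init__(self, field: str) -> None:
        self.field = field

    @abstractmethod
    def match(self, text: str) -> list[str]:
        """Return every value of this field found in the text."""


class RegexFieldMatcher(FieldMatcher):
    """Hypothetical field matcher backed by one compiled regular expression."""

    def __init__(self, field: str, pattern: str) -> None:
        super().__init__(field)
        self.regex = re.compile(pattern)

    def match(self, text: str) -> list[str]:
        # group(0) is the full match, so any capture groups are ignored
        return [m.group(0) for m in self.regex.finditer(text)]


def build_matchers(spec: dict[str, str]) -> list[FieldMatcher]:
    """Create one matcher per (field, pattern) entry, e.g. loaded from a config file."""
    return [RegexFieldMatcher(field, pattern) for field, pattern in spec.items()]


def extract_fields(text: str, matchers: list[FieldMatcher]) -> dict[str, list[str]]:
    """Run every matcher over one document and collect the results per field."""
    return {m.field: m.match(text) for m in matchers}


if __name__ == "__main__":
    spec = {
        "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "year": r"\b(?:19|20)\d{2}\b",
    }
    matchers = build_matchers(spec)
    doc = "Contact jane.doe@example.com about the 2021 and 2023 reports."
    print(extract_fields(doc, matchers))
```

With this layout, supporting another field means adding one entry to the spec rather than writing a new component class.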
The following BaseMatchers are implemented:
- BaseRegex
- BaseSpacyMatcher (tokenized matching)
- BaseTFIDF (Term Frequency-Inverse Document Frequency)
- BaseNormalizedCounter
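The internals of BaseTFIDF and BaseNormalizedCounter are not described here, so the following is only a sketch of the scores such matchers plausibly compute, under stated assumptions. It uses the textbook unsmoothed TF-IDF (term frequency as count divided by document length, IDF as log(N / df)) and assumes a "normalized count" means term occurrences divided by the document's token count. The regex `tokenize` function is a stand-in for spaCy's tokenizer, and all function names are hypothetical. BaseSpacyMatcher is not sketched because token-pattern matching relies on spaCy's own `Matcher`.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; a stand-in for spaCy's tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalized_counts(text: str, terms: set[str]) -> dict[str, float]:
    """Occurrences of each term divided by the document's token count (assumed definition)."""
    tokens = tokenize(text)
    if not tokens:
        return {term: 0.0 for term in terms}
    counts = Counter(tokens)
    return {term: counts[term] / len(tokens) for term in terms}


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Per-document TF-IDF with tf = count / doc length and idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = [
        "the invoice total is due",
        "the contract term is five years",
        "the invoice was paid",
    ]
    print(tfidf(docs)[0])
    print(normalized_counts(docs[0], {"invoice", "due"}))
```

A term that occurs in every document (here "the") gets an IDF of log(1) = 0, so TF-IDF suppresses it, whereas the normalized counter would still report it. That difference is one reason to keep both scoring styles available.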