
Annif: Research backends #21

Open
hortongn opened this issue Apr 6, 2023 · 2 comments
hortongn commented Apr 6, 2023

Learn more about backends in Annif and narrow in on the backend(s) we may want to use.

https://github.com/NatLibFi/Annif/wiki

hortongn moved this from Triage to Todo in App Dev AI Project Apr 6, 2023
hortongn changed the title from "Research backends" to "Annif: Research backends" Apr 13, 2023
haitzlm self-assigned this Apr 13, 2023

haitzlm commented Apr 14, 2023

**Backend summaries:**

  • TF-IDF: The TF-IDF (Term Frequency-Inverse Document Frequency) backend weights each term by how often it appears in a document relative to how rare it is across the training corpus, and suggests subjects by comparing these weighted term vectors. It is a fast, simple associative method that makes a good baseline (see the first sketch after this list).
  • fastText: The fastText backend wraps the fastText library, a shallow neural-network text classifier. Because it uses subword (character n-gram) information, it copes well with misspellings and out-of-vocabulary words, and it trains quickly on large corpora.
  • Omikuji: The Omikuji backend wraps the Omikuji library, an efficient implementation of tree-based extreme multi-label classification algorithms in the Parabel/Bonsai family. It scales to large subject vocabularies but generally needs a substantial training corpus.
  • MLLM: The MLLM (Maui-like Lexical Matching) backend is a lexical method modeled on the Maui tool: it matches terms occurring in the document text against labels in the subject vocabulary and ranks the candidate subjects with a statistical model. It works best when documents explicitly mention the vocabulary terms.
  • STWFSA: The STWFSA backend wraps the stwfsapy library (Simple Thesaurus Weighted Finite State Automata), which builds a weighted finite-state automaton from vocabulary labels and uses it to find and score subject matches in the text. Like MLLM, it is a lexical rather than associative approach.
  • YAKE: The YAKE (Yet Another Keyword Extractor) backend uses unsupervised keyword extraction to pull keyphrases out of a document and then matches them against the subject vocabulary. It needs no training data.
  • SVC: The SVC (Support Vector Classifier) backend is a classic machine learning approach to text classification. It finds hyperplanes that separate the classes in the feature space and is best suited to classification-style vocabularies with a relatively small number of classes.
  • Fusion/Ensemble backends: These backends combine the results of other backends into a final suggestion. The basic ensemble backend averages the scores from its source projects, while the nn_ensemble backend uses a small neural network to learn how to weight and combine them.
  • PAV: The PAV (Pool Adjacent Violators) backend is an ensemble-type backend that applies isotonic regression to calibrate the scores coming from a source project, learning a monotonic mapping from raw scores to estimated probabilities that a suggested subject is correct (see the second sketch after this list).
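
The two sketches below illustrate the general ideas behind the TF-IDF and PAV backends using scikit-learn; they are generic illustrations with made-up toy data, not Annif's internal implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: one pseudo-document of representative text per subject,
# plus a new document we want subject suggestions for.
subject_docs = {
    "cats": "cats felines kittens purring fur whiskers",
    "dogs": "dogs puppies barking fetch leash kennel",
}
new_doc = "a kitten was purring on the warm fur blanket"

vectorizer = TfidfVectorizer()
subject_matrix = vectorizer.fit_transform(subject_docs.values())
doc_vector = vectorizer.transform([new_doc])

# Score each subject by cosine similarity between TF-IDF vectors.
scores = cosine_similarity(doc_vector, subject_matrix)[0]
for subject, score in zip(subject_docs, scores):
    print(f"{subject}: {score:.3f}")
```

Pool Adjacent Violators is the algorithm behind isotonic regression, so scikit-learn's IsotonicRegression can stand in for the score-calibration idea (the scores and labels below are invented):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Raw scores from a source backend and whether each suggested subject
# turned out to be correct (1) or not (0) on a validation set.
raw_scores = np.array([0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])
was_correct = np.array([0, 0, 1, 0, 1, 1, 0, 1])

# Learn a monotonic mapping from raw scores to calibrated probabilities.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(raw_scores, was_correct)

print(calibrator.predict([0.3, 0.5, 0.85]))
```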

Special backends:

  • The HTTP backend lets Annif call an external suggestion service over a REST API, so models hosted elsewhere can be used as if they were local backends (see the sketch after this list).
  • The Dummy backend always returns the same fixed result regardless of the input document, which makes it useful as a trivial baseline and for testing the pipeline.
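
As a concrete illustration of this kind of HTTP integration, here is a minimal Python sketch that posts a document to the REST suggest endpoint of a running Annif instance (the style of API exposed by api.annif.org); the base URL and project id are placeholders, and the exact path, parameters and response fields should be checked against the Annif wiki and its Swagger docs:

```python
import requests

# Placeholder values -- point these at a real Annif instance and project.
BASE_URL = "http://localhost:5000/v1"
PROJECT_ID = "yso-en"

response = requests.post(
    f"{BASE_URL}/projects/{PROJECT_ID}/suggest",
    data={"text": "A study of machine learning methods for subject indexing."},
)
response.raise_for_status()

# Each suggestion is expected to carry a subject URI, label and score.
for result in response.json().get("results", []):
    print(result["score"], result["uri"], result["label"])
```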
