Make sure this is flexible enough to support CoNLL-U later on for token-level tasks (this might require hard-coding a few assumptions about preprocessing and separating tokenization from generic feature extraction)
All implementations should allow for bulk downloads, as well as API access with pagination.
Filters:
all annotations vs. curated gold annotations only
all fields vs. sample_index, id column, text column, gold annotation only
preprocessed vs raw text vs both
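To make the filter and pagination requirements more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the project's actual API: the names `ExportFilter`, `select_columns`, `paginate`, and `bulk_export`, the row keys (`text`, `preprocessed_text`, `annotations`, `gold_annotation`), and the response shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Sequence

# Hypothetical column names; the real export schema may differ.
TEXT_COLUMNS = {
    "raw": ("text",),
    "preprocessed": ("preprocessed_text",),
    "both": ("text", "preprocessed_text"),
}


@dataclass(frozen=True)
class ExportFilter:
    gold_only: bool = False       # curated gold annotations only vs. all annotations
    minimal_fields: bool = False  # sample_index, id, text, gold annotation only
    text_variant: str = "raw"     # "raw", "preprocessed", or "both"


def select_columns(row: Dict, flt: ExportFilter) -> Dict:
    """Return a copy of one export row with the filter applied."""
    if flt.text_variant not in TEXT_COLUMNS:
        raise ValueError(f"unknown text_variant: {flt.text_variant!r}")
    text_cols = TEXT_COLUMNS[flt.text_variant]
    if flt.minimal_fields:
        keep = ("sample_index", "id", *text_cols, "gold_annotation")
        return {key: row[key] for key in keep if key in row}
    # Otherwise keep everything except the text variants that were not requested.
    drop = {"text", "preprocessed_text"} - set(text_cols)
    if flt.gold_only:
        drop.add("annotations")  # assumption: non-gold annotations live in this column
    return {key: value for key, value in row.items() if key not in drop}


def paginate(rows: Sequence[Dict], page: int, page_size: int) -> Dict:
    """Return one page (1-based) plus the metadata an API client needs."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return {
        "page": page,
        "page_size": page_size,
        "total": len(rows),
        "items": list(rows[start:start + page_size]),
        "has_next": start + page_size < len(rows),
    }


def bulk_export(rows: Sequence[Dict], flt: ExportFilter, page_size: int = 1000) -> Iterator[Dict]:
    """Bulk download expressed as a walk over all pages."""
    page = 1
    while True:
        result = paginate(rows, page, page_size)
        for row in result["items"]:
            yield select_columns(row, flt)
        if not result["has_next"]:
            break
        page += 1
```

Routing the bulk download through the same `paginate` and `select_columns` helpers is one way to keep both access paths consistent, so a new export format would only need to serialize the filtered rows.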
Data model changes
add DatasetContent fields to store preprocessed values (e.g. tokenized text)
add a separate table for a feature vocabulary (in combination with the above, this builds a simple feature store); include metrics like term frequency (tf) and document frequency (df).
add an Annotation sub-class that stores model predictions and confidence values (or uncertainty measures); this might require a separate entity if we also want to store topic model or clustering results
figure out what to do with model artifacts for continuous learning scenarios
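The data model changes listed above could look roughly like the following Python sketch. It uses plain dataclasses instead of the project's real ORM models, and every field name (`tokens`, `model_name`, `confidence`), the `ModelPrediction` and `FeatureVocabularyEntry` classes, and the whitespace tokenizer are assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class DatasetContent:
    id: int
    sample_index: int
    text: str
    tokens: Optional[List[str]] = None  # preprocessed value, e.g. tokenized text


@dataclass
class Annotation:
    content_id: int
    label: str


@dataclass
class ModelPrediction(Annotation):
    """Annotation sub-class for model output (hypothetical fields)."""
    model_name: str = ""
    confidence: Optional[float] = None  # or another uncertainty measure


@dataclass
class FeatureVocabularyEntry:
    feature: str
    tf: int  # term frequency: total occurrences across the dataset
    df: int  # document frequency: number of samples containing the feature


def preprocess(content: DatasetContent) -> DatasetContent:
    # Stand-in tokenizer (lowercase + whitespace split); the real
    # preprocessing is not specified in this issue.
    content.tokens = content.text.lower().split()
    return content


def build_vocabulary(contents: Iterable[DatasetContent]) -> Dict[str, FeatureVocabularyEntry]:
    """Aggregate tf and df over the stored preprocessed tokens."""
    tf: Counter = Counter()
    df: Counter = Counter()
    for content in contents:
        tokens = content.tokens or []
        tf.update(tokens)
        df.update(set(tokens))  # count each feature once per sample
    return {f: FeatureVocabularyEntry(f, tf[f], df[f]) for f in tf}
```

Keeping tokenization (`preprocess`) separate from the vocabulary statistics (`build_vocabulary`) mirrors the split between tokenization and generic feature extraction that a later token-level export would need.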
Meta-issue to prepare the system for external ML components #27
The CSV export already exists but should be improved for the integration of external ML components.
Additional export formats