All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added `multi selection entity dropdown` for span annotation overlap (#4735)
- Added `pre selection highlight` for span annotation (#4726)
- Fixed requirements for version of wrapt library conflicting with Python 3.11 (#4693)
- Added support for overlapping spans in the `FeedbackDataset` (#4668)
- Added `allow_overlapping` parameter for span questions. (#4697)
- Added overall progress bar on `Datasets` table (#4696)
- Added German language translation (#4688)
- New UI design for suggestions (#4682)
- Improve performance for more than 250 labels (#4702)
- Added support for automatic detection of RTL languages. (#4686)
- If you expand the labels of a `single or multi` label Question, the state is maintained during the entire annotation process. (#4630)
- Added support for span questions in the Python SDK. (#4617)
- Added support for span values in suggestions and responses. (#4623)
- Added `span` questions for `FeedbackDataset`. (#4622)
- Added `ARGILLA_CACHE_DIR` environment variable to configure the client cache directory. (#4509)
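A configuration variable such as `ARGILLA_CACHE_DIR` is typically read with a fallback to a default location. The snippet below is a minimal sketch of that pattern, not the client's actual code: the helper name `resolve_cache_dir` and the fallback path are hypothetical, and the changelog does not say how the client resolves the value internally.

```python
import os
from pathlib import Path

# Hypothetical fallback; the real client default is not stated in this changelog.
HYPOTHETICAL_DEFAULT = Path.home() / ".cache" / "argilla"


def resolve_cache_dir(default: Path = HYPOTHETICAL_DEFAULT) -> Path:
    """Return the directory named by ARGILLA_CACHE_DIR, or the default if unset or empty."""
    value = os.environ.get("ARGILLA_CACHE_DIR")
    if value:
        return Path(value).expanduser()
    return default
```

Setting `ARGILLA_CACHE_DIR=/data/argilla-cache` in the environment before starting Python would make this helper return that path.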
- Fixed contextualized workspaces. (#4665)
- Fixed prepare for training when passing `RankingValueSchema` instances to suggestions. (#4628)
- Fixed parsing ranking values in suggestions from HF datasets. (#4629)
- Fixed reading description from API response payload. (#4632)
- Fixed pulling (n*chunk_size)+1 records when using `ds.pull` or iterating over the dataset. (#4662)
- Fixed client's resolution of enum values when calling the Search and Metrics API, to support Python >=3.11 enum handling. (#4672)
Note
For changes in the argilla-server module, visit the argilla-server release notes
- Reorder labels in `dataset settings page` for single/multi label questions (#4598)
- Added pandas v2 support using the python SDK. (#4600)
- Removed `missing` response for status filter. Use `pending` instead. (#4533)
- Fixed FloatMetadataProperty: value is not a valid float (#4570)
- Fixed redirect to `user-settings` instead of 404 `user_settings` (#4609)
Note
This release does not contain any new features, but it includes a major change in the `argilla-server` dependency.
The package is using the `argilla-server` dependency defined here. (#4537)
- Fixed Responsive view for Feedback Datasets. (#4579)
- Added bulk annotation by filter criteria. (#4516)
- Automatically fetch new datasets on focus tab. (#4514)
- API v1 responses returning `Record` schema now always include `dataset_id` as attribute. (#4482)
- API v1 responses returning `Response` schema now always include `record_id` as attribute. (#4482)
- API v1 responses returning `Question` schema now always include `dataset_id` attribute. (#4487)
- API v1 responses returning `Field` schema now always include `dataset_id` attribute. (#4488)
- API v1 responses returning `MetadataProperty` schema now always include `dataset_id` attribute. (#4489)
- API v1 responses returning `VectorSettings` schema now always include `dataset_id` attribute. (#4490)
- Added `pdf_to_html` function to `.html_utils` module that converts PDFs to dataURL to be able to render them in the Argilla UI. (#4481)
- Added `ARGILLA_AUTH_SECRET_KEY` environment variable. (#4539)
- Added `ARGILLA_AUTH_ALGORITHM` environment variable. (#4539)
- Added `ARGILLA_AUTH_TOKEN_EXPIRATION` environment variable. (#4539)
- Added `ARGILLA_AUTH_OAUTH_CFG` environment variable. (#4546)
- Added OAuth2 support for HuggingFace Hub. (#4546)
- Deprecated `ARGILLA_LOCAL_AUTH_*` environment variables. Will be removed in the release v1.25.0. (#4539)
- Changed regex pattern for `username` attribute in `UserCreate`. Now uppercase letters are allowed. (#4544)
- Remove sending `Authorization` header from python SDK requests. (#4535)
- Fixed keyboard shortcut for label questions. (#4530)
- Added Bulk annotation support. (#4333)
- Restore filters from feedback dataset settings. (#4461)
- Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
- Added pydantic v2 support using the python SDK. (#4459)
- Added `vector_settings` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. (#4454)
- Added integration for `sentence-transformers` using `SentenceTransformersExtractor` to configure `vector_settings` in `FeedbackDataset` and `FeedbackRecord`. (#4454)
- Module `argilla.cli.server` definitions have been moved to `argilla.server.cli` module. (#4472)
- [breaking] Changed `vector_settings_by_name` for generic `property_by_name` usage, which will return `None` instead of raising an error. (#4454)
- The constant definition `ES_INDEX_REGEX_PATTERN` in module `argilla._constants` is now private. (#4472)
- `nan` values in metadata properties will raise a 422 error when creating/updating records. (#4300)
- `None` values are now allowed in metadata properties. (#4300)
- Refactor and add `width`, `height`, `autoplay` and `loop` attributes as optional args in `to_html` functions. (#4481)
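The two metadata rules in this list (`nan` rejected, `None` accepted) can be illustrated with a small check. This is a sketch under stated assumptions, not the server's implementation: `validate_metadata` is a hypothetical helper, and the `ValueError` stands in for whatever produces the 422 response.

```python
import math
from typing import Any, Dict


def validate_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: reject NaN values, let None values through."""
    for name, value in metadata.items():
        # None is allowed and passes through untouched.
        if value is None:
            continue
        # NaN only exists for floats; this is the case that gets rejected.
        if isinstance(value, float) and math.isnan(value):
            raise ValueError(f"metadata property {name!r} is NaN")
    return metadata
```

Client code that builds metadata from pandas or NumPy values may want to convert missing values to `None` before creating or updating records, since those libraries commonly represent missing numbers as `nan`.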
- Paginating to a new record, automatically scrolls down to selected form area. (#4333)
- The `missing` response status for filtering records is deprecated and will be removed in the release v1.24.0. Use `pending` instead. (#4433)
- The deprecated `python -m argilla database` command has been removed. (#4472)
- Added new draft queue for annotation view (#4334)
- Added annotation metrics module for the `FeedbackDataset` (`argilla.client.feedback.metrics`). (#4175).
- Added strategy to handle and translate errors from the server for `401` HTTP status code (#4362)
- Added integration for `textdescriptives` using `TextDescriptivesExtractor` to configure `metadata_properties` in `FeedbackDataset` and `FeedbackRecord`. (#4400). Contributed by @m-newhauser
- Added `POST /api/v1/me/responses/bulk` endpoint to create responses in bulk for current user. (#4380)
- Added list support for term metadata properties. (Closes #4359)
- Added new CLI task to reindex datasets and records into the search engine. (#4404)
- Added `httpx_extra_kwargs` argument to `rg.init` and `Argilla` to allow passing extra arguments to `httpx.Client` used by `Argilla`. (#4440)
- Added `ResponseStatusFilter` enum in `__init__` imports of Argilla (#4118). Contributed by @Piyush-Kumar-Ghosh.
- More productive and simpler shortcut system (#4215)
- Move `ArgillaSingleton`, `init` and `active_client` to a new module `singleton`. (#4347)
- Updated `argilla.load` functions to also work with `FeedbackDataset`s. (#4347)
- [breaking] Updated `argilla.delete` functions to also work with `FeedbackDataset`s. It now raises an error if the dataset does not exist. (#4347)
- Updated `argilla.list_datasets` functions to also work with `FeedbackDataset`s. (#4347)
- Fixed error in `TextClassificationSettings.from_dict` method in which the `label_schema` created was a list of `dict` instead of a list of `str`. (#4347)
- Fixed total records on pagination component (#4424)
- Removed `draft` auto save for annotation view (#4334)
- Added `GET /api/v1/datasets/:dataset_id/records/search/suggestions/options` endpoint to return suggestion available options for searching. (#4260)
- Added `metadata_properties` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. (#4192).
- Added `get_model_kwargs`, `get_trainer_kwargs`, `get_trainer_model`, `get_trainer_tokenizer` and `get_trainer` methods to the `ArgillaTrainer` to improve interoperability across frameworks. (#4214).
- Added additional formatting checks to the `ArgillaTrainer` to allow for better interoperability of `defaults` and `formatting_func` usage. (#4214).
- Added a warning to the `update_config` method of `ArgillaTrainer` to emphasize if the `kwargs` were updated correctly. (#4214).
- Added `argilla.client.feedback.utils` module with `html_utils` (this mainly includes `video/audio/image_to_html` that convert media to dataURL to be able to render them in the Argilla UI and `create_token_highlights` to highlight tokens in a custom way. Both work on TextQuestion and TextField with use_markdown=True) and `assignments` (this mainly includes `assign_records` to assign records according to a number of annotators and records, an overlap and the shuffle option; and `assign_workspace` to assign and create if needed a workspace according to the record assignment). (#4121)
- Fixed error in `ArgillaTrainer`, with numerical labels, using `RatingQuestion` instead of `RankingQuestion` (#4171)
- Fixed error in `ArgillaTrainer`, now we can train for `extractive_question_answering` using a validation sample (#4204)
- Fixed error in `ArgillaTrainer`, when training for `sentence-similarity` it didn't work with a list of values per record (#4211)
- Fixed error in the unification strategy for `RankingQuestion` (#4295)
- Fixed `TextClassificationSettings.labels_schema` order was not being preserved. Closes #3828 (#4332)
- Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
- Fixed error when passing `draft` responses to create records endpoint. (#4354)
- [breaking] Suggestions `agent` field now only accepts some specific characters and a limited length. (#4265)
- [breaking] Suggestions `score` field now only accepts float values in the range `0` to `1`. (#4266)
- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support optional `query` attribute. (#4327)
- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. (#4327)
- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support optional `query` attribute. (#4270)
- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. (#4270)
- Changed the logging style while pulling and pushing `FeedbackDataset` to Argilla from `tqdm` style to `rich`. (#4267). Contributed by @zucchini-nlp.
- Updated `push_to_argilla` to print `repr` of the pushed `RemoteFeedbackDataset` after push and changed `show_progress` to True by default. (#4223)
- Changed `models` and `tokenizer` for the `ArgillaTrainer` to explicitly allow for changing them when needed. (#4214).
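Since the `score` restriction is a breaking change, it can help to check scores on the client before sending suggestions. The function below is a minimal sketch of such a pre-check, not Argilla's own validator: the name `validate_suggestion_score` is hypothetical, and two points are assumptions the changelog does not state, namely that the bounds are inclusive and that a missing score (`None`) stays acceptable.

```python
from typing import Optional


def validate_suggestion_score(score: Optional[float]) -> Optional[float]:
    """Hypothetical pre-check: None means 'no score', otherwise a number in [0, 1]."""
    if score is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise TypeError(f"score must be a float, got {type(score).__name__}")
    # Assumption: both ends of the range are inclusive.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in the range 0 to 1, got {score}")
    return float(score)
```

Model outputs that are not already probabilities (logits, unbounded confidence values) would need to be rescaled before they pass a check like this.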
- Added `POST /api/v1/datasets/:dataset_id/records/search` endpoint to search for records without user context, including responses by all users. (#4143)
- Added `POST /api/v1/datasets/:dataset_id/vectors-settings` endpoint for creating vector settings for a dataset. (#3776)
- Added `GET /api/v1/datasets/:dataset_id/vectors-settings` endpoint for listing the vectors settings for a dataset. (#3776)
- Added `DELETE /api/v1/vectors-settings/:vector_settings_id` endpoint for deleting a vector settings. (#3776)
- Added `PATCH /api/v1/vectors-settings/:vector_settings_id` endpoint for updating a vector settings. (#4092)
- Added `GET /api/v1/records/:record_id` endpoint to get a specific record. (#4039)
- Added support to include vectors for `GET /api/v1/datasets/:dataset_id/records` endpoint response using `include` query param. (#4063)
- Added support to include vectors for `GET /api/v1/me/datasets/:dataset_id/records` endpoint response using `include` query param. (#4063)
- Added support to include vectors for `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint response using `include` query param. (#4063)
- Added `show_progress` argument to `from_huggingface()` method to make the progress bar for parsing records process optional. (#4132).
- Added a progress bar for parsing records process to `from_huggingface()` method with `trange` in `tqdm`. (#4132).
- Added to sort by `inserted_at` or `updated_at` for datasets with no metadata. (#4147)
- Added `max_records` argument to `pull()` method for `RemoteFeedbackDataset`. (#4074)
- Added functionality to push your models to the Hugging Face hub with `ArgillaTrainer.push_to_huggingface` (#3976). Contributed by @Racso-3141.
- Added `filter_by` argument to `ArgillaTrainer` to filter by `response_status` (#4120).
- Added `sort_by` argument to `ArgillaTrainer` to sort by `metadata` (#4120).
- Added `max_records` argument to `ArgillaTrainer` to limit record used for training (#4120).
- Added `add_vector_settings` method to local and remote `FeedbackDataset`. (#4055)
- Added `update_vectors_settings` method to local and remote `FeedbackDataset`. (#4122)
- Added `delete_vectors_settings` method to local and remote `FeedbackDataset`. (#4130)
- Added `vector_settings_by_name` method to local and remote `FeedbackDataset`. (#4055)
- Added `find_similar_records` method to local and remote `FeedbackDataset`. (#4023)
- Added `ARGILLA_SEARCH_ENGINE` environment variable to configure the search engine to use. (#4019)
- [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
- [breaking] Users working with OpenSearch engines must use version >=2.4 and set `ARGILLA_SEARCH_ENGINE=opensearch`. (#4019 and #4111)
- [breaking] Changed `FeedbackDataset.*_by_name()` methods to return `None` when no match is found (#4101).
- [breaking] `limit` query parameter for `GET /api/v1/datasets/:dataset_id/records` endpoint now only accepts values greater than or equal to `1` and less than or equal to `1000`. (#4143)
- [breaking] `limit` query parameter for `GET /api/v1/me/datasets/:dataset_id/records` endpoint now only accepts values greater than or equal to `1` and less than or equal to `1000`. (#4143)
- Update `GET /api/v1/datasets/:dataset_id/records` endpoint to fetch record using the search engine. (#4142)
- Update `GET /api/v1/me/datasets/:dataset_id/records` endpoint to fetch record using the search engine. (#4142)
- Update `POST /api/v1/datasets/:dataset_id/records` endpoint to allow to create records with `vectors` (#4022)
- Update `PATCH /api/v1/datasets/:dataset_id` endpoint to allow updating `allow_extra_metadata` attribute. (#4112)
- Update `PATCH /api/v1/datasets/:dataset_id/records` endpoint to allow to update records with `vectors`. (#4062)
- Update `PATCH /api/v1/records/:record_id` endpoint to allow to update record with `vectors`. (#4062)
- Update `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to allow to search records with vectors. (#4019)
- Update `BaseElasticAndOpenSearchEngine.index_records` method to also index record vectors. (#4062)
- Update `FeedbackDataset.__init__` to allow passing a list of vector settings. (#4055)
- Update `FeedbackDataset.push_to_argilla` to also push vector settings. (#4055)
- Update `FeedbackDatasetRecord` to support the creation of records with vectors. (#4043)
- Using cosine similarity to compute similarity between vectors. (#4124)
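For readers unfamiliar with the metric named in the last entry, cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The function below is a plain-Python sketch of that formula for illustration only; in Argilla the computation is delegated to the search engine, and the name `cosine_similarity` here is not an Argilla API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

The result depends only on direction, not magnitude, so scaling a vector does not change its similarity to other vectors.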
- Fixed svg images out of screen with too large images (#4047)
- Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
- Fixed deleting or updating responses as an owner for annotators. (Commit 403a66d)
- Fixed passing user_id when getting records by id. (Commit 98c7927)
- Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)
- New `GET /api/v1/datasets/:dataset_id/metadata-properties` endpoint for listing dataset metadata properties. (#3813)
- New `POST /api/v1/datasets/:dataset_id/metadata-properties` endpoint for creating dataset metadata properties. (#3813)
- New `PATCH /api/v1/metadata-properties/:metadata_property_id` endpoint allowing the update of a specific metadata property. (#3952)
- New `DELETE /api/v1/metadata-properties/:metadata_property_id` endpoint for deletion of a specific metadata property. (#3911)
- New `GET /api/v1/metadata-properties/:metadata_property_id/metrics` endpoint to compute metrics for a specific metadata property. (#3856)
- New `PATCH /api/v1/records/:record_id` endpoint to update a record. (#3920)
- New `PATCH /api/v1/dataset/:dataset_id/records` endpoint to bulk update the records of a dataset. (#3934)
- Missing validations to `PATCH /api/v1/questions/:question_id`. Now `title` and `description` are using the same validations used to create questions. (#3967)
- Added `TermsMetadataProperty`, `IntegerMetadataProperty` and `FloatMetadataProperty` classes allowing to define metadata properties for a `FeedbackDataset`. (#3818)
- Added `metadata_filters` to `filter_by` method in `RemoteFeedbackDataset` to filter based on metadata i.e. `TermsMetadataFilter`, `IntegerMetadataFilter`, and `FloatMetadataFilter`. (#3834)
- Added a validation layer for both `metadata_properties` and `metadata_filters` in their schemas and as part of the `add_records` and `filter_by` methods, respectively. (#3860)
- Added `sort_by` query parameter to listing records endpoints that allows to sort the records by `inserted_at`, `updated_at` or metadata property. (#3843)
- Added `add_metadata_property` method to both `FeedbackDataset` and `RemoteFeedbackDataset` (i.e. `FeedbackDataset` in Argilla). (#3900)
- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema`. (#3822)
- Added support for `sort_by` for `RemoteFeedbackDataset` i.e. a `FeedbackDataset` uploaded to Argilla. (#3925)
- Added `metadata_properties` support for both `push_to_huggingface` and `from_huggingface`. (#3947)
- Add support for update records (`metadata`) from Python SDK. (#3946)
- Added `delete_metadata_properties` method to delete metadata properties. (#3932)
- Added `update_metadata_properties` method to update `metadata_properties`. (#3961)
- Added automatic model card generation through `ArgillaTrainer.save` (#3857)
- Added `FeedbackDataset` `TaskTemplateMixin` for pre-defined task templates. (#3969)
- A maximum limit of 50 on the number of options a ranking question can accept. (#3975)
- New `last_activity_at` field to `FeedbackDataset` exposing when the last activity for the associated dataset occurs. (#3992)
- `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records` and `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to return the `total` number of records. (#3848, #3903)
- Implemented `__len__` method for filtered datasets to return the number of records matching the provided filters. (#3916)
- Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
- Force elastic index refresh after records creation. (#3929)
- Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
- Using metadata property name instead of id for indexing data in search engine index. (#3994)
- Fixed response schemas to allow `values` to be `None` i.e. when a record is discarded the `response.values` are set to `None`. (#3926)
- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema` (#3822).
- Added automatic model card generation through `ArgillaTrainer.save` (#3857).
- Added task templates to the `FeedbackDataset` (#3973).
- Updated `Dockerfile` to use multi stage build (#3221 and #3793).
- Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
- Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
- FeedbackDataset API methods have been aligned to be accessible through the several implementations (#3937).
- The `unify_responses` support for remote datasets (#3937).
- Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
- Updated active learning for text classification notebooks to pass ids of type int to `TextClassificationRecord` (#3831).
- Fixed record fields validation that prevented logging records with optional fields (i.e. `required=False`) when the field value was `None` (#3846).
- Always set `pretrained_model_name_or_path` attribute as string in `ArgillaTrainer` (#3914).
- The `inserted_at` and `updated_at` attributes are created using the `utcnow` factory to avoid unexpected race conditions on timestamp creation (#3945)
- Fixed `configure_dataset_settings` when providing the workspace via the arg `workspace` (#3887).
- Fixed saving of models trained with `ArgillaTrainer` with a `peft_config` parameter (#3795).
- Fixed backwards compatibility on `from_huggingface` when loading a `FeedbackDataset` from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
- Fixed wrong `__repr__` problem for `TrainingTask`. (#3969)
- Fixed wrong key return error `prepare_for_training_with_*` for `TrainingTask`. (#3969)
- Function `rg.configure_dataset` is deprecated in favour of `rg.configure_dataset_settings`. The former will be removed in version 1.19.0
- Added `ArgillaTrainer` integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
- Added `ArgillaTrainer` integration with `TrainingTask.for_question_answering` (#3740)
- Added `Auto save record` to save automatically the current record that you are working on (#3541)
- Added `ArgillaTrainer` integration with OpenAI, allowing fine tuning for chat completion (#3615)
- Added `workspaces list` command to list Argilla workspaces (#3594).
- Added `datasets list` command to list Argilla datasets (#3658).
- Added `users create` command to create users (#3667).
- Added `whoami` command to get current user (#3673).
- Added `users delete` command to delete users (#3671).
- Added `users list` command to list users (#3688).
- Added `workspaces delete-user` command to remove a user from a workspace (#3699).
- Added `workspaces create` command to create an Argilla workspace (#3676).
- Added `datasets push-to-hub` command to push a `FeedbackDataset` from Argilla into the HuggingFace Hub (#3685).
- Added `info` command to get info about the used Argilla client and server (#3707).
- Added `datasets delete` command to delete a `FeedbackDataset` from Argilla (#3703).
- Added `created_at` and `updated_at` properties to `RemoteFeedbackDataset` and `FilteredRemoteFeedbackDataset` (#3709).
- Added handling `PermissionError` when executing a command with a logged in user with not enough permissions (#3717).
- Added `workspaces add-user` command to add a user to workspace (#3712).
- Added `workspace_id` param to `GET /api/v1/me/datasets` endpoint (#3727).
- Added `workspace_id` arg to `list_datasets` in the Python SDK (#3727).
- Added `argilla` script that allows to execute Argilla CLI using the `argilla` command (#3730).
- Added support for passing already initialized `model` and `tokenizer` instances to the `ArgillaTrainer` (#3751)
- Added `server_info` function to check the Argilla server information (also accessible via `rg.server_info`) (#3772).
- Move `database` commands under `server` group of commands (#3710)
- `server` commands only included in the CLI app when `server` extra requirements are installed (#3710).
- Updated `PUT /api/v1/responses/{response_id}` to replace `values` stored with received `values` in request (#3711).
- Display a `UserWarning` when the `user_id` in `Workspace.add_user` and `Workspace.delete_user` is the ID of a user with the owner role as they don't require explicit permissions (#3716).
- Rename `tasks` sub-package to `cli` (#3723).
- Changed `argilla database` command in the CLI to now be accessed via `argilla server database`, to be deprecated in the upcoming release (#3754).
- Changed `visible_options` (of label and multi label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
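The `visible_options` rule in the last entry amounts to a simple bounds check. The following is a minimal sketch of that rule, not the backend's code: `validate_visible_options` is a hypothetical name, and treating `None` as "not set, nothing to validate" is an assumption.

```python
from typing import Optional, Sequence

MIN_VISIBLE_OPTIONS = 3  # lower bound stated in the changelog entry


def validate_visible_options(
    visible_options: Optional[int], options: Sequence[str]
) -> Optional[int]:
    """Hypothetical check: 3 <= visible_options <= number of provided options."""
    # Assumption: an unset value is left alone.
    if visible_options is None:
        return None
    if visible_options < MIN_VISIBLE_OPTIONS:
        raise ValueError(f"visible_options must be >= {MIN_VISIBLE_OPTIONS}")
    if visible_options > len(options):
        raise ValueError("visible_options cannot exceed the number of options")
    return visible_options
```

One consequence of the rule as written is that a question with fewer than three options cannot set `visible_options` at all.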
- Fixed `remove user modification in text component on clear answers` (#3775)
- Fixed `Highlight raw text field in dataset feedback task` (#3731)
- Fixed `Field title too long` (#3734)
- Fixed error messages when deleting a `DatasetForTextClassification` (#3652)
- Fixed `Pending queue` pagination problems during data annotation (#3677)
- Fixed `visible_labels` default value to be 20 just when `visible_labels` not provided and `len(labels) > 20`, otherwise it will either be the provided `visible_labels` value or `None`, for `LabelQuestion` and `MultiLabelQuestion` (#3702).
- Fixed `DatasetCard` generation when `RemoteFeedbackDataset` contains suggestions (#3718).
- Add missing `draft` status in `ResponseSchema` as now there can be responses with `draft` status when annotating via the UI (#3749).
- Searches when queried words are distributed along the record fields (#3759).
- Fixed Python 3.11 compatibility issue with `/api/datasets` endpoints due to the `TaskType` enum replacement in the endpoint URL (#3769).
- Fixed `RankingValueSchema` and `FeedbackRankingValueModel` schemas to allow `rank=None` when `status=draft` (#3781).
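The `visible_labels` default described in this list reads more easily as code. The function below is a sketch that restates the rule from the entry; the name `resolve_visible_labels` is hypothetical and this is not the SDK's implementation.

```python
from typing import Optional, Sequence

DEFAULT_VISIBLE_LABELS = 20  # value taken from the changelog entry


def resolve_visible_labels(
    labels: Sequence[str], visible_labels: Optional[int] = None
) -> Optional[int]:
    """Apply the default of 20 only when nothing was provided and there are more than 20 labels."""
    if visible_labels is None and len(labels) > DEFAULT_VISIBLE_LABELS:
        return DEFAULT_VISIBLE_LABELS
    # Otherwise keep whatever was provided, which may be None.
    return visible_labels
```

So a question with 5 labels and no explicit value resolves to `None`, while one with 25 labels resolves to 20 unless the caller passes something else.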
- Fixed `Text component` text content sanitization behavior just for markdown to prevent the text from disappearing (#3738)
- Fixed `Text component` now you need to press Escape to exit the text area (#3733)
- Fixed `SearchEngine` was creating the same number of primary shards and replica shards for each `FeedbackDataset` (#3736).
- Added `Enable to update guidelines and dataset settings for Feedback Datasets directly in the UI` (#3489)
- Added `ArgillaTrainer` integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
- Added `formatting_func` to `ArgillaTrainer` for `FeedbackDataset` datasets to add a custom formatting for the data (#3599).
- Added `login` function in `argilla.client.login` to login into an Argilla server and store the credentials locally (#3582).
- Added `login` command to login into an Argilla server (#3600).
- Added `logout` command to logout from an Argilla server (#3605).
- Added `DELETE /api/v1/suggestions/{suggestion_id}` endpoint to delete a suggestion given its ID (#3617).
- Added `DELETE /api/v1/records/{record_id}/suggestions` endpoint to delete several suggestions linked to the same record given their IDs (#3617).
- Added `response_status` param to `GET /api/v1/datasets/{dataset_id}/records` to be able to filter by `response_status` as previously included for `GET /api/v1/me/datasets/{dataset_id}/records` (#3613).
- Added `list` classmethod to `ArgillaMixin` to be used as `FeedbackDataset.list()`, also including the `workspace` to list from as arg (#3619).
- Added `filter_by` method in `RemoteFeedbackDataset` to filter based on `response_status` (#3610).
- Added `list_workspaces` function (to be used as `rg.list_workspaces`, but `Workspace.list` is preferred) to list all the workspaces from a user in Argilla (#3641).
- Added `list_datasets` function (to be used as `rg.list_datasets`) to list the `TextClassification`, `TokenClassification`, and `Text2Text` datasets in Argilla (#3638).
- Added `RemoteSuggestionSchema` to manage suggestions in Argilla, including the `delete` method to delete suggestions from Argilla via `DELETE /api/v1/suggestions/{suggestion_id}` (#3651).
- Added `delete_suggestions` to `RemoteFeedbackRecord` to remove suggestions from Argilla via `DELETE /api/v1/records/{record_id}/suggestions` (#3651).
- Changed `Optional label for * mark for required question` (#3608)
- Updated `RemoteFeedbackDataset.delete_records` to use batch delete records endpoint (#3580).
- Included `allowed_for_roles` for some `RemoteFeedbackDataset`, `RemoteFeedbackRecords`, and `RemoteFeedbackRecord` methods that are only allowed for users with roles `owner` and `admin` (#3601).
- Renamed `ArgillaToFromMixin` to `ArgillaMixin` (#3619).
- Move `users` CLI app under `database` CLI app (#3593).
- Move server `Enum` classes to `argilla.server.enums` module (#3620).
- Fixed `Filter by workspace in breadcrumbs` (#3577)
- Fixed `Filter by workspace in datasets table` (#3604)
- Fixed `Query search highlight` for Text2Text and TextClassification (#3621)
- Fixed `RatingQuestion.values` validation to raise a `ValidationError` when values are out of range i.e. [1, 10] (#3626).
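The range rule for `RatingQuestion.values` can be sketched as a standalone check. This is an illustration only: `validate_rating_values` is a hypothetical helper, and it raises a plain `ValueError` so the snippet needs nothing beyond the standard library, whereas the entry above says the SDK raises a `ValidationError`.

```python
from typing import List, Sequence

RATING_MIN, RATING_MAX = 1, 10  # inclusive range [1, 10] from the changelog entry


def validate_rating_values(values: Sequence[int]) -> List[int]:
    """Hypothetical check: every rating value must lie in [1, 10]."""
    out_of_range = [v for v in values if not RATING_MIN <= v <= RATING_MAX]
    if out_of_range:
        raise ValueError(
            f"rating values must be in [{RATING_MIN}, {RATING_MAX}], got {out_of_range}"
        )
    return list(values)
```

Scales that start at zero, for example 0 to 5, would fail this check and need to be shifted into the allowed range.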
- Removed `multi_task_text_token_classification` from `TaskType` as not used (#3640).
- Removed `argilla_id` in favor of `id` from `RemoteFeedbackDataset` (#3663).
- Removed `fetch_records` from `RemoteFeedbackDataset` as now the records are lazily fetched from Argilla (#3663).
- Removed `push_to_argilla` from `RemoteFeedbackDataset`, as it just works when calling it through a `FeedbackDataset` locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
- Removed `set_suggestions` in favor of `update(suggestions=...)` for both `FeedbackRecord` and `RemoteFeedbackRecord`, as all the updates of any "updateable" attribute of a record will go through `update` instead (#3663).
- Remove unused `owner` attribute for client Dataset data model (#3665)
- Fixed PostgreSQL database not being updated after `begin_nested` because of missing `commit` (#3567).
- Fixed `settings` could not be provided when updating a `rating` or `ranking` question (#3552).
- Added `PATCH /api/v1/fields/{field_id}` endpoint to update the field title and markdown settings (#3421).
- Added `PATCH /api/v1/datasets/{dataset_id}` endpoint to update dataset name and guidelines (#3402).
- Added `PATCH /api/v1/questions/{question_id}` endpoint to update question title, description and some settings (depending on the type of question) (#3477).
- Added `DELETE /api/v1/records/{record_id}` endpoint to remove a record given its ID (#3337).
- Added `pull` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) to pull all the records from it and return it as a local copy as a `FeedbackDataset` (#3465).
- Added `delete` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) (#3512).
- Added `delete_records` method in `RemoteFeedbackDataset`, and `delete` method in `RemoteFeedbackRecord` to delete records from Argilla (#3526).
- Improved efficiency of weak labeling when dataset contains vectors (#3444).
- Added `ArgillaDatasetMixin` to detach the Argilla-related functionality from the `FeedbackDataset` (#3427)
- Moved `FeedbackDataset`-related `pydantic.BaseModel` schemas to `argilla.client.feedback.schemas` instead, to be better structured and more scalable and maintainable (#3427)
- Update CLI to use database async connection (#3450).
- Limit rating questions values to the positive range [1, 10] (#3451).
- Updated `POST /api/users` endpoint to be able to provide a list of workspace names to which the user should be linked to (#3462).
- Updated Python client `User.create` method to be able to provide a list of workspace names to which the user should be linked to (#3462).
- Updated `GET /api/v1/me/datasets/{dataset_id}/records` endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
- Updated `POST /api/v1/me/datasets/{dataset_id}/records` endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
- Updated `SearchEngine.search` method to allow searching records matching one of the response statuses provided (#3359).
- After calling `FeedbackDataset.push_to_argilla`, the methods `FeedbackDataset.add_records` and `FeedbackRecord.set_suggestions` will automatically call Argilla with no need of calling `push_to_argilla` explicitly (#3465).
- Now calling `FeedbackDataset.push_to_huggingface` dumps the `responses` as a `List[Dict[str, Any]]` instead of `Sequence` to make it more readable via 🤗`datasets` (#3539).
- Fixed issue with `bool` values and `default` from Jinja2 while generating the HuggingFace `DatasetCard` from `argilla_template.md` (#3499).
- Fixed `DatasetConfig.from_yaml` which was failing when calling `FeedbackDataset.from_huggingface` as the UUIDs cannot be deserialized automatically by `PyYAML`, so UUIDs are neither dumped nor loaded anymore (#3502).
- Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
- `TextClassificationSettings` and `TokenClassificationSettings` labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
- Fixed `PUT /api/v1/datasets/{dataset_id}/publish` to check whether at least one field and question has `required=True` (#3511).
- Fixed `FeedbackDataset.from_huggingface` as `suggestions` were being lost when there were no `responses` (#3539).
- Fixed `QuestionSchema` and `FieldSchema` not validating `name` attribute (#3550).
- After calling `FeedbackDataset.push_to_argilla`, calling `push_to_argilla` again won't do anything since the dataset is already pushed to Argilla (#3465).
- After calling `FeedbackDataset.push_to_argilla`, calling `fetch_records` won't do anything since the records are lazily fetched from Argilla (#3465).
- After calling `FeedbackDataset.push_to_argilla`, the Argilla ID is no longer stored in the attribute/property `argilla_id` but in `id` instead (#3465).
- Fixed `ModuleNotFoundError` caused because the `argilla.utils.telemetry` module used in the `ArgillaTrainer` was importing an optional dependency not installed by default (#3471).
- Fixed `ImportError` caused because the `argilla.client.feedback.config` module was importing `pyyaml` optional dependency not installed by default (#3471).
- The `suggestion_type_enum` ENUM data type created in PostgreSQL didn't have any value (#3445).
- Fix database migration for PostgreSQL (See #3438)
- Added `GET /api/v1/users/{user_id}/workspaces` endpoint to list the workspaces to which a user belongs (#3308 and #3343).
- Added `HuggingFaceDatasetMixin` for internal usage, to detach the `FeedbackDataset` integrations from the class itself, and use Mixins instead (#3326).
- Added `GET /api/v1/records/{record_id}/suggestions` API endpoint to get the list of suggestions for the responses associated to a record (#3304).
- Added `POST /api/v1/records/{record_id}/suggestions` API endpoint to create a suggestion for a response associated to a record (#3304).
- Added support for `RankingQuestionStrategy`, `RankingQuestionUnification` and the `.for_text_classification` method for the `TrainingTaskMapping` (#3364)
- Added `PUT /api/v1/records/{record_id}/suggestions` API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
- Added `suggestions` attribute to `FeedbackRecord`, and allow adding and retrieving suggestions from the Python client (#3370)
- Added `allowed_for_roles` Python decorator to check whether the current user has the required role to access the decorated function/method for `User` and `Workspace` (#3383)
- Added API and Python Client support for workspace deletion (Closes #3260)
- Added `GET /api/v1/me/workspaces` endpoint to list the workspaces of the current active user (#3390)
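To illustrate the idea behind a role-checking decorator such as `allowed_for_roles`, here is a minimal sketch. It is a hypothetical reimplementation, not the client's code: the `SESSION` dictionary, the decorator signature, and `delete_workspace` are assumptions made for the example.

```python
import functools
from typing import Callable, Iterable

# Hypothetical stand-in for however the client tracks the active user's role.
SESSION = {"role": "annotator"}


def allowed_for_roles(roles: Iterable[str]) -> Callable:
    """Reject calls unless the current role is one of `roles`."""
    allowed = set(roles)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            role = SESSION["role"]  # looked up at call time, not at decoration time
            if role not in allowed:
                raise PermissionError(
                    f"Role '{role}' cannot call '{func.__name__}'; "
                    f"allowed roles: {sorted(allowed)}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@allowed_for_roles(["owner"])
def delete_workspace(name: str) -> str:
    # Placeholder body; a real client would call the API here.
    return f"deleted {name}"
```

Reading the role inside the wrapper means a change of user between calls is picked up without re-decorating the function.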
- Updated output payload for `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records`, `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to include the suggestions of the records based on the value of the `include` query parameter (#3304).
- Updated `POST /api/v1/datasets/{dataset_id}/records` input payload to add suggestions (#3304).
- The `POST /api/datasets/:dataset-id/:task/bulk` endpoints don't create the dataset if it does not exist (Closes #3244)
- Added Telemetry support for `ArgillaTrainer` (closes #3325)
- `User.workspaces` is no longer an attribute but a property, and is calling `list_user_workspaces` to list all the workspace names for a given user ID (#3334)
- Renamed `FeedbackDatasetConfig` to `DatasetConfig` and export/import from YAML as default instead of JSON (just used internally on `push_to_huggingface` and `from_huggingface` methods of `FeedbackDataset`) (#3326).
- The protected metadata fields now support non-textual values; existing datasets must be reindexed. See docs for more detail (Closes #3332).
- Updated `Dockerfile` parent image from `python:3.9.16-slim` to `python:3.10.12-slim` (#3425).
- Updated `quickstart.Dockerfile` parent image from `elasticsearch:8.5.3` to `argilla/argilla-server:${ARGILLA_VERSION}` (#3425).
- Removed support for non-prefixed environment variables. All valid env vars start with `ARGILLA_` (See #3392).
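The effect of the prefix rule can be sketched as follows. This is an illustration only: `read_settings` is a hypothetical helper and does not reflect how the server actually loads its configuration.

```python
from typing import Dict, Mapping

PREFIX = "ARGILLA_"


def read_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Keep only ARGILLA_-prefixed variables; non-prefixed names are ignored."""
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }
```

Passing `os.environ` to such a helper would pick up `ARGILLA_HOME_PATH` but silently skip a bare `HOME_PATH`.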
- Fixed `GET /api/v1/me/datasets/{dataset_id}/records` endpoint always returning the responses for the records even if `responses` was not provided via the `include` query parameter (#3304).
- Values for protected metadata fields are not truncated (Closes #3331).
- Big number ids are properly rendered in UI (Closes #3265)
- Fixed `ArgillaDatasetCard` to include the values/labels for all the existing questions (#3366)
- Integer support for record id in text classification, token classification and text2text datasets.
- Using `rg.init` with default `argilla` user skips setting the default workspace if not available. (Closes #3340)
- Resolved wrong import structure for `ArgillaTrainer` and `TrainingTaskMapping` (Closes #3345)
- Pin pydantic dependency to version < 2 (Closes #3348)
- Added `RankingQuestionSettings` class allowing to create ranking questions in the API using `POST /api/v1/datasets/{dataset_id}/questions` endpoint (#3232)
- Added `RankingQuestion` in the Python client to create ranking questions (#3275).
- Added `Ranking` component in feedback task question form (#3177 & #3246).
- Added `FeedbackDataset.prepare_for_training` method for generating a framework-specific dataset with the responses provided for `RatingQuestion`, `LabelQuestion` and `MultiLabelQuestion` (#3151).
- Added `ArgillaSpaCyTransformersTrainer` class for supporting the training with `spacy-transformers` (#3256).
- Added instructions for how to run the Argilla frontend in the developer docs (#3314).
- All docker related files have been moved into the `docker` folder (#3053).
- `release.Dockerfile` has been renamed to `Dockerfile` (#3133).
- Updated `rg.load` function to raise a `ValueError` with an explanatory message for the cases in which the user tries to use the function to load a `FeedbackDataset` (#3289).
- Updated `ArgillaSpaCyTrainer` to allow re-using `tok2vec` (#3256).
- Check available workspaces on Argilla on `rg.set_workspace` (Closes #3262)
- Replaced `np.float` alias by `float` to avoid `AttributeError` when using `find_label_errors` function with `numpy>=1.24.0` (#3214).
- Fixed `format_as("datasets")` when there are no responses or optional responses in `FeedbackRecord`, to set their value to what 🤗 Datasets expects instead of just `None` (#3224).
- Fixed `push_to_huggingface()` when `generate_card=True` (default behaviour), as we were passing a sample record to the `ArgillaDatasetCard` class, and `UUID`s introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
- Fixed `from_argilla` and `push_to_argilla` to ensure consistency on both field and question re-construction, and to ensure `UUID`s are properly serialized as `str`, respectively (#3234).
- Refactored usage of `import argilla as rg` to clarify package navigation (#3279).
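The `UUID` serialization problem behind the two fixes above can be reproduced with the standard library alone. The sketch below only shows the general behaviour of `json` and `uuid`; the `record` dictionary is a made-up stand-in, not an actual `FeedbackRecord`.

```python
import json
import uuid

# Made-up record; only the UUID-valued "id" matters for the example.
record = {"id": uuid.uuid4(), "fields": {"text": "hello"}}

# json.dumps raises TypeError for UUID values because UUID is not a JSON type.
try:
    json.dumps(record)
    serializable = True
except TypeError:
    serializable = False

# Converting the UUID to str first gives a payload that round-trips cleanly.
payload = json.dumps({**record, "id": str(record["id"])})
restored = json.loads(payload)
restored_id = uuid.UUID(restored["id"])
```

Serializing identifiers as `str` keeps the payload portable, and the original `UUID` can be rebuilt on load when needed.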
- Fixed URLs in Weak Supervision with Sentence Transformers tutorial #3243.
- Fixed library buttons' formatting on Tutorials page (#3255).
- Modified styling of error code outputs in notebooks (#3270).
- Added ElasticSearch and OpenSearch versions (#3280).
- Removed template notebook from table of contents (#3271).
- Fixed tutorials with `pip install argilla` to not use older versions of the package (#3282).
- Added `metadata` attribute to the `Record` of the `FeedbackDataset` (#3194)
- New `users update` command to update the role for an existing user (#3188)
- New `Workspace` class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
- Added `User` class to let users manage their Argilla users via the Python client (#3169).
- Added an option to display `tqdm` progress bar to `FeedbackDataset.push_to_argilla` when looping over the records to upload (#3233).
- The role system now supports three different roles: `owner`, `admin` and `annotator` (#3104)
- `admin` role is scoped to workspace-level operations (#3115)
- The `owner` user is created among the default pool of users in the quickstart, and the default user in the server now has the `owner` role (#3248), reverting (#3188).
- As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
- Added search component for feedback datasets (#3138)
- Added markdown support for feedback dataset guidelines (#3153)
- Added Train button for feedback datasets (#3170)
- Updated `SearchEngine` and `POST /api/v1/me/datasets/{dataset_id}/records/search` to return the `total` number of records matching the search query (#3166)
- Replaced Enum for string value in URLs for client API calls (Closes #3149)
- Resolve breaking issue with `ArgillaSpanMarkerTrainer` for Named Entity Recognition with `span_marker` v1.1.x onwards.
- Move `ArgillaDatasetCard` import under `@requires_version` decorator, so that the `ImportError` on `huggingface_hub` is handled properly (#3174)
- Allow flow `FeedbackDataset.from_argilla` -> `FeedbackDataset.push_to_argilla` under different dataset names and/or workspaces (#3192)
- Added boolean `use_markdown` property to `TextFieldSettings` model.
- Added boolean `use_markdown` property to `TextQuestionSettings` model.
- Added new status `draft` for the `Response` model.
- Added `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API (#3005)
- Added `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
- Added `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoint (#3068).
- Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
- Added docstrings to the `pydantic.BaseModel`s defined at `argilla/client/feedback/schemas.py` (#3137)
- Added the information about executing tests in the developer documentation ([#3143]).
- Updated `GET /api/v1/me/datasets/:dataset_id/metrics` output payload to include the count of responses with `draft` status.
- Added `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API.
- Added `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API.
- Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
- Updated `alembic` setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
- Improved `DatasetCard` generation on `FeedbackDataset.push_to_huggingface` when `generate_card=True`, following the official HuggingFace Hub template, but suited to `FeedbackDataset`s from Argilla (#3110)
- Disallow `fields` and `questions` in `FeedbackDataset` with the same name (#3126).
- Fixed broken links in the documentation and updated the development branch name from `development` to `develop` ([#3145]).
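The rule that a field and a question may not share a name can be sketched as a simple set intersection. Both helper names below are hypothetical and chosen for the example; the actual validation inside `FeedbackDataset` may differ.

```python
from typing import Iterable, Set


def find_name_collisions(field_names: Iterable[str], question_names: Iterable[str]) -> Set[str]:
    """Return the names used by both a field and a question."""
    return set(field_names) & set(question_names)


def validate_names(field_names: Iterable[str], question_names: Iterable[str]) -> None:
    """Raise ValueError when any name is shared between fields and questions."""
    collisions = find_name_collisions(field_names, question_names)
    if collisions:
        raise ValueError(
            f"Names used by both a field and a question: {sorted(collisions)}"
        )
```

Sorting the collisions keeps the error message stable across runs, which helps when the message ends up in logs or tests.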
- `/api/v1/datasets` new endpoint to list and create datasets (#2615).
- `/api/v1/datasets/{dataset_id}` new endpoint to get and delete datasets (#2615).
- `/api/v1/datasets/{dataset_id}/publish` new endpoint to publish a dataset (#2615).
- `/api/v1/datasets/{dataset_id}/questions` new endpoint to list and create dataset questions (#2615)
- `/api/v1/datasets/{dataset_id}/fields` new endpoint to list and create dataset fields (#2615)
- `/api/v1/datasets/{dataset_id}/questions/{question_id}` new endpoint to delete a dataset question (#2615)
- `/api/v1/datasets/{dataset_id}/fields/{field_id}` new endpoint to delete a dataset field (#2615)
- `/api/v1/workspaces/{workspace_id}` new endpoint to get workspaces by id (#2615)
- `/api/v1/responses/{response_id}` new endpoint to update and delete a response (#2615)
- `/api/v1/datasets/{dataset_id}/records` new endpoint to create and list dataset records (#2615)
- `/api/v1/me/datasets` new endpoint to list user visible datasets (#2615)
- `/api/v1/me/dataset/{dataset_id}/records` new endpoint to list dataset records with user responses (#2615)
- `/api/v1/me/datasets/{dataset_id}/metrics` new endpoint to get the dataset user metrics (#2615)
- `/api/v1/me/records/{record_id}/responses` new endpoint to create record user responses (#2615)
- showing new feedback task datasets in datasets list ([#2719])
- new page for feedback task ([#2680])
- show feedback task metrics ([#2822])
- user can delete dataset in dataset settings page ([#2792])
- Support for `FeedbackDataset` in Python client (parent PR #2615, and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
- Integration with the HuggingFace Hub ([#2949])
- Added `ArgillaPeftTrainer` for text and token classification #2854
- Added `predict_proba()` method to `ArgillaSetFitTrainer`
- Added `ArgillaAutoTrainTrainer` for Text Classification #2664
- New `database revisions` command showing database revisions info
- Avoid rendering html for invalid html strings in Text2text ([#2911])
- The `database migrate` command accepts a `--revision` param to provide specific revision id
- `tokens_length` metrics function returns empty data (#3045)
- `token_length` metrics function returns empty data (#3045)
- `mention_length` metrics function returns empty data (#3045)
- `entity_density` metrics function returns empty data (#3045)
- Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
- `tokens_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `token_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `mention_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `entity_density` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- Removed mention `density`, `tokens_length` and `chars_length` metrics from token classification metrics storage (#3045)
- Removed token `char_start`, `char_end`, `tag`, and `score` metrics from token classification metrics storage (#3045)
- Removed tags-related metrics from token classification metrics storage (#3045)
- Added `max_retries` and `num_threads` parameters to `rg.log` to run data logging requests concurrently with backoff retry policy. See #2458 and #2533
- `rg.load` accepts `include_vectors` and `include_metrics` when loading data. Closes #2398
- Added `settings` param to `prepare_for_training` (#2689)
- Added `prepare_for_training` for `openai` (#2658)
- Added `ArgillaOpenAITrainer` (#2659)
- Added `ArgillaSpanMarkerTrainer` for Named Entity Recognition (#2693)
- Added `ArgillaTrainer` CLI support. Closes (#2809)
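The combination of `max_retries`, `num_threads` and a backoff retry policy mentioned for `rg.log` can be illustrated with a generic sketch. This is not Argilla's implementation: `send`, `send_with_retry`, `log_concurrently` and `base_delay` are hypothetical names, and treating only `ConnectionError` as retryable is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def send_with_retry(send: Callable[[Sequence], int], batch: Sequence,
                    max_retries: int = 3, base_delay: float = 0.01) -> int:
    """Call `send(batch)`, retrying with exponential backoff on ConnectionError."""
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ... the base delay
    raise ValueError("max_retries must be >= 0")


def log_concurrently(send: Callable[[Sequence], int], batches: List[Sequence],
                     num_threads: int = 2, max_retries: int = 3) -> int:
    """Send all batches from a thread pool and return the total reported by `send`."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(
            lambda batch: send_with_retry(send, batch, max_retries=max_retries),
            batches,
        )
        return sum(results)
```

Doubling the delay on each attempt gives a struggling server progressively more room, while the thread pool keeps other batches moving in the meantime.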
- fix image alignment on token classification
- Argilla quickstart image dependencies are externalized into `quickstart.requirements.txt`. See #2666
- bulk endpoints will upsert data when record `id` is present. Closes #2535
- moved from `click` to `typer` CLI support. Closes (#2815)
- Argilla server docker image is built with PostgreSQL support. Closes #2686
- The `rg.log` computes all batches and raises an error for all failed batches.
- The default batch size for `rg.log` is now 100.
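The batching behaviour described for `rg.log` (process every batch, then report all failures at once, with a default batch size of 100) can be sketched generically. The helper names `chunk` and `log_all`, the `send` callable, and the choice of `RuntimeError` are assumptions for illustration and do not mirror the client's internals.

```python
from typing import Callable, List, Sequence, Tuple


def chunk(records: Sequence, batch_size: int = 100) -> List[Sequence]:
    """Split `records` into consecutive batches of at most `batch_size` items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def log_all(records: Sequence, send: Callable[[Sequence], None], batch_size: int = 100) -> int:
    """Attempt every batch, then raise once if any of them failed."""
    failures: List[Tuple[int, Exception]] = []
    processed = 0
    for index, batch in enumerate(chunk(records, batch_size)):
        try:
            send(batch)
            processed += len(batch)
        except Exception as error:  # keep going so later batches are still attempted
            failures.append((index, error))
    if failures:
        failed = ", ".join(str(index) for index, _ in failures)
        raise RuntimeError(f"{len(failures)} batch(es) failed (indexes: {failed})")
    return processed
```

Deferring the error until the end means one bad batch does not prevent the remaining records from being sent.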
- `argilla.training` bugfixes and unification (#2665)
- Resolved several small bugs in the `ArgillaTrainer`.
- The `rg.log_async` function is deprecated and will be removed in next minor release.
- `ARGILLA_HOME_PATH` new environment variable (#2564).
- `ARGILLA_DATABASE_URL` new environment variable (#2564).
- Basic support for user roles with `admin` and `annotator` (#2564).
- `id`, `first_name`, `last_name`, `role`, `inserted_at` and `updated_at` new user fields (#2564).
- `/api/users` new endpoint to list and create users (#2564).
- `/api/users/{user_id}` new endpoint to delete users (#2564).
- `/api/workspaces` new endpoint to list and create workspaces (#2564).
- `/api/workspaces/{workspace_id}/users` new endpoint to list workspace users (#2564).
- `/api/workspaces/{workspace_id}/users/{user_id}` new endpoint to create and delete workspace users (#2564).
- `argilla.tasks.users.migrate` new task to migrate users from old YAML file to database (#2564).
- `argilla.tasks.users.create` new task to create a user (#2564).
- `argilla.tasks.users.create_default` new task to create a user with default credentials (#2564).
- `argilla.tasks.database.migrate` new task to execute database migrations (#2564).
- `release.Dockerfile` and `quickstart.Dockerfile` now create a default `argilladata` volume to persist data (#2564).
- Add user settings page. Closes #2496
- Added `Argilla.training` module with support for `spacy`, `setfit`, and `transformers`. Closes #2504
- Now the `prepare_for_training` method is working when `multi_label=True`. Closes #2606
- `ARGILLA_USERS_DB_FILE` environment variable is now only used to migrate users from YAML file to database (#2564).
- `full_name` user field is now deprecated and `first_name` and `last_name` should be used instead (#2564).
- `password` user field now requires a minimum of `8` and a maximum of `100` characters in size (#2564).
- `quickstart.Dockerfile` image default users changed from `team` and `argilla` to `admin` and `annotator` including new passwords and API keys (#2564).
- Datasets to be managed only by users with `admin` role (#2564).
- The list of rules is now accessible while metrics are computed. Closes #2117
- Style updates for weak labeling and adding feedback toast when deleting rules. See #2626 and #2648
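The password length rule listed above (between `8` and `100` characters) can be expressed as a small check. The function and constant names are hypothetical; the server most likely enforces this through its own schema validation rather than a standalone helper.

```python
MIN_PASSWORD_LENGTH = 8
MAX_PASSWORD_LENGTH = 100


def validate_password(password: str) -> str:
    """Return the password unchanged if its length is within the allowed range."""
    if not MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH:
        raise ValueError(
            f"Password must be between {MIN_PASSWORD_LENGTH} and "
            f"{MAX_PASSWORD_LENGTH} characters, got {len(password)}"
        )
    return password
```

Both bounds are inclusive in this sketch, so an 8-character and a 100-character password are accepted.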
- `email` user field (#2564).
- `disabled` user field (#2564).
- Support for private workspaces (#2564).
- `ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY` and `ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD` environment variables. Use `python -m argilla.tasks.users.create_default` instead (#2564).
- The old headers for `API Key` and `workspace` from python client
- The default value for old `API Key` constant. Closes #2251
1.5.1 - 2023-03-30
- Copying datasets between workspaces with proper owner/workspace info. Closes #2562
- Copy dataset with empty workspace to the default user workspace 905d4de
- Using elasticsearch config to request backend version. Closes #2311
- Remove sorting by score in labels. Closes #2622
- Update field name in metadata for image url. See #2609
- Improvements in tutorial doc cards. Closes #2216
1.5.0 - 2023-03-21
- Add the fields to retrieve when loading the data from argilla. `rg.load` takes too long because of the vector field, even when users don't need it. Closes #2398
- Add new page and components for dataset settings. Closes #2442
- Add ability to show image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key `_image_url`
- Non-searchable fields support in metadata. #2570
- Add record ID references to the prepare for training methods. Closes #2483
- Add tutorial on Image Classification. #2420
- Add Train button, visible for "admin" role, with code snippets from a selection of libraries. Closes [#2591]
- Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see argilla-io#2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
- The shortcuts improvement for labels #2339 has been moved to the vuex ORM in dataset settings feature #2444
- Update "Define a labeling schema" section in docs.
- The record inputs are sorted alphabetically in UI by default. #2581
- The record inputs are fully visible when pagination size is one and the height of collapsed area size is bigger for laptop screen. #2587
- Allow URL to be clickable in Jupyter notebook again. Closes #2527
- Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client `<v1.3.0`
- Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version `<1.3.0`
- Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.