#87: Updated user guide for added error message column (#89)
* Updated changelog

* Updated user guide

* Apply suggestions from code review

Co-authored-by: Christoph Kuhnke <[email protected]>

* Updated text in user guide

---------

Co-authored-by: Christoph Kuhnke <[email protected]>
umitbuyuksahin and ckunki authored Mar 30, 2023
1 parent 2446e1e commit 71c6611
Showing 2 changed files with 65 additions and 39 deletions.
15 changes: 12 additions & 3 deletions doc/changes/changes_0.4.0.md
@@ -1,11 +1,15 @@
# Transformers Extension 0.4.0, released YYYY-MM-DD
# Transformers Extension 0.4.0, released 2023-03-31

Code name: TBD
Code name: Added Zero-Shot model and error handling structure


## Summary

TBD
This release introduces a new UDF script for Zero-Shot text classification.
It also lets users use custom models located in the local filesystem or in
private repositories. In addition, it adds an error handling mechanism for
errors that occur during model loading or in one of the prediction stages.

### Features

@@ -14,6 +18,11 @@ TBD
- #47: Added rank column to model results returning top-k predictions
- #72: Added authentication token to download private models
- #64: Added Zero-Shot text classification
- #25: Added error handling structure

### Documentation

- #87: Updated User Guide with error_message column



89 changes: 53 additions & 36 deletions doc/user_guide/user_guide.md
@@ -273,12 +273,14 @@ SELECT TE_SEQUENCE_CLASSIFICATION_SINGLE_TEXT_UDF(

The inference results are presented with predicted _LABEL_ and confidence
_SCORE_ columns, combined with the inputs used when calling
this UDF. For example:
this UDF. In case of any error during model loading or prediction, these new
columns are set to `null` and column _ERROR_MESSAGE_ is set
to the stacktrace of the error. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | LABEL | SCORE |
| ------------- | ------- | ---------- | --------- |---------| ----- |
| conn_name | dir/ | model_name | text | label_1 | 0.75 |
| ... | ... | ... | ... | ... | ... |
| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | LABEL | SCORE | ERROR_MESSAGE |
| ------------- | ------- | ---------- | --------- |---------| ----- |----------------|
| conn_name | dir/ | model_name | text | label_1 | 0.75 | None |
| ... | ... | ... | ... | ... | ... | ... |
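
The error column makes it possible to separate failed rows from successful ones after a query. The following is a minimal sketch, assuming the result set has already been fetched into Python as a list of dictionaries keyed by column name. The helper `split_by_error` and the sample rows are hypothetical and not part of the extension.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_error(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Partition result rows into (succeeded, failed) using ERROR_MESSAGE."""
    # A row counts as successful when its ERROR_MESSAGE is empty (None).
    succeeded = [row for row in rows if row.get("ERROR_MESSAGE") is None]
    failed = [row for row in rows if row.get("ERROR_MESSAGE") is not None]
    return succeeded, failed


# Hypothetical rows shaped like the example table: on failure the result
# columns are null and ERROR_MESSAGE carries the stack trace text.
rows = [
    {"TEXT_DATA": "text", "LABEL": "label_1", "SCORE": 0.75,
     "ERROR_MESSAGE": None},
    {"TEXT_DATA": "other text", "LABEL": None, "SCORE": None,
     "ERROR_MESSAGE": "Traceback (most recent call last): ..."},
]

succeeded, failed = split_by_error(rows)
```

Keeping the failed rows instead of dropping them preserves the inputs that triggered the error, which helps when retrying them later.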


### Sequence Classification for Text Pair UDF
@@ -305,7 +307,10 @@ SELECT TE_SEQUENCE_CLASSIFICATION_TEXT_PAIR_UDF(
- ```second_text```: The second input text

The inference results are presented with predicted _LABEL_ and confidence
_SCORE_ columns, combined with the inputs used when calling this UDF.
_SCORE_ columns, combined with the inputs used when calling this UDF.
In case of any error during model loading or prediction, these new
columns are set to `null` and column _ERROR_MESSAGE_ is set
to the stacktrace of the error.


### Question Answering UDF
@@ -337,13 +342,15 @@ in the context, it might return less than `top_k` answers (see the [top_k parame

The inference results are presented with predicted _ANSWER_, confidence
_SCORE_, and _RANK_ columns, combined with the inputs used when calling this UDF.
If `top_k` > 1, each input row is repeated for each answer. For example:
If `top_k` > 1, each input row is repeated for each answer. In case of any error
during model loading or prediction, these new columns are set to `null` and column _ERROR_MESSAGE_ is set
to the stacktrace of the error. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | QUESTION | CONTEXT | TOP_K | ANSWER | SCORE | RANK |
| ------------- | ------- | ---------- |------------|-----------| ----- |----------| ----- |------|
| conn_name | dir/ | model_name | question_1 | context_1 | 2 | answer_1 | 0.75 | 1 |
| conn_name | dir/ | model_name | question_2 | context_1 | 2 | answer_2 | 0.70 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | .. |
| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | QUESTION | CONTEXT | TOP_K | ANSWER | SCORE | RANK | ERROR_MESSAGE |
| ------------- | ------- | ---------- |------------|-----------| ----- |----------| ----- |------| ------------- |
| conn_name | dir/ | model_name | question_1 | context_1 | 2 | answer_1 | 0.75 | 1 | None |
| conn_name | dir/ | model_name | question_1 | context_1 | 2 | answer_2 | 0.70 | 2 | None |
| ... | ... | ... | ... | ... | ... | ... | ... | .. | ... |
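
The row repetition for `top_k` > 1 can be pictured with a short sketch. It assumes that RANK 1 is assigned to the highest score, which matches the example table but is not stated explicitly. The function `expand_top_k` and the candidate structure are hypothetical illustrations, not the extension's implementation.

```python
from typing import Any, Dict, List


def expand_top_k(input_row: Dict[str, Any],
                 candidates: List[Dict[str, Any]],
                 top_k: int) -> List[Dict[str, Any]]:
    """Repeat the input row once per answer and attach a 1-based RANK."""
    # Assumption: a higher score means a better rank. Slicing also covers
    # the case where fewer than top_k candidates are available.
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)[:top_k]
    return [
        {**input_row, "ANSWER": c["answer"], "SCORE": c["score"], "RANK": rank}
        for rank, c in enumerate(ranked, start=1)
    ]


input_row = {"QUESTION": "question_1", "CONTEXT": "context_1", "TOP_K": 2}
candidates = [
    {"answer": "answer_2", "score": 0.70},
    {"answer": "answer_1", "score": 0.75},
]

result_rows = expand_top_k(input_row, candidates, top_k=2)
```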


### Masked Language Modelling UDF
@@ -374,13 +381,15 @@ SELECT TE_FILLING_MASK_UDF(

The inference results are presented with _FILLED_TEXT_, confidence
_SCORE_, and _RANK_ columns, combined with the inputs used when calling this UDF.
If `top_k` > 1, each input row is repeated for each prediction. For example:
If `top_k` > 1, each input row is repeated for each prediction. In case of any
error during model loading or prediction, these new columns are set to `null`
and column _ERROR_MESSAGE_ is set to the stacktrace of the error. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | TOP_K | FILLED_TEXT | SCORE | RANK |
| ------------- | ------- | ---------- |---------------| ----- |---------------| ----- |------|
| conn_name | dir/ | model_name | text `<mask>` | 2 | text filled_1 | 0.75 | 1 |
| conn_name | dir/ | model_name | text `<mask>` | 2 | text filled_2 | 0.70 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | TOP_K | FILLED_TEXT | SCORE | RANK | ERROR_MESSAGE |
| ------------- | ------- | ---------- |---------------| ----- |---------------| ----- |------|---------------|
| conn_name | dir/ | model_name | text `<mask>` | 2 | text filled_1 | 0.75 | 1 | None |
| conn_name | dir/ | model_name | text `<mask>` | 2 | text filled_2 | 0.70 | 2 | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |


### Text Generation UDF
@@ -411,7 +420,9 @@ SELECT TE_TEXT_GENERATION_UDF(
- ```return_full_text```: If set to False only added text is returned, otherwise the full text is returned.

The inference results are presented with _GENERATED_TEXT_ column,
combined with the inputs used when calling this UDF.
combined with the inputs used when calling this UDF. In case of any error during
model loading or prediction, this new column is set to `null`, and you can
see the stacktrace of the error in the _ERROR_MESSAGE_ column.


### Token Classification UDF
@@ -446,12 +457,14 @@ SELECT TE_TOKEN_CLASSIFICATION_UDF(
The inference results are presented with _START_POS_ indicating the index of the starting character of the token,
_END_POS_ indicating the index of the ending character of the token, _WORD_ indicating the token, predicted _ENTITY_, and
confidence _SCORE_ columns, combined with the inputs used when calling this UDF.
For example:
In case of any error during model loading or prediction, these new
columns are set to `null`, and column _ERROR_MESSAGE_ is set
to the stacktrace of the error. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | AGGREGATION_STRATEGY | START_POS | END_POS | WORD | ENTITY | SCORE |
| ------------- | ------- | ---------- |-----------|----------------------|-----------|---------|------|--------|-------|
| conn_name | dir/ | model_name | text | simple | 0 | 4 | text | noun | 0.75 |
| ... | ... | ... | ... | ... | ... | ... | ... | .. | ... |
| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | AGGREGATION_STRATEGY | START_POS | END_POS | WORD | ENTITY | SCORE | ERROR_MESSAGE |
| ------------- | ------- | ---------- |-----------|----------------------|-----------|---------|------|--------|-------| ------------- |
| conn_name | dir/ | model_name | text | simple | 0 | 4 | text | noun | 0.75 | None |
| ... | ... | ... | ... | ... | ... | ... | ... | .. | ... | ... |
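
The relationship between _START_POS_, _END_POS_, and _WORD_ can be checked with a small sketch. It assumes zero-based character indices with an exclusive end position, which is consistent with the example row (0, 4, `text`) but is an assumption rather than a documented guarantee. The helper `extract_word` is hypothetical.

```python
def extract_word(text: str, start_pos: int, end_pos: int) -> str:
    """Return the substring covered by a token's character positions."""
    # Assumption: start_pos is inclusive and end_pos is exclusive, so the
    # positions can be used directly as a Python slice.
    return text[start_pos:end_pos]


# Values taken from the example row: TEXT_DATA "text", START_POS 0, END_POS 4.
word = extract_word("text", 0, 4)
```

Comparing the extracted substring with the returned _WORD_ is a quick way to verify which index convention applies for a given model.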



@@ -486,12 +499,14 @@ SELECT TE_TRANSLATION_UDF(
- ```max_length```: The maximum total length of the translated text.

The inference results are presented with _TRANSLATION_TEXT_ column,
combined with the inputs used when calling this UDF. For example:
combined with the inputs used when calling this UDF. In case of any error during
model loading or prediction, this new column is set to `null`, and
column _ERROR_MESSAGE_ is set to the stacktrace of the error. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | SOURCE_LANGUAGE | TARGET_LANGUAGE | MAX_LENGTH | TRANSLATION_TEXT |
| ------------- | ------- | ---------- |-----------|-----------------|-----------------|------------| ---------------- |
| conn_name | dir/ | model_name | context | English | German | 100 | kontext |
| ... | ... | ... | ... | ... | ... | ... | ... |
| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | SOURCE_LANGUAGE | TARGET_LANGUAGE | MAX_LENGTH | TRANSLATION_TEXT | ERROR_MESSAGE |
| ------------- | ------- | ---------- |-----------|-----------------|-----------------|------------| ---------------- |---------------|
| conn_name | dir/ | model_name | context | English | German | 100 | kontext | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |


### Zero-Shot Text Classification UDF
@@ -522,10 +537,12 @@ SELECT TE_ZERO_SHOT_TEXT_CLASSIFICATION_UDF(
should be comma-separated, e.g., `label1,label2,label3`.

The inference results are presented with predicted _LABEL_, _SCORE_ and _RANK_
columns, combined with the inputs used when calling this UDF. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | CANDIDATE LABELS | LABEL | SCORE | RANK |
| ------------- | ------- | ---------- |-----------|------------------|--------|-------|------|
| conn_name | dir/ | model_name | text | label1,label2.. | label1 | 0.75 | 1 |
| conn_name | dir/ | model_name | text | label1,label2.. | label2 | 0.70 | 2 |
| ... | ... | ... | ... | ... | ... | ... | .. |
columns, combined with the inputs used when calling this UDF. In case of any
error during model loading or prediction, these new columns are set to `null`,
and column _ERROR_MESSAGE_ is set to the stacktrace of the error. For example:

| BUCKETFS_CONN | SUB_DIR | MODEL_NAME | TEXT_DATA | CANDIDATE_LABELS | LABEL | SCORE | RANK | ERROR_MESSAGE |
| ------------- | ------- | ---------- |-----------|------------------|--------|-------|------|---------------|
| conn_name | dir/ | model_name | text | label1,label2.. | label1 | 0.75 | 1 | None |
| conn_name | dir/ | model_name | text | label1,label2.. | label2 | 0.70 | 2 | None |
| ... | ... | ... | ... | ... | ... | ... | .. | ... |
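
Because the candidate labels are passed as one comma-separated string, it can help to build and parse that string in one place. This is a minimal sketch, assuming there is no escaping mechanism for commas inside a label. The helpers `join_candidate_labels` and `split_candidate_labels` are hypothetical and not part of the extension.

```python
from typing import List


def join_candidate_labels(labels: List[str]) -> str:
    """Build the comma-separated string, e.g. 'label1,label2,label3'."""
    cleaned = [label.strip() for label in labels]
    for label in cleaned:
        if not label:
            raise ValueError("empty candidate label")
        # Assumption: commas cannot be escaped, so a label must not contain one.
        if "," in label:
            raise ValueError(f"label contains a comma: {label!r}")
    return ",".join(cleaned)


def split_candidate_labels(value: str) -> List[str]:
    """Parse a comma-separated label string back into a list."""
    return [part.strip() for part in value.split(",") if part.strip()]


labels_arg = join_candidate_labels(["label1", " label2", "label3"])
```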
