-
Hi, the documentation says: "The available encoders include encoders used for Sequence Features as well as encoders from the huggingface transformers library: bert, gpt, gpt2, xlnet, xlm, roberta, distilbert, ctrl, camembert, albert, t5, xlmroberta, flaubert, electra, longformer and auto-transformer." Does this mean that Ludwig does not support a model that is available on HuggingFace but not listed above, e.g. https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1, as an encoder for text features? Thanks 😊
-
Hi @efima-ai, you can use arbitrary HuggingFace models via the auto_transformer encoder. You can try something like this for your case:

```python
import pandas as pd
import yaml

from ludwig.api import LudwigModel

config = """
input_features:
  - name: text
    type: text
    encoder: auto_transformer
    pretrained_model_name_or_path: 'TurkuNLP/bert-base-finnish-cased-v1'
output_features:
  - name: category
    type: category
trainer:
  epochs: 1
"""

# Parse the YAML config into a dict; safe_load avoids the Loader argument
# that newer PyYAML versions require for plain yaml.load.
model = LudwigModel(yaml.safe_load(config), backend="local")

df = pd.DataFrame(
    {
        "text": ["Suomessa vaihtuu kesän aikana sekä pääministeri että valtiovarain"],
        "category": ["Suomi"],
    }
)

model.train(df)
model.predict(df)
```
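If you want to double-check that a Hub model id resolves before wiring it into the config, you can load it directly with the transformers Auto classes; as far as I know this is roughly what the auto_transformer encoder relies on under the hood. A minimal sketch, assuming the transformers library is installed:

```python
# Sanity check (not part of Ludwig itself): confirm the Hub model id
# resolves with HuggingFace's Auto classes before training.
from transformers import AutoModel, AutoTokenizer

model_name = "TurkuNLP/bert-base-finnish-cased-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModel.from_pretrained(model_name)
print(hf_model.config.model_type)  # should print "bert" for this checkpoint
```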