rafaelsandroni/gpt3-data-labeling

AutoLabeling

Want to label some data?

GPT-3 data labeling is implemented on Metatext.ai; give it a try.

Goal

This repo aims to reproduce and improve on the study "Want To Reduce Labeling Cost? GPT-3 Can Help", applied to Brazilian Portuguese.

Abstract: Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 175 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than using labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance with limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.

Experiments

For now, the experiment has only been run on a sentiment analysis task, using the B2W-Reviews dataset, for testing purposes. As next steps, we can explore benchmark tasks and datasets.

  • GPT3-Label
  • Human-Labeling
  • GPT3-Human-Labeling
  • RawGPT3
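The GPT3-Human-Labeling setting mixes both label sources. The sketch below shows one way such a mix could be built under a fixed human labeling budget: the least confident GPT-3 labels are replaced by human labels. The record layout, the confidence score, and the `combine_labels` helper are assumptions made for illustration; they are not taken from this repo.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (text, gpt3_label, gpt3_confidence, human_label or None)
Example = Tuple[str, str, float, Optional[str]]


def combine_labels(examples: List[Example], human_budget: int) -> List[Tuple[str, str, str]]:
    """Return (text, label, source) triples.

    The `human_budget` least confident GPT-3 labels are replaced by the
    human label when one is available; all others keep the GPT-3 label.
    """
    # Indices sorted from least to most confident GPT-3 prediction.
    order = sorted(range(len(examples)), key=lambda i: examples[i][2])
    to_relabel = set(order[:max(human_budget, 0)])

    combined = []
    for i, (text, gpt3_label, _conf, human_label) in enumerate(examples):
        if i in to_relabel and human_label is not None:
            combined.append((text, human_label, "human"))
        else:
            combined.append((text, gpt3_label, "gpt3"))
    return combined


# Illustrative data only; the confidence values are made up.
examples = [
    ("produto excelente", "positive", 0.97, "positive"),
    ("nao gostei da qualidade do produto", "positive", 0.51, "negative"),
    ("entrega atrasou", "negative", 0.80, "negative"),
]
combined = combine_labels(examples, human_budget=1)
```

With a budget of one, only the review with the lowest confidence is sent to a human, which here corrects a wrong GPT-3 label.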

Task

Sentiment analysis on product reviews.

GPT3-Label performance evaluation

[Image: GPT3-Label performance evaluation]
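An evaluation of this kind typically compares predicted labels against reference labels. The following is a minimal sketch of accuracy and per-class F1 in plain Python; the function names and the sample labels are illustrative assumptions, not the metrics or data actually used in this repo.

```python
from collections import Counter
from typing import Dict, List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of predictions that match the reference label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """F1 score for each label, computed from TP/FP/FN counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # reference label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Made-up labels for illustration.
gold = ["positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "positive", "positive"]
acc = accuracy(gold, pred)
f1 = per_class_f1(gold, pred)
```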

Library

This library aims to make it easier to add other language models in future experiments (e.g., GPT-J), and to open the project to collaboration, since resources for the Portuguese language are scarce.

pip install git+https://github.com/rafaelsandroni/autolabeling

Usage

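from AutoLabeling import AutoLabeling

# label_df is not defined here; it is assumed to be a pandas DataFrame of
# already-labeled reviews with "review_text" and "sentiment" columns.
labeling = AutoLabeling(label_df, text_col="review_text", label_col="sentiment")

# Label a single review ("I did not like the quality of the product").
labeling.execute("nao gostei da qualidade do produto")

# Label every review and store the result in a new column.
label_df["labeling"] = label_df["review_text"].apply(labeling.execute)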
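Few-shot labeling generally works by placing a handful of labeled examples in a prompt, followed by the unlabeled text. The sketch below illustrates that idea only; `build_few_shot_prompt`, its parameters, and the prompt layout are hypothetical and are not the AutoLabeling library's actual implementation.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    text: str,
    text_name: str = "Review",
    label_name: str = "Sentiment",
) -> str:
    """Format labeled (text, label) pairs plus one unlabeled text as a prompt."""
    blocks = [
        f"{text_name}: {example_text}\n{label_name}: {label}"
        for example_text, label in examples
    ]
    # The final block leaves the label empty for the language model to complete.
    blocks.append(f"{text_name}: {text}\n{label_name}:")
    return "\n\n".join(blocks)


# Illustrative examples, not taken from the B2W-Reviews dataset.
examples = [
    ("produto excelente, recomendo", "positive"),
    ("chegou quebrado", "negative"),
]
prompt = build_few_shot_prompt(examples, "nao gostei da qualidade do produto")
```

The resulting string would then be sent to a completion model, and the text it generates after the final "Sentiment:" would be read as the label.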

About

Data labeling using few-shot learning with GPT-3.
