This is the repository for the course 'Advanced Natural Language Processing' in the 'Digital Sciences' study program at the University of Applied Sciences Cologne.
It contains the project code for the participation in the Clickbait Challenge proposed at SemEval-2023:
- Task 1 Spoiler Classification: RoBERTa model with NER and custom components
- Task 2 Spoiler Generation: RoBERTa SQuAD2.0 model with rule-based approach
- Dataset: Webis Clickbait Spoiling Corpus 2022 (see the loading sketch below)
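For orientation, here is a minimal sketch of loading one split of the corpus, assuming it ships as JSONL files. The file name `train.jsonl` and the field names (`postText`, `tags`, `spoiler`, `targetParagraphs`) follow the published corpus description and should be verified against the actual data:

```python
import json

# Minimal sketch: read one corpus split, one JSON object per line.
# Assumption: field names follow the corpus description and may differ
# from the files you actually download.
with open("train.jsonl", encoding="utf-8") as f:
    posts = [json.loads(line) for line in f]

example = posts[0]
print(example["postText"])               # the clickbait post (teaser text)
print(example["tags"])                   # spoiler type: phrase / passage / multi
print(example["spoiler"])                # human-written spoiler, Task 2 target
print(len(example["targetParagraphs"]))  # paragraphs of the linked article
```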
- `doc\`: Contains the project presentation and project report
- `task1_anlp_deploy\`: Code and Dockerfile of Task 1
- `task2_anlp_deploy\`: Code and Dockerfile of Task 2
| filename | description |
|---|---|
| `EDA.ipynb` | Code for pre-processing the Webis Clickbait Spoiling Corpus 2022 |
| `simple_transformer_task1.ipynb` | Code for training the RoBERTa model for multi-class classification (see the training sketch below) |
| `run_task_1.py` | Script for running the spoiler classification |
| `Reformat_to_SQuAD.ipynb` | Code for reformatting the spoiler questions into the SQuAD2.0 format (see the reformatting sketch below) |
| `Training_model.ipynb` | Code for training the RoBERTa-SQuAD2.0 model for the spoiler generation downstream task |
| `run_task_2.py` | Script for running the spoiler generation. Arguments: `--apply_rule_base v1` / `--apply_rule_base v2` |
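As a rough illustration of what `simple_transformer_task1.ipynb` does, here is a minimal training sketch using the `simpletransformers` library. The hyperparameters, label encoding, and single-row toy DataFrame are placeholders, not the notebook's actual setup:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Toy training data: post text plus an integer-encoded spoiler type.
# Assumption: 0 = phrase, 1 = passage, 2 = multi; the real encoding is in the notebook.
train_df = pd.DataFrame(
    [["You won't believe what happened next...", 1]],
    columns=["text", "labels"],
)

args = ClassificationArgs(num_train_epochs=3, output_dir="saved_models")
# use_cuda=False keeps the sketch runnable without a GPU
model = ClassificationModel("roberta", "roberta-base", num_labels=3,
                            args=args, use_cuda=False)
model.train_model(train_df)

# Predict the spoiler type of a new clickbait post
predictions, raw_outputs = model.predict(["This one trick saves you money"])
```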
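Similarly, the reformatting in `Reformat_to_SQuAD.ipynb` and the fine-tuning in `Training_model.ipynb` might look roughly like the sketch below. The field mapping (post text as question, article paragraphs as context, spoiler as answer) and the base checkpoint `deepset/roberta-base-squad2` are assumptions; the corpus also ships explicit spoiler positions, which the notebook may use instead of the string search here:

```python
from simpletransformers.question_answering import (
    QuestionAnsweringModel, QuestionAnsweringArgs,
)

def to_squad_example(post, qas_id):
    """Sketch: map one corpus record to a SQuAD2.0-style training entry."""
    # Assumption: postText and spoiler are lists of strings in the corpus.
    context = " ".join(post["targetParagraphs"])
    answer = post["spoiler"][0]
    return {
        "context": context,
        "qas": [{
            "id": str(qas_id),
            "question": post["postText"][0],
            "is_impossible": False,
            "answers": [{"text": answer,
                         "answer_start": context.find(answer)}],
        }],
    }

# Toy record standing in for the corpus (see the loading sketch above)
posts = [{
    "postText": ["You won't believe what happened next..."],
    "targetParagraphs": ["The cat simply fell asleep."],
    "spoiler": ["The cat simply fell asleep."],
}]
train_data = [to_squad_example(p, i) for i, p in enumerate(posts)]

args = QuestionAnsweringArgs(num_train_epochs=2, output_dir="saved_models")
# use_cuda=False keeps the sketch runnable without a GPU
model = QuestionAnsweringModel("roberta", "deepset/roberta-base-squad2",
                               args=args, use_cuda=False)
model.train_model(train_data)
```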
The Docker images can be pulled from these Docker Hub repositories:
[Task 1 Dockerhub Repo] | [Task 2 Dockerhub Repo]
Task 1:

```
docker run --rm -d >>>IMAGE_NAME<<< --input >>>INPUT_DATA<<<.jsonl --output output.jsonl --apply_ner=yes
```

Task 2, without rule-based approach:

```
docker run --rm -d >>>IMAGE_NAME<<< --input >>>INPUT_DATA<<<.jsonl --output output.jsonl --apply_rule_base=v1
```

Task 2, with rule-based approach:

```
docker run --rm -d >>>IMAGE_NAME<<< --input >>>INPUT_DATA<<<.jsonl --output output.jsonl --apply_rule_base=v2
```
Due to their size, the language models can be downloaded separately from these sciebo links:
[Task 1 Model] | [Task 2 Model]
Rename the respective root folder to `saved_models` and place it at `task1_anlp_deploy/saved_models` (Task 1) or `task2_anlp_deploy/saved_models` (Task 2) for usage.