BabyLM 2024 shared task @ NeTS

This repository contains the data related to our participation in the BabyLM 2024 competition.

01-preprocess

This folder contains our preprocessing procedure. We decided to minimally clean the data source by removing any metalinguistic

02-tokenization

Here you will find an original tokenizer, dubbed MorPiece (MoP), loosely inspired by Charles Yang's Tolerance Principle. The current version (v.0.0.1) will be updated soon. We did not use this tokenizer in the English experiments for lack of time, but we believe it is a useful contribution for morphologically rich languages.
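As background, the Tolerance Principle states that a rule applying to N items remains productive as long as its exceptions do not exceed N / ln(N). The sketch below only illustrates that threshold; it is not the MorPiece implementation, and the function names `tolerance_threshold` and `is_productive` are hypothetical.

```python
import math


def tolerance_threshold(n_items: int) -> float:
    """Maximum number of exceptions tolerated by a rule over n_items: N / ln(N)."""
    if n_items < 2:
        # ln(1) == 0, so the threshold is undefined for fewer than 2 items.
        raise ValueError("threshold is undefined for fewer than 2 items")
    return n_items / math.log(n_items)


def is_productive(n_items: int, n_exceptions: int) -> bool:
    """True if the rule is productive under the Tolerance Principle."""
    return n_exceptions <= tolerance_threshold(n_items)


if __name__ == "__main__":
    # Hypothetical counts: 100 candidate items, varying numbers of exceptions.
    for exceptions in (10, 21, 22, 40):
        print(exceptions, is_productive(100, exceptions))
```

For 100 items the threshold is about 21.7, so a candidate rule with 21 exceptions counts as productive and one with 22 does not.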

03-model_training

Here we include our own variants of some standard Recurrent Neural Network architectures (GRU and LSTM models). These models are loosely inspired by certain (non-standard) processing interpretations of Minimalist Grammars (e-MGs). Their specific intent is to model various biases in training, using specific gate combinations, that mimic standard constraints operative in structure building, such as C-command and locality.
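For readers unfamiliar with gating, the following is a minimal sketch of one step of a standard scalar GRU, which is the kind of baseline that gate combinations modify. It is not the architecture in this folder: the function `gru_step`, the parameter dictionary and its keys are assumptions made for illustration, and the update rule follows the common convention in which the update gate `z` weights the previous state.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h_prev: float, p: dict) -> float:
    """One step of a standard GRU with scalar input and scalar hidden state."""
    # Update gate: how much of the previous state is kept.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state computed from the input and the reset-scaled history.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Interpolate between the previous state and the candidate.
    return z * h_prev + (1.0 - z) * h_cand


if __name__ == "__main__":
    # Hypothetical parameters: all weights and biases set to zero.
    params = {k: 0.0 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
    h = 0.8
    for x in (1.0, -0.5, 0.3):
        h = gru_step(x, h, params)
        print(h)
```

With all parameters at zero, both gates evaluate to 0.5 and the candidate to 0, so each step halves the hidden state. A large positive `bz` pushes `z` towards 1 and the state is carried over almost unchanged, which is the kind of "forgetting" behaviour that gates control.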

04-evaluation

Results of the lm-eval campaign for BabyLM 2024 are included here in .json format (BLiMP task).

REFERENCE

Chesi et al. 2024 - Different Ways to Forget: Linguistic Gates in Recurrent Neural Networks
