This repository contains the data related to our participation in the BabyLM 2024 competition.
This folder contains our preprocessing procedure. We decided to minimally clean the data source by removing any metalinguistic annotations.
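As an illustration only (the actual cleaning scripts live in this folder; the bracket patterns below are hypothetical), a minimal sketch of this kind of filtering:

```python
import re

def strip_metalinguistic(line: str) -> str:
    """Remove bracketed metalinguistic annotations, e.g. "[laughs]" or "(inaudible)"."""
    line = re.sub(r"\[[^\]]*\]", "", line)        # square-bracketed annotations
    line = re.sub(r"\([^)]*\)", "", line)         # parenthesized annotations
    return re.sub(r"\s{2,}", " ", line).strip()   # collapse leftover whitespace

print(strip_metalinguistic("the dog [laughs] ran away (inaudible)"))
# -> "the dog ran away"
```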
You will find here an original tokenizer, dubbed MorPiece (MoP), freely inspired by the Tolerance Principle of Charles Yang. The current version (v0.0.1) will soon be updated. We did not use this tokenizer in the English experiments for lack of time, but it seems to us a useful contribution for morphologically rich languages.
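For context, Yang's Tolerance Principle states that a rule generalizing over N items remains productive as long as the number of exceptions e does not exceed the threshold θ_N = N / ln N. A minimal sketch of that criterion (an illustration, not the MoP implementation itself):

```python
import math

def tolerance_threshold(n: int) -> float:
    """Yang's Tolerance Principle threshold: a rule over n items (n > 1)
    stays productive iff its exceptions e satisfy e <= n / ln(n)."""
    return n / math.log(n)

def is_productive(n_items: int, n_exceptions: int) -> bool:
    return n_exceptions <= tolerance_threshold(n_items)

print(tolerance_threshold(100))  # ~21.7: up to 21 exceptions tolerated
print(is_productive(100, 15))    # True  -> rule generalizes
print(is_productive(100, 30))    # False -> too many exceptions
```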
Here we include our original elaboration of some standard Recurrent Neural Network architectures (GRU and LSTM models). These models are loosely inspired by certain (non-standard) processing interpretations of Minimalist Grammars (e-MGs), with the specific intent of modeling various training biases (through specific gate combinations) that mimic standard constraints operative in structure building, such as C-command and locality.
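As a purely illustrative sketch (the actual gate combinations are defined in this folder; the `locality_bias` term below is hypothetical), here is one way a GRU-style cell can be biased toward forgetting older material through its update gate:

```python
import torch
import torch.nn as nn

class BiasedGRUCell(nn.Module):
    """GRU-style cell with a fixed negative bias on the update gate.

    Pushing the update gate z toward 0 favours the fresh candidate state
    over accumulated memory, a crude locality pressure. This is NOT the
    repository's actual architecture, only an illustration of the idea.
    """

    def __init__(self, input_size: int, hidden_size: int, locality_bias: float = -1.0):
        super().__init__()
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)
        self.locality_bias = locality_bias  # hypothetical constant bias

    def forward(self, x, h):
        xr, xz, xn = self.x2h(x).chunk(3, dim=-1)
        hr, hz, hn = self.h2h(h).chunk(3, dim=-1)
        r = torch.sigmoid(xr + hr)                       # reset gate
        z = torch.sigmoid(xz + hz + self.locality_bias)  # biased update gate
        n = torch.tanh(xn + r * hn)                      # candidate state
        return (1 - z) * n + z * h                       # new hidden state

cell = BiasedGRUCell(8, 16)
h = torch.zeros(1, 16)
for x in torch.randn(5, 1, 8):  # toy sequence of 5 steps
    h = cell(x, h)
print(h.shape)  # torch.Size([1, 16])
```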
Results of the lm-eval campaign for BabyLM 2024 (BLiMP task) are included here in .json format.
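To inspect them, something along these lines should work; the filename is hypothetical, and lm-eval-harness nests per-task metrics under a top-level "results" key (the accuracy key is "acc" in older versions, "acc,none" in newer ones):

```python
import json

# "blimp_results.json" is a hypothetical filename; use the actual report here.
with open("blimp_results.json") as f:
    report = json.load(f)

for task, metrics in sorted(report["results"].items()):
    acc = metrics.get("acc", metrics.get("acc,none"))  # key varies by harness version
    print(f"{task:45s} acc={acc:.3f}")
```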
Chesi et al. 2024 - Different Ways to Forget: Linguistic Gates in Recurrent Neural Networks