This repository belongs to the Bachelor's thesis titled "Integrating Supertag Features into Neural Discontinuous Constituent Parsing". It can be downloaded here: https://mielczarek.de/?page_id=725
Assuming you have a Python 3.10.9 environment.
Install disco-dop (https://github.com/andreasvc/disco-dop)
Install depCCG (2.0.3.2): (https://github.com/masashi-y/depccg) Note that the models (basic and rebank) currently need to be downloaded manually (see masashi-y/depccg#10). The links are in the README.md
After that run:
pip install -r requirements.txt
DPTB:
git clone https://github.com/mcoavoux/multilingual_disco_data.git
and follow instructions in multilingual_disco_data/readme
You need to provide a copy of the DPTB (contact Kilian Evang).
CCGrebank:
Place unzipped CCGrebank
folder into the project
directory. Path can be changed via -ccg
parameter
when launching sfparser.py
.
DepPTB:
Place unzipped DepPTB
folder into project directory.
Path can be changed via -depptb
parameter when
launching sfparser.py
.
LCFRS:
Convert dptb7.export
via LCFRS supertagger (https://github.com/truprecht/lcfrs-supertagger).
Extract contents of .corpus-cache/dptb.3914-39832-1700-2416.bin-0-1-r.Vanilla.Vanilla.tar.gz
into a folder named LCFRS
in the project directory.
Path can be changed via -lcfrs
parameter when
launching sfparser.py
.
LTAG-spinal:
Download LTAG-spinal treebank from https://www.cis.upenn.edu/~xtag/spinal/ and place
contents into a folder named LTAGspinal
in the project directory.
Path can be changed via -LTAGspinal
parameter when
launching sfparser.py
.
Due to github's file size limitations, the models are stored externally. You can download them at https://drive.google.com/drive/folders/1Gqyq9H1TihH5D5Z0WWKHeQeDse-3wSfI?usp=sharing
I recommend using gdown
to download from Google Drive: https://github.com/wkentaro/gdown
Each model has its own launch_experiments_<model>.sh script. In the script, comment out the line
starting with python sfparser.py train
. Then run in terminal:
sh launch_experiments_<model>.sh "dptb" "../pretrained_models/<model>" <device>
Use -1 for CPU and 0 for GPU (assuming you have only one GPU).
Each model has its own launch_experiments_<model>.sh script. Run in terminal:
sh launch_experiments_<model>.sh "dptb" "../pretrained_models/<model> <device>
Use -1 for CPU and 0 for GPU (assuming you have only one GPU).
This project extends the discoparset project created by
Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle (Coavoux & Cohen, NAACL 2019)
Their code is available at https://gitlab.com/mcoavoux/discoparset