Transformer-based-TWG-parsing

Statistical Parsing for Tree Wrapping Grammars with Transformer-based supertagging and A-star parsing

This repository contains the code for the experiments of the LREC 2022 submission "RRGparbank: A Parallel Role and Reference Grammar Treebank".

Installation

Install ParTAGe-TWG.

Also install the packages listed in the requirements.txt file (for example with pip install -r requirements.txt).

The code works with Python 3.9.
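
As a quick sanity check of the environment, you can run the following snippet. It is illustrative only and assumes that simpletransformers, which provides the NERModel class shown in the snippets below, is among the packages installed from requirements.txt:

# Illustrative environment check; "simpletransformers" is assumed to be
# installed via requirements.txt, as suggested by the NERModel snippets below.
import sys

assert sys.version_info[:2] == (3, 9), "the repository targets Python 3.9"

import simpletransformers  # provides the NERModel class shown below
print("environment looks OK")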

Download language model

Here is the list of language models described in the LREC paper:

Use downloaded model

Unzip the downloaded model and rename the unzipped folder to "best_model".
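
For reference, here is a minimal Python sketch of this step. It assumes the downloaded archive is called model.zip and contains the model as a single top-level folder; both names are placeholders, so adjust them to the model you actually downloaded.

# Sketch of the unzip-and-rename step; "model.zip" is a placeholder name.
import pathlib
import zipfile

with zipfile.ZipFile("model.zip") as archive:
    archive.extractall(".")
    # Assumption: the archive contains a single top-level folder with the model.
    extracted_folder = archive.namelist()[0].split("/")[0]

# parse_twg.py expects the model directory to be named "best_model".
pathlib.Path(extracted_folder).rename("best_model")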

Parse sentences

Parse a file with sentences using the script parse_twg.py.

It takes two arguments: the input file with plain sentences and the output file.

Please take a look at the example input and output files:

python parse_twg.py example_input_file.txt example_output_file.txt

The output file uses the discbracket format (discontinuous bracket trees), which is defined and documented by the disco-dop parser.
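
As a rough illustration (the line below is made up rather than taken from the repository's example files), each line of a discbracket file pairs a bracketed tree whose leaves are token indices with the space-separated tokens, joined by a tab, which is what makes discontinuous constituents representable:

# Hypothetical discbracket line (not actual parser output): the VP spans
# tokens 0 and 2 while skipping token 1, i.e. a discontinuous constituent.
line = "(S (VP (VB 0) (JJ 2)) (NP 1))\tis Mary happy"

tree, sentence = line.split("\t")
tokens = sentence.split(" ")
print(tree)    # bracketed tree with integer leaf indices
print(tokens)  # ['is', 'Mary', 'happy']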

Please note that for the French model you need to change the model type from "bert" to "camembert":

language_model = NERModel(
    "bert", "best_model", use_cuda=device # for French, replace "bert" with "camembert"
)

To use a DistilBERT model, change the model type from "bert" to "distilbert":

language_model = NERModel(
    "distilbert", "best_model", use_cuda=device 
)
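
For context, here is a hedged sketch of how such a snippet fits together. NERModel comes from the simpletransformers package; the model_type variable and the torch-based GPU check are illustrative additions rather than code copied from parse_twg.py.

# Illustrative only: pick the model type that matches the downloaded model
# ("camembert" for the French model, "distilbert" for a DistilBERT model,
# "bert" otherwise) and load it from the "best_model" folder.
import torch
from simpletransformers.ner import NERModel

model_type = "bert"                 # or "camembert" / "distilbert"
device = torch.cuda.is_available()  # use the GPU if one is available

language_model = NERModel(model_type, "best_model", use_cuda=device)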
