
Tutorials

hankcs edited this page Jun 21, 2019 · 4 revisions


Install

pip install elit
pip3 uninstall mxnet-cu92 -y  # remove any existing MXNet builds first
pip3 uninstall mxnet -y
pip3 install mxnet-cu92       # install the CUDA 9.2 build; ELIT requires GPU support

The current version (elit 0.1.32) works only on GPU: on CPU, the embedding interface gluonnlp.embedding.create throws mxnet.base.MXNetError: src/storage/storage.cc:143: Compile with USE_CUDA=1 to enable GPU usage.

Demos

All demos are in elit/tests/demo. This version ships five components: EnglishTokenizer, POSTagger, NERTagger, DependencyParser, and SDPParser. Except for the rule-based EnglishTokenizer, every component is loaded the same way, as in the following example.

pos_tagger = POSTagger()
pos_tagger.load()

The load method downloads pretrained models from our S3 bucket (elitcloud-public-data) automatically.

Training

The training API shares a similar train method across components; however, each component has its own parameters and requires a specific Document field.

POS

tagger = POSFlairTagger()
trn_docs = conll_to_documents('train.tsv', headers={0: 'text', 1: 'pos'})
dev_docs = conll_to_documents('dev.tsv', headers={0: 'text', 1: 'pos'})
model_path = './posmodel'
tagger.train(trn_docs=trn_docs, dev_docs=dev_docs, model_path=model_path,
             pretrained_embeddings=('fasttext', 'crawl-300d-2M-subword'))

Write your own conll_to_documents if your corpus comes in a format other than .tsv.
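As a starting point, here is a minimal sketch of such a loader. It is not ELIT's actual implementation: the real conll_to_documents returns Document objects, while this stand-in (my_conll_to_documents is a hypothetical name) returns plain dicts keyed by the same headers mapping.

```python
from typing import Dict, List


def my_conll_to_documents(path: str, headers: Dict[int, str]) -> List[dict]:
    """Parse a blank-line-separated, tab-delimited CoNLL-style file.

    Each sentence becomes a dict mapping a field name (e.g. 'text', 'pos')
    to the list of column values for that sentence.
    """
    sentences = []
    current = {name: [] for name in headers.values()}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:  # a blank line ends the current sentence
                if any(current.values()):
                    sentences.append(current)
                    current = {name: [] for name in headers.values()}
                continue
            cells = line.split('\t')
            for col, name in headers.items():
                current[name].append(cells[col])
    if any(current.values()):  # flush the last sentence if the file has no trailing blank line
        sentences.append(current)
    return sentences
```

Adapting it to another format mostly means changing the cell delimiter and the column-to-field mapping.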

  • To activate Flair embeddings, pass forward_language_model and backward_language_model.
  • If GPU memory is exhausted, pass embeddings_in_memory=False.

NER

Almost the same as POS, except that the conll_to_documents headers must map the label column to ner.

trn_docs = conll_to_documents('train.tsv', headers={0: 'text', 1: 'ner'})
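The ner column typically holds tags in a BIO-style scheme (an assumption here, not something this page specifies), where B-PER opens a person entity, I-PER continues it, and O marks non-entity tokens. A small sketch of how such tags map to entity spans:

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, label) entity spans from BIO tags; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ['O']):  # sentinel 'O' flushes the last span
        inside = tag.startswith('I-') and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((start, i, label))
            if tag[:2] in ('B-', 'I-'):  # treat a stray I- as opening a new span
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans
```

For example, ['B-PER', 'I-PER', 'O', 'B-LOC'] yields [(0, 2, 'PER'), (3, 4, 'LOC')].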

DEP

Load your corpus into the dep field of the Document structure, then run the following code.

parser = BiaffineDepParser()
parser.train(trn_docs=trn_docs,
             dev_docs=dev_docs,
             save_dir=save_dir,
             pretrained_embeddings=('fasttext', 'crawl-300d-2M-subword'), word_dims=300)
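If your treebank is in CoNLL-X format, a loader might look like the sketch below. The column layout and the (head, relation) pair representation are assumptions for illustration; check ELIT's Document documentation for the exact dep field structure.

```python
from typing import Dict, List


def parse_conllx(sentence: str) -> Dict[str, List]:
    """Parse one tab-separated CoNLL-X sentence into words and a dep field.

    Assumed columns: ID, FORM, ..., HEAD (col 6), DEPREL (col 7);
    heads are 1-based, with 0 denoting the root.
    """
    words, dep = [], []
    for line in sentence.strip().splitlines():
        cells = line.split('\t')
        words.append(cells[1])
        dep.append((int(cells[6]), cells[7]))  # one (head, relation) pair per token
    return {'word': words, 'dep': dep}
```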

SDP

Similar to DEP, except that semantic relations are loaded into the sdp field of the Document structure.
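The key difference from DEP is that a semantic dependency graph lets a token have zero or more heads, so each token carries a list of arcs rather than a single (head, relation) pair. A sketch of that shape (the helper names and the exact representation are assumptions, not ELIT's API):

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, semantic relation)


def empty_sdp(n_tokens: int) -> List[List[Arc]]:
    """An empty sdp field: one arc list per token, since tokens may have 0+ heads."""
    return [[] for _ in range(n_tokens)]


def add_arc(sdp: List[List[Arc]], dependent: int, head: int, relation: str) -> None:
    """Attach one semantic arc; unlike dep, repeated calls per token are allowed."""
    sdp[dependent].append((head, relation))
```

For instance, a token serving as the argument of two predicates simply accumulates two arcs in its list.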
