
Tutorials

hankcs edited this page Jun 21, 2019 · 4 revisions


Install

pip install elit
pip3 uninstall mxnet-cu92 -y  # remove any existing MXNet builds first
pip3 uninstall mxnet -y
pip3 install mxnet-cu92       # install the CUDA 9.2 build; ELIT requires GPU support

The current version (elit 0.1.32) works only on GPU: on CPU, the embedding interface gluonnlp.embedding.create throws mxnet.base.MXNetError: src/storage/storage.cc:143: Compile with USE_CUDA=1 to enable GPU usage.

Demos

All demos are in elit/tests/demo. This version ships five components: EnglishTokenizer, POSTagger, NERTagger, DependencyParser, and SDPParser. Except for the rule-based EnglishTokenizer, every component is loaded the same way, as in the following example.

pos_tagger = POSTagger()
pos_tagger.load()

The load method downloads pretrained models from our S3 bucket (elitcloud-public-data) automatically.

Training

The training API shares a similar train method across components; however, each component has its own parameters and requires a specific Document field.

POS

tagger = POSFlairTagger()
trn_docs = conll_to_documents('train.tsv', headers={0: 'text', 1: 'pos'})
dev_docs = conll_to_documents('dev.tsv', headers={0: 'text', 1: 'pos'})
model_path = './posmodel'
tagger.train(trn_docs=trn_docs, dev_docs=dev_docs, model_path=model_path,
             pretrained_embeddings=('fasttext', 'crawl-300d-2M-subword'))

Write your own conll_to_documents if your corpus comes in a format other than .tsv.
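As a starting point, here is a minimal sketch of such a loader. It is not ELIT's actual implementation: the real conll_to_documents returns Document objects, while this stand-in (my_conll_to_documents is a hypothetical name) returns plain dicts keyed by the same headers mapping.

```python
from typing import Dict, List


def my_conll_to_documents(path: str, headers: Dict[int, str]) -> List[dict]:
    """Parse a blank-line-separated, tab-delimited CoNLL-style file.

    Each sentence becomes a dict mapping a field name (e.g. 'text', 'pos')
    to the list of column values for that sentence.
    """
    sentences = []
    current = {name: [] for name in headers.values()}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:  # a blank line ends the current sentence
                if any(current.values()):
                    sentences.append(current)
                    current = {name: [] for name in headers.values()}
                continue
            cells = line.split('\t')
            for col, name in headers.items():
                current[name].append(cells[col])
    if any(current.values()):  # flush the last sentence if the file has no trailing blank line
        sentences.append(current)
    return sentences
```

Adapting it to another format mostly means changing the cell delimiter and the column-to-field mapping.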

  • To activate Flair embeddings, pass forward_language_model and backward_language_model.
  • If GPU memory is exhausted, pass embeddings_in_memory=False.

NER

Almost the same as POS, except that the conll_to_documents headers must map the label column to ner.

trn_docs = conll_to_documents('train.tsv', headers={0: 'text', 1: 'ner'})
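The ner column typically holds tags in a BIO-style scheme (an assumption here, not something this page specifies), where B-PER opens a person entity, I-PER continues it, and O marks non-entity tokens. A small sketch of how such tags map to entity spans:

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, label) entity spans from BIO tags; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ['O']):  # sentinel 'O' flushes the last span
        inside = tag.startswith('I-') and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((start, i, label))
            if tag[:2] in ('B-', 'I-'):  # treat a stray I- as opening a new span
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans
```

For example, ['B-PER', 'I-PER', 'O', 'B-LOC'] yields [(0, 2, 'PER'), (3, 4, 'LOC')].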

DEP

Load your corpus into the dep field of the Document structure, then run the following code.

parser = BiaffineDepParser()
parser.train(trn_docs=trn_docs,
             dev_docs=dev_docs,
             save_dir=save_dir,
             pretrained_embeddings=('fasttext', 'crawl-300d-2M-subword'), word_dims=300)
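If your treebank is in CoNLL-X format, a loader might look like the sketch below. The column layout and the (head, relation) pair representation are assumptions for illustration; check ELIT's Document documentation for the exact dep field structure.

```python
from typing import Dict, List


def parse_conllx(sentence: str) -> Dict[str, List]:
    """Parse one tab-separated CoNLL-X sentence into words and a dep field.

    Assumed columns: ID, FORM, ..., HEAD (col 6), DEPREL (col 7);
    heads are 1-based, with 0 denoting the root.
    """
    words, dep = [], []
    for line in sentence.strip().splitlines():
        cells = line.split('\t')
        words.append(cells[1])
        dep.append((int(cells[6]), cells[7]))  # one (head, relation) pair per token
    return {'word': words, 'dep': dep}
```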

SDP

Similar to DEP, except that semantic relations are loaded into the sdp field of the Document structure.
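The key difference from DEP is that a semantic dependency graph lets a token have zero or more heads, so each token carries a list of arcs rather than a single (head, relation) pair. A sketch of that shape (the helper names and the exact representation are assumptions, not ELIT's API):

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, semantic relation)


def empty_sdp(n_tokens: int) -> List[List[Arc]]:
    """An empty sdp field: one arc list per token, since tokens may have 0+ heads."""
    return [[] for _ in range(n_tokens)]


def add_arc(sdp: List[List[Arc]], dependent: int, head: int, relation: str) -> None:
    """Attach one semantic arc; unlike dep, repeated calls per token are allowed."""
    sdp[dependent].append((head, relation))
```

For instance, a token serving as the argument of two predicates simply accumulates two arcs in its list.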
