Tutorials
pip install elit
pip3 uninstall mxnet-cu92 -y # Shall we rely on mxnet-cu92 instead of mxnet?
pip3 uninstall mxnet -y
pip3 install mxnet-cu92
The current version (elit 0.1.32) works only on GPU. On CPU, the embedding interface gluonnlp.embedding.create throws mxnet.base.MXNetError: [11:33:12] src/storage/storage.cc:143: Compile with USE_CUDA=1 to enable GPU usage.
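Since the error above only appears once a component is loaded, it can help to check GPU visibility first. The following is a minimal sketch, not part of ELIT: count_mxnet_gpus is a hypothetical helper name, and it assumes an MXNet release that provides mx.context.num_gpus().

```python
def count_mxnet_gpus():
    """Return the number of GPUs MXNet can see, or None if MXNet cannot be imported."""
    try:
        import mxnet as mx
    except Exception:  # ImportError, or a broken native library
        return None
    # Assumption: this MXNet release exposes mx.context.num_gpus().
    return mx.context.num_gpus()


gpu_count = count_mxnet_gpus()
if gpu_count is None:
    print("MXNet could not be imported; install mxnet-cu92 first.")
elif gpu_count == 0:
    print("No GPU is visible to MXNet; loading ELIT components is expected to fail.")
else:
    print(f"MXNet sees {gpu_count} GPU(s).")
```

A result of 0 on a machine that has a GPU usually points to a CPU-only mxnet package still being installed.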
All demos are in elit/tests/demo. This version releases five components: EnglishTokenizer, POSTagger, NERTagger, DependencyParser and SDPParser. Except for the rule-based EnglishTokenizer, every component is loaded in the same way, as in the following example.
pos_tagger = POSTagger()
pos_tagger.load()
The load method downloads pretrained models from our S3 bucket (elitcloud-public-data) automatically.
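Because every non-tokenizer component follows the same construct-then-load pattern, several components can be loaded in one loop. The sketch below is illustrative only: load_components and DummyComponent are hypothetical names, and the stand-in class exists so that the example runs without ELIT or a GPU. With ELIT installed, the list would presumably hold classes such as POSTagger or NERTagger instead.

```python
def load_components(component_classes):
    """Instantiate each class and call its load() method, keyed by class name."""
    loaded = {}
    for cls in component_classes:
        component = cls()
        component.load()  # for ELIT components, this fetches the pretrained model
        loaded[cls.__name__] = component
    return loaded


class DummyComponent:
    """Hypothetical stand-in so the sketch runs without ELIT installed."""

    def __init__(self):
        self.is_loaded = False

    def load(self):
        self.is_loaded = True


components = load_components([DummyComponent])
print(components["DummyComponent"].is_loaded)
```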
The training API exposes a similar train method on every component. However, each component has its own parameters and reads its annotations from its own Document field.
tagger = POSFlairTagger()
trn_docs = conll_to_documents('train.tsv', headers={0: 'text', 1: 'pos'})
dev_docs = conll_to_documents('dev.tsv', headers={0: 'text', 1: 'pos'})  # development set: use a held-out file, not the training file
model_path = './posmodel'
tagger.train(trn_docs=trn_docs, dev_docs=dev_docs, model_path=model_path,
             pretrained_embeddings=('fasttext', 'crawl-300d-2M-subword'))
Write your own conll_to_documents if your corpus is in a format other than .tsv.
- To activate Flair embeddings, pass in forward_language_model and backward_language_model.
- If GPU memory is exhausted, pass in embeddings_in_memory=False.
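For corpora that are not in .tsv format, a custom reader could follow the same column-mapping idea as the headers argument above. The sketch below is a hypothetical stand-alone reader, not ELIT's conll_to_documents. The name read_columns, the sep parameter and the blank-line sentence boundary are assumptions. It returns plain dictionaries rather than ELIT Document objects, because this page does not show how a Document is constructed.

```python
import io


def read_columns(lines, headers, sep="\t"):
    """Parse column-formatted lines into sentences.

    headers maps a column index to a field name, e.g. {0: 'text', 1: 'pos'}.
    Returns a list of sentences, each a dict of field name -> list of values.
    Blank lines are treated as sentence boundaries (an assumption of this sketch).
    """
    sentences = []
    current = {name: [] for name in headers.values()}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if it has any tokens.
            if any(current.values()):
                sentences.append(current)
                current = {name: [] for name in headers.values()}
            continue
        columns = line.split(sep)
        for index, name in headers.items():
            current[name].append(columns[index])
    # Flush the last sentence when the input does not end with a blank line.
    if any(current.values()):
        sentences.append(current)
    return sentences


# A space-separated sample instead of tab-separated input.
sample = io.StringIO("John NNP\nsleeps VBZ\n\nHello UH\n")
sentences = read_columns(sample, headers={0: "text", 1: "pos"}, sep=" ")
print(sentences)
```

Changing the headers mapping to {0: 'text', 1: 'ner'} would collect the second column under the ner key instead.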
NER training is almost the same as POS training, except that conll_to_documents must load the tags into the 'ner' field.
trn_docs = conll_to_documents('train.tsv', headers={0: 'text', 1: 'ner'})
Load your corpus into the dep field of the Document structure, then run the following code.
parser = BiaffineDepParser()
parser.train(trn_docs=trn_docs,
             dev_docs=dev_docs,
             save_dir=save_dir,
             pretrained_embeddings=('fasttext', 'crawl-300d-2M-subword'),
             word_dims=300)
SDP training is similar to DEP training, except that semantic relations are loaded into the 'sdp' field of the Document structure.