diff --git a/Dockerfile b/Dockerfile index 396daba..8417070 100644 --- a/Dockerfile +++ b/Dockerfile @@ -18,7 +18,7 @@ WORKDIR /work COPY requirements.in . RUN pip install -r requirements.in -RUN pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz +RUN pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz RUN python -m spacy download en_core_web_sm RUN python -m spacy download en_core_web_md diff --git a/README.md b/README.md index 210b36c..5bd7394 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ pip install scispacy to install a model (see our full selection of available models below), run a command like the following: ```bash -pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz +pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz ``` Note: We strongly recommend that you use an isolated Python environment (such as virtualenv or conda) to install scispacy. @@ -76,14 +76,14 @@ pip install CMD-V(to paste the copied URL) | Model | Description | Install URL |:---------------|:------------------|:----------| -| en_core_sci_sm | A full spaCy pipeline for biomedical data with a ~100k vocabulary. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz)| -| en_core_sci_md | A full spaCy pipeline for biomedical data with a ~360k vocabulary and 50k word vectors. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz)| -| en_core_sci_lg | A full spaCy pipeline for biomedical data with a ~785k vocabulary and 600k word vectors. 
|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_lg-0.5.0.tar.gz)| -| en_core_sci_scibert | A full spaCy pipeline for biomedical data with a ~785k vocabulary and `allenai/scibert-base` as the transformer model. You may want to [use a GPU](https://spacy.io/usage#gpu) with this model. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_scibert-0.5.0.tar.gz)| -| en_ner_craft_md| A spaCy NER model trained on the CRAFT corpus.|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_craft_md-0.5.0.tar.gz)| -| en_ner_jnlpba_md | A spaCy NER model trained on the JNLPBA corpus.| [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_jnlpba_md-0.5.0.tar.gz)| -| en_ner_bc5cdr_md | A spaCy NER model trained on the BC5CDR corpus. | [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bc5cdr_md-0.5.0.tar.gz)| -| en_ner_bionlp13cg_md | A spaCy NER model trained on the BIONLP13CG corpus. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bionlp13cg_md-0.5.0.tar.gz)| +| en_core_sci_sm | A full spaCy pipeline for biomedical data with a ~100k vocabulary. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz)| +| en_core_sci_md | A full spaCy pipeline for biomedical data with a ~360k vocabulary and 50k word vectors. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz)| +| en_core_sci_lg | A full spaCy pipeline for biomedical data with a ~785k vocabulary and 600k word vectors. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz)| +| en_core_sci_scibert | A full spaCy pipeline for biomedical data with a ~785k vocabulary and `allenai/scibert-base` as the transformer model. 
You may want to [use a GPU](https://spacy.io/usage#gpu) with this model. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz)| +| en_ner_craft_md| A spaCy NER model trained on the CRAFT corpus.|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_craft_md-0.5.1.tar.gz)| +| en_ner_jnlpba_md | A spaCy NER model trained on the JNLPBA corpus.| [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_jnlpba_md-0.5.1.tar.gz)| +| en_ner_bc5cdr_md | A spaCy NER model trained on the BC5CDR corpus. | [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz)| +| en_ner_bionlp13cg_md | A spaCy NER model trained on the BIONLP13CG corpus. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bionlp13cg_md-0.5.1.tar.gz)| ## Additional Pipeline Components diff --git a/configs/base_parser_tagger.cfg b/configs/base_parser_tagger.cfg index cec9e47..738c85c 100644 --- a/configs/base_parser_tagger.cfg +++ b/configs/base_parser_tagger.cfg @@ -55,8 +55,9 @@ upstream = "*" factory = "tagger" [components.tagger.model] -@architectures = "spacy.Tagger.v1" +@architectures = "spacy.Tagger.v2" nO = null +normalize = False [components.tagger.model.tok2vec] @architectures = "spacy.Tok2VecListener.v1" diff --git a/docs/index.md b/docs/index.md index 8579f9e..b1aab56 100644 --- a/docs/index.md +++ b/docs/index.md @@ -17,14 +17,14 @@ pip install | Model | Description | Install URL |:---------------|:------------------|:----------| -| en_core_sci_sm | A full spaCy pipeline for biomedical data. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz)| -| en_core_sci_md | A full spaCy pipeline for biomedical data with a larger vocabulary and 50k word vectors. 
|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz)| -| en_core_sci_scibert | A full spaCy pipeline for biomedical data with a ~785k vocabulary and `allenai/scibert-base` as the transformer model. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_scibert-0.5.0.tar.gz)| -| en_core_sci_lg | A full spaCy pipeline for biomedical data with a larger vocabulary and 600k word vectors. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_lg-0.5.0.tar.gz)| -| en_ner_craft_md| A spaCy NER model trained on the CRAFT corpus.|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_craft_md-0.5.0.tar.gz)| -| en_ner_jnlpba_md | A spaCy NER model trained on the JNLPBA corpus.| [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_jnlpba_md-0.5.0.tar.gz)| -| en_ner_bc5cdr_md | A spaCy NER model trained on the BC5CDR corpus. | [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bc5cdr_md-0.5.0.tar.gz)| -| en_ner_bionlp13cg_md | A spaCy NER model trained on the BIONLP13CG corpus. | [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bionlp13cg_md-0.5.0.tar.gz)| +| en_core_sci_sm | A full spaCy pipeline for biomedical data. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz)| +| en_core_sci_md | A full spaCy pipeline for biomedical data with a larger vocabulary and 50k word vectors. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz)| +| en_core_sci_scibert | A full spaCy pipeline for biomedical data with a ~785k vocabulary and `allenai/scibert-base` as the transformer model. 
|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz)| +| en_core_sci_lg | A full spaCy pipeline for biomedical data with a larger vocabulary and 600k word vectors. |[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz)| +| en_ner_craft_md| A spaCy NER model trained on the CRAFT corpus.|[Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_craft_md-0.5.1.tar.gz)| +| en_ner_jnlpba_md | A spaCy NER model trained on the JNLPBA corpus.| [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_jnlpba_md-0.5.1.tar.gz)| +| en_ner_bc5cdr_md | A spaCy NER model trained on the BC5CDR corpus. | [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz)| +| en_ner_bionlp13cg_md | A spaCy NER model trained on the BIONLP13CG corpus. | [Download](https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bionlp13cg_md-0.5.1.tar.gz)| @@ -34,18 +34,18 @@ Our models achieve performance within 3% of published state of the art dependenc | model | UAS | LAS | POS | Mentions (F1) | Web UAS | |:---------------|:----|:------|:------|:---|:---| -| en_core_sci_sm | 89.27| 87.33 | 98.29 | 68.05 | 87.61 | -| en_core_sci_md | 89.86| 87.92 | 98.43 | 69.32 | 88.05 | -| en_core_sci_lg | 89.54| 87.66 | 98.29 | 69.52 | 87.68 | -| en_core_sci_scibert | 92.28| 90.83 | 98.93 | 67.84 | 92.63 | +| en_core_sci_sm | 89.03| 87.00 | 98.13 | 67.87 | 87.42 | +| en_core_sci_md | 89.73| 87.85 | 98.40 | 69.53 | 87.79 | +| en_core_sci_lg | 89.75| 87.79 | 98.49 | 69.69 | 87.74 | +| en_core_sci_scibert | 92.21| 90.65 | 98.86 | 68.01 | 92.58 | | model | F1 | Entity Types| |:---------------|:-----|:--------| -| en_ner_craft_md | 78.35|GGP, SO, TAXON, CHEBI, GO, CL| -| en_ner_jnlpba_md | 70.89| DNA, CELL_TYPE, CELL_LINE, RNA, PROTEIN | -| en_ner_bc5cdr_md | 84.70| DISEASE, CHEMICAL| -| 
en_ner_bionlp13cg_md | 76.79| AMINO_ACID, ANATOMICAL_SYSTEM, CANCER, CELL, CELLULAR_COMPONENT, DEVELOPING_ANATOMICAL_STRUCTURE, GENE_OR_GENE_PRODUCT, IMMATERIAL_ANATOMICAL_ENTITY, MULTI-TISSUE_STRUCTURE, ORGAN, ORGANISM, ORGANISM_SUBDIVISION, ORGANISM_SUBSTANCE, PATHOLOGICAL_FORMATION, SIMPLE_CHEMICAL, TISSUE | +| en_ner_craft_md | 76.75|GGP, SO, TAXON, CHEBI, GO, CL| +| en_ner_jnlpba_md | 72.28| DNA, CELL_TYPE, CELL_LINE, RNA, PROTEIN | +| en_ner_bc5cdr_md | 84.53| DISEASE, CHEMICAL| +| en_ner_bionlp13cg_md | 76.57| AMINO_ACID, ANATOMICAL_SYSTEM, CANCER, CELL, CELLULAR_COMPONENT, DEVELOPING_ANATOMICAL_STRUCTURE, GENE_OR_GENE_PRODUCT, IMMATERIAL_ANATOMICAL_ENTITY, MULTI-TISSUE_STRUCTURE, ORGAN, ORGANISM, ORGANISM_SUBDIVISION, ORGANISM_SUBSTANCE, PATHOLOGICAL_FORMATION, SIMPLE_CHEMICAL, TISSUE | ### Example Usage diff --git a/project.yml b/project.yml index 2f11d1a..58a8788 100644 --- a/project.yml +++ b/project.yml @@ -2,8 +2,8 @@ title: "scispaCy pipeline" description: "All the steps needed in the scispaCy pipeline" vars: - version_string: "0.5.0" - gpu_id: "0" + version_string: "0.5.1" + gpu_id: 0 freqs_loc_s3: "s3://ai2-s2-scispacy/data/gorc_subset.freqs" freqs_loc_local: "assets/gorc_subset.freqs" vectors_loc_s3: "s3://ai2-s2-scispacy/data/pubmed_with_header.txt.gz" @@ -131,25 +131,31 @@ workflows: - parser-tagger-train-sm - parser-tagger-train-md - parser-tagger-train-lg + - parser-tagger-train-scibert - ner-train-sm - ner-train-md - ner-train-lg - ner-train-specialized + - ner-train-scibert - evaluate-parser-tagger-sm - evaluate-parser-tagger-md - evaluate-parser-tagger-lg + - evaluate-parser-tagger-scibert - evaluate-ner-sm - evaluate-ner-md - evaluate-ner-lg - evaluate-specialized-ner + - evaluate-ner-scibert - package-sm - package-md - package-lg - package-ner + - package-scibert - evaluate-package-sm - evaluate-package-md - evaluate-package-lg - evaluate-package-ner + - evaluate-package-scibert commands: - name: download @@ -260,7 +266,7 @@ commands: - 
name: parser-tagger-train-sm help: "Train the base models" script: - - "spacy train ${vars.parser_tagger_config_loc} --output ${vars.parser_tagger_sm_loc} --code ${vars.code_loc} --paths.vocab_path ${vars.vocab_sm_loc} --vars.include_static_vectors False" + - "spacy train ${vars.parser_tagger_config_loc} --output ${vars.parser_tagger_sm_loc} --code ${vars.code_loc} --paths.vocab_path ${vars.vocab_sm_loc} --vars.include_static_vectors False --gpu-id ${vars.gpu_id}" deps: - "${vars.parser_tagger_config_loc}" - "${vars.genia_train_spacy_loc}" @@ -273,7 +279,7 @@ commands: - name: parser-tagger-train-md help: "Train the base models" script: - - "spacy train ${vars.parser_tagger_config_loc} --output ${vars.parser_tagger_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True" + - "spacy train ${vars.parser_tagger_config_loc} --output ${vars.parser_tagger_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" deps: - "${vars.parser_tagger_config_loc}" - "${vars.genia_train_spacy_loc}" @@ -287,7 +293,7 @@ commands: - name: parser-tagger-train-lg help: "Train the base models" script: - - "spacy train ${vars.parser_tagger_config_loc} --output ${vars.parser_tagger_lg_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_lg_loc} --paths.vocab_path ${vars.vocab_lg_loc} --vars.include_static_vectors True" + - "spacy train ${vars.parser_tagger_config_loc} --output ${vars.parser_tagger_lg_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_lg_loc} --paths.vocab_path ${vars.vocab_lg_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" deps: - "${vars.parser_tagger_config_loc}" - "${vars.genia_train_spacy_loc}" @@ -303,7 +309,7 @@ commands: script: - "spacy train ${vars.parser_tagger_scibert_config_loc} --output ${vars.parser_tagger_scibert_loc} --code 
${vars.code_loc} --paths.vocab_path ${vars.vocab_lg_loc} --gpu-id ${vars.gpu_id}" deps: - - "${vars.parser_tagger_config_loc}" + - "${vars.parser_tagger_scibert_config_loc}" - "${vars.genia_train_spacy_loc}" - "${vars.genia_dev_spacy_loc}" - "${vars.genia_test_spacy_loc}" @@ -314,7 +320,7 @@ commands: - name: ner-train-sm help: "Train the main ner" script: - - "spacy train ${vars.ner_config_loc} --output ${vars.ner_sm_loc} --code ${vars.code_loc} --paths.parser_tagger_path ${vars.parser_tagger_sm_loc}/model-best --paths.vocab_path ${vars.vocab_sm_loc} --vars.include_static_vectors False" + - "spacy train ${vars.ner_config_loc} --output ${vars.ner_sm_loc} --code ${vars.code_loc} --paths.parser_tagger_path ${vars.parser_tagger_sm_loc}/model-best --paths.vocab_path ${vars.vocab_sm_loc} --vars.include_static_vectors False --gpu-id ${vars.gpu_id}" deps: - "${vars.ner_config_loc}" - "${vars.parser_tagger_sm_loc}/model-best" @@ -325,7 +331,7 @@ commands: - name: ner-train-md help: "Train the main ner" script: - - "spacy train ${vars.ner_config_loc} --output ${vars.ner_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True" + - "spacy train ${vars.ner_config_loc} --output ${vars.ner_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" deps: - "${vars.ner_config_loc}" - "${vars.parser_tagger_md_loc}/model-best" @@ -337,7 +343,7 @@ commands: - name: ner-train-lg help: "Train the main ner" script: - - "spacy train ${vars.ner_config_loc} --output ${vars.ner_lg_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_lg_loc} --paths.parser_tagger_path ${vars.parser_tagger_lg_loc}/model-best --paths.vocab_path ${vars.vocab_lg_loc} 
--vars.include_static_vectors True" + - "spacy train ${vars.ner_config_loc} --output ${vars.ner_lg_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_lg_loc} --paths.parser_tagger_path ${vars.parser_tagger_lg_loc}/model-best --paths.vocab_path ${vars.vocab_lg_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" deps: - "${vars.ner_config_loc}" - "${vars.parser_tagger_lg_loc}/model-best" @@ -351,7 +357,7 @@ commands: script: - "spacy train ${vars.ner_scibert_config_loc} --output ${vars.ner_scibert_loc} --code ${vars.code_loc} --paths.parser_tagger_path ${vars.parser_tagger_scibert_loc}/model-best --gpu-id ${vars.gpu_id}" deps: - - "${vars.ner_config_loc}" + - "${vars.ner_scibert_config_loc}" - "${vars.parser_tagger_scibert_loc}/model-best" - "${vars.corpus_pubtator_loc_local}" outputs: @@ -360,10 +366,10 @@ commands: - name: ner-train-specialized help: "Train the specialized NER models" script: - - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.bc5cdr_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.bc5cdr_loc_local}/train.tsv --paths.dev_path ${vars.bc5cdr_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True" - - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.bionlp13cg_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.bionlp13cg_loc_local}/train.tsv --paths.dev_path ${vars.bionlp13cg_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True" - - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.craft_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.craft_loc_local}/train.tsv 
--paths.dev_path ${vars.craft_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True" - - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.jnlpba_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.jnlpba_loc_local}/train.tsv --paths.dev_path ${vars.jnlpba_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True" + - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.bc5cdr_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.bc5cdr_loc_local}/train.tsv --paths.dev_path ${vars.bc5cdr_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" + - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.bionlp13cg_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.bionlp13cg_loc_local}/train.tsv --paths.dev_path ${vars.bionlp13cg_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" + - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.craft_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path ${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.craft_loc_local}/train.tsv --paths.dev_path ${vars.craft_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" + - "spacy train ${vars.specialized_ner_config_loc} --output ${vars.jnlpba_md_loc} --code ${vars.code_loc} --paths.vectors ${vars.vectors_md_loc} --paths.parser_tagger_path 
${vars.parser_tagger_md_loc}/model-best --paths.train_path ${vars.jnlpba_loc_local}/train.tsv --paths.dev_path ${vars.jnlpba_loc_local}/devel.tsv --paths.vocab_path ${vars.vocab_md_loc} --vars.include_static_vectors True --gpu-id ${vars.gpu_id}" deps: - "${vars.corpus_pubtator_loc_local}" - "${vars.bc5cdr_loc_local}/train.tsv" @@ -389,8 +395,8 @@ commands: - name: evaluate-parser-tagger-sm help: "Evaluate the parser and tagger" script: - - "spacy evaluate ${vars.parser_tagger_sm_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_sm_loc}/model_best_results.json" - - "spacy evaluate ${vars.parser_tagger_sm_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_sm_loc}/model_best_results_onto.json" + - "spacy evaluate ${vars.parser_tagger_sm_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_sm_loc}/model_best_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.parser_tagger_sm_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_sm_loc}/model_best_results_onto.json --gpu-id ${vars.gpu_id}" deps: - "${vars.parser_tagger_sm_loc}/model-best" - "${vars.genia_test_spacy_loc}" @@ -402,8 +408,8 @@ commands: - name: evaluate-parser-tagger-md help: "Evaluate the parser and tagger" script: - - "spacy evaluate ${vars.parser_tagger_md_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_md_loc}/model_best_results.json" - - "spacy evaluate ${vars.parser_tagger_md_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_md_loc}/model_best_results_onto.json" + - "spacy evaluate ${vars.parser_tagger_md_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_md_loc}/model_best_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.parser_tagger_md_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_md_loc}/model_best_results_onto.json --gpu-id ${vars.gpu_id}" deps: - 
"${vars.parser_tagger_md_loc}/model-best" - "${vars.genia_test_spacy_loc}" @@ -415,8 +421,8 @@ commands: - name: evaluate-parser-tagger-lg help: "Evaluate the parser and tagger" script: - - "spacy evaluate ${vars.parser_tagger_lg_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_lg_loc}/model_best_results.json" - - "spacy evaluate ${vars.parser_tagger_lg_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_lg_loc}/model_best_results_onto.json" + - "spacy evaluate ${vars.parser_tagger_lg_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_lg_loc}/model_best_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.parser_tagger_lg_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_lg_loc}/model_best_results_onto.json --gpu-id ${vars.gpu_id}" deps: - "${vars.parser_tagger_lg_loc}/model-best" - "${vars.genia_test_spacy_loc}" @@ -428,8 +434,8 @@ commands: - name: evaluate-parser-tagger-scibert help: "Evaluate the parser and tagger scibert model" script: - - "spacy evaluate ${vars.parser_tagger_scibert_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_scibert_loc}/model_best_results.json --gpu-id ${vars.gpu_id}" - - "spacy evaluate ${vars.parser_tagger_scibert_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_scibert_loc}/model_best_results_onto.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.parser_tagger_scibert_loc}/model-best ${vars.genia_test_spacy_loc} --output ${vars.parser_tagger_scibert_loc}/model_best_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.parser_tagger_scibert_loc}/model-best ${vars.ontonotes_test_spacy_loc} --output ${vars.parser_tagger_scibert_loc}/model_best_results_onto.json --gpu-id ${vars.gpu_id}" deps: - "${vars.parser_tagger_scibert_loc}/model-best" - "${vars.genia_test_spacy_loc}" @@ -441,7 +447,7 @@ 
commands: - name: evaluate-ner-sm help: "Evaluate NER" script: - - "python scripts/evaluate_ner.py --model_path ${vars.ner_sm_loc}/model-best --dataset medmentions-test --output ${vars.ner_sm_loc}/model_best_results.json --med_mentions_folder_path assets/" + - "python scripts/evaluate_ner.py --model_path ${vars.ner_sm_loc}/model-best --dataset medmentions-test --output ${vars.ner_sm_loc}/model_best_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.ner_sm_loc}" - "${vars.corpus_pubtator_loc_local}" @@ -451,7 +457,7 @@ commands: - name: evaluate-ner-md help: "Evaluate NER" script: - - "python scripts/evaluate_ner.py --model_path ${vars.ner_md_loc}/model-best --dataset medmentions-test --output ${vars.ner_md_loc}/model_best_results.json --med_mentions_folder_path assets/" + - "python scripts/evaluate_ner.py --model_path ${vars.ner_md_loc}/model-best --dataset medmentions-test --output ${vars.ner_md_loc}/model_best_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.ner_md_loc}" - "${vars.corpus_pubtator_loc_local}" @@ -461,7 +467,7 @@ commands: - name: evaluate-ner-lg help: "Evaluate NER" script: - - "python scripts/evaluate_ner.py --model_path ${vars.ner_lg_loc}/model-best --dataset medmentions-test --output ${vars.ner_lg_loc}/model_best_results.json --med_mentions_folder_path assets/" + - "python scripts/evaluate_ner.py --model_path ${vars.ner_lg_loc}/model-best --dataset medmentions-test --output ${vars.ner_lg_loc}/model_best_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.ner_lg_loc}" - "${vars.corpus_pubtator_loc_local}" @@ -481,10 +487,10 @@ commands: - name: evaluate-specialized-ner help: "Evaluate specialize NER" script: - - "python scripts/evaluate_ner.py --model_path ${vars.bc5cdr_md_loc}/model-best --dataset ${vars.bc5cdr_loc_local}/test.tsv --output ${vars.bc5cdr_md_loc}/model_best_results.json" - - "python scripts/evaluate_ner.py 
--model_path ${vars.bionlp13cg_md_loc}/model-best --dataset ${vars.bionlp13cg_loc_local}/test.tsv --output ${vars.bionlp13cg_md_loc}/model_best_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.craft_md_loc}/model-best --dataset ${vars.craft_loc_local}/test.tsv --output ${vars.craft_md_loc}/model_best_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.jnlpba_md_loc}/model-best --dataset ${vars.jnlpba_loc_local}/test.tsv --output ${vars.jnlpba_md_loc}/model_best_results.json" + - "python scripts/evaluate_ner.py --model_path ${vars.bc5cdr_md_loc}/model-best --dataset ${vars.bc5cdr_loc_local}/test.tsv --output ${vars.bc5cdr_md_loc}/model_best_results.json --gpu_id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.bionlp13cg_md_loc}/model-best --dataset ${vars.bionlp13cg_loc_local}/test.tsv --output ${vars.bionlp13cg_md_loc}/model_best_results.json --gpu_id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.craft_md_loc}/model-best --dataset ${vars.craft_loc_local}/test.tsv --output ${vars.craft_md_loc}/model_best_results.json --gpu_id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.jnlpba_md_loc}/model-best --dataset ${vars.jnlpba_loc_local}/test.tsv --output ${vars.jnlpba_md_loc}/model_best_results.json --gpu_id ${vars.gpu_id}" deps: - "${vars.bc5cdr_md_loc}/model-best" - "${vars.bionlp13cg_md_loc}/model-best" @@ -512,9 +518,9 @@ commands: - name: evaluate-package-sm help: "Evaluate the packaged models" script: - - "spacy evaluate ${vars.package_sm_loc} ${vars.genia_test_spacy_loc} --output packages/sm_genia_results.json" - - "spacy evaluate ${vars.package_sm_loc} ${vars.ontonotes_test_spacy_loc} --output packages/sm_onto_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.package_sm_loc} --dataset medmentions-test --output packages/sm_mm_results.json --med_mentions_folder_path assets/" + - "spacy evaluate ${vars.package_sm_loc} 
${vars.genia_test_spacy_loc} --output packages/sm_genia_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.package_sm_loc} ${vars.ontonotes_test_spacy_loc} --output packages/sm_onto_results.json --gpu-id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.package_sm_loc} --dataset medmentions-test --output packages/sm_mm_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.package_sm_loc}" outputs: @@ -553,9 +559,9 @@ commands: - name: evaluate-package-md help: "Evaluate the packaged models" script: - - "spacy evaluate ${vars.package_md_loc} ${vars.genia_test_spacy_loc} --output packages/md_genia_results.json" - - "spacy evaluate ${vars.package_md_loc} ${vars.ontonotes_test_spacy_loc} --output packages/md_onto_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.package_md_loc} --dataset medmentions-test --output packages/md_mm_results.json --med_mentions_folder_path assets/" + - "spacy evaluate ${vars.package_md_loc} ${vars.genia_test_spacy_loc} --output packages/md_genia_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.package_md_loc} ${vars.ontonotes_test_spacy_loc} --output packages/md_onto_results.json --gpu-id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.package_md_loc} --dataset medmentions-test --output packages/md_mm_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.package_md_loc}" outputs: @@ -566,9 +572,9 @@ commands: - name: evaluate-package-lg help: "Evaluate the packaged models" script: - - "spacy evaluate ${vars.package_lg_loc} ${vars.genia_test_spacy_loc} --output packages/lg_genia_results.json" - - "spacy evaluate ${vars.package_lg_loc} ${vars.ontonotes_test_spacy_loc} --output packages/lg_onto_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.package_lg_loc} --dataset medmentions-test --output packages/lg_mm_results.json --med_mentions_folder_path assets/" + - 
"spacy evaluate ${vars.package_lg_loc} ${vars.genia_test_spacy_loc} --output packages/lg_genia_results.json --gpu-id ${vars.gpu_id}" + - "spacy evaluate ${vars.package_lg_loc} ${vars.ontonotes_test_spacy_loc} --output packages/lg_onto_results.json --gpu-id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.package_lg_loc} --dataset medmentions-test --output packages/lg_mm_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.package_lg_loc}" outputs: @@ -580,7 +586,7 @@ commands: help: "Evaluate the packaged scibert model" script: - "spacy evaluate ${vars.package_scibert_loc} ${vars.genia_test_spacy_loc} --output packages/scibert_genia_results.json --gpu-id ${vars.gpu_id}" - - "spacy evaluate ${vars.package_scibert_loc} ${vars.ontonotes_test_spacy_loc} --output packages/scibert_onto_results.json --gpu-id ${var.gpu_id}" + - "spacy evaluate ${vars.package_scibert_loc} ${vars.ontonotes_test_spacy_loc} --output packages/scibert_onto_results.json --gpu-id ${vars.gpu_id}" - "python scripts/evaluate_ner.py --model_path ${vars.package_scibert_loc} --dataset medmentions-test --output packages/scibert_mm_results.json --med_mentions_folder_path assets/ --gpu_id ${vars.gpu_id}" deps: - "${vars.package_scibert_loc}" @@ -588,8 +594,6 @@ commands: - "packages/scibert_genia_results.json" - "packages/scibert_onto_results.json" - "packages/scibert_mm_results.json" - - - name: package-ner help: "Package the models" @@ -612,10 +616,10 @@ commands: - name: evaluate-package-ner help: "Evaluate the packaged models" script: - - "python scripts/evaluate_ner.py --model_path ${vars.package_bc5cdr_loc} --dataset ${vars.bc5cdr_loc_local}/test.tsv --output packages/bc5cdr_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.package_bionlp13cg_loc} --dataset ${vars.bionlp13cg_loc_local}/test.tsv --output packages/bionlp13cg_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.package_craft_loc} --dataset 
${vars.craft_loc_local}/test.tsv --output packages/craft_results.json" - - "python scripts/evaluate_ner.py --model_path ${vars.package_jnlpba_loc} --dataset ${vars.jnlpba_loc_local}/test.tsv --output packages/jnlpba_results.json" + - "python scripts/evaluate_ner.py --model_path ${vars.package_bc5cdr_loc} --dataset ${vars.bc5cdr_loc_local}/test.tsv --output packages/bc5cdr_results.json --gpu_id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.package_bionlp13cg_loc} --dataset ${vars.bionlp13cg_loc_local}/test.tsv --output packages/bionlp13cg_results.json --gpu_id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.package_craft_loc} --dataset ${vars.craft_loc_local}/test.tsv --output packages/craft_results.json --gpu_id ${vars.gpu_id}" + - "python scripts/evaluate_ner.py --model_path ${vars.package_jnlpba_loc} --dataset ${vars.jnlpba_loc_local}/test.tsv --output packages/jnlpba_results.json --gpu_id ${vars.gpu_id}" deps: - "${vars.package_bc5cdr_loc}" - "${vars.package_bionlp13cg_loc}" diff --git a/requirements.in b/requirements.in index 8351a3a..3cc4a36 100644 --- a/requirements.in +++ b/requirements.in @@ -1,5 +1,5 @@ numpy -spacy>=3.2.0,<3.3.0 +spacy>=3.4.0,<3.5.0 spacy-lookups-data pandas requests>=2.0.0,<3.0.0 diff --git a/scispacy/candidate_generation.py b/scispacy/candidate_generation.py index 9287abd..e7f3981 100644 --- a/scispacy/candidate_generation.py +++ b/scispacy/candidate_generation.py @@ -282,10 +282,10 @@ def nmslib_knn_with_zero_vectors( distances.append([]) # interleave `neighbors` and Nones in `extended_neighbors` extended_neighbors[empty_vectors_boolean_flags] = numpy.array( - neighbors, dtype="object" + neighbors, dtype=object )[:-1] extended_distances[empty_vectors_boolean_flags] = numpy.array( - distances, dtype="object" + distances, dtype=object )[:-1] return extended_neighbors, extended_distances diff --git a/scispacy/version.py b/scispacy/version.py index c8b11da..b7139a8 100644 --- 
a/scispacy/version.py +++ b/scispacy/version.py @@ -1,6 +1,6 @@ _MAJOR = "0" _MINOR = "5" -_REVISION = "0" +_REVISION = "1" VERSION_SHORT = "{0}.{1}".format(_MAJOR, _MINOR) VERSION = "{0}.{1}.{2}".format(_MAJOR, _MINOR, _REVISION) diff --git a/scripts/install_local_packages.py b/scripts/install_local_packages.py new file mode 100644 index 0000000..6f1e293 --- /dev/null +++ b/scripts/install_local_packages.py @@ -0,0 +1,33 @@ +import os + +from scispacy.version import VERSION + + +def main(): + model_names = [ + "en_core_sci_sm", + "en_core_sci_md", + "en_core_sci_lg", + "en_core_sci_scibert", + "en_ner_bc5cdr_md", + "en_ner_craft_md", + "en_ner_bionlp13cg_md", + "en_ner_jnlpba_md", + ] + + full_package_paths = [ + os.path.join( + "packages", + f"{model_name}-{VERSION}", + "dist", + f"{model_name}-{VERSION}.tar.gz", + ) + for model_name in model_names + ] + + for package_path in full_package_paths: + os.system(f"pip install {package_path}") + + +if __name__ == "__main__": + main() diff --git a/scripts/install_remote_packages.py b/scripts/install_remote_packages.py new file mode 100644 index 0000000..60ff0f5 --- /dev/null +++ b/scripts/install_remote_packages.py @@ -0,0 +1,28 @@ +import os + +from scispacy.version import VERSION + + +def main(): + s3_prefix = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/" + model_names = [ + "en_core_sci_sm", + "en_core_sci_md", + "en_core_sci_lg", + "en_core_sci_scibert", + "en_ner_bc5cdr_md", + "en_ner_craft_md", + "en_ner_bionlp13cg_md", + "en_ner_jnlpba_md", + ] + + full_package_paths = [ + f"{s3_prefix}{model_name}-{VERSION}.tar.gz" for model_name in model_names + ] + + for package_path in full_package_paths: + os.system(f"pip install {package_path}") + + +if __name__ == "__main__": + main() diff --git a/scripts/print_out_metrics.py b/scripts/print_out_metrics.py new file mode 100644 index 0000000..4c634b5 --- /dev/null +++ b/scripts/print_out_metrics.py @@ -0,0 +1,45 @@ +import os +import json + + +def 
main(): + core_model_names = ["lg", "md", "sm", "scibert"] + ner_model_names = ["bc5cdr", "bionlp13cg", "craft", "jnlpba"] + + base_path = "packages" + for core_model_name in core_model_names: + print(f"Printing results for {core_model_name}") + with open( + os.path.join(base_path, f"{core_model_name}_genia_results.json") + ) as _genia_results_file: + genia_results = json.load(_genia_results_file) + + with open( + os.path.join(base_path, f"{core_model_name}_onto_results.json") + ) as _onto_results_file: + onto_results = json.load(_onto_results_file) + + with open( + os.path.join(base_path, f"{core_model_name}_mm_results.json") + ) as _mm_results_file: + mm_results = json.load(_mm_results_file) + + print(f"Genia tag accuracy: {genia_results['tag_acc']}") + print(f"Genia uas: {genia_results['dep_uas']}") + print(f"Genia las: {genia_results['dep_las']}") + print(f"Ontonotes uas: {onto_results['dep_uas']}") + print(f"MedMentions F1: {mm_results['f1-measure-untyped']}") + print() + + for ner_model_name in ner_model_names: + print(f"Printing results for {ner_model_name}") + with open( + os.path.join(base_path, f"{ner_model_name}_results.json") + ) as _ner_results_file: + ner_results = json.load(_ner_results_file) + + print(f"NER F1: {ner_results['f1-measure-overall']}") + + +if __name__ == "__main__": + main() diff --git a/scripts/smoke_test.py b/scripts/smoke_test.py new file mode 100644 index 0000000..0cb1efc --- /dev/null +++ b/scripts/smoke_test.py @@ -0,0 +1,89 @@ +import spacy +from tqdm import tqdm + +from scispacy.abbreviation import AbbreviationDetector +from scispacy.linking import EntityLinker + + +def main(): + print("Testing core models...") + print() + model_names = [ + "en_core_sci_sm", + "en_core_sci_md", + "en_core_sci_lg", + "en_core_sci_scibert", + "en_ner_bc5cdr_md", + "en_ner_craft_md", + "en_ner_bionlp13cg_md", + "en_ner_jnlpba_md", + ] + + models = [ + spacy.load(model_name) + for model_name in tqdm(model_names, desc="Loading core models") + ] + + 
text = ( + "DNA is a very important part of the cellular structure of the body. " + "John uses IL gene and interleukin-2 to treat diabetes and " + "aspirin as proteins for arms and legs on lemurs and humans." + ) + + for model_name, model in zip(model_names, models): + print(f"Testing {model_name}") + doc = model(text) + for sentence in doc.sents: + print([t.text for t in sentence]) + print([t.lemma_ for t in sentence]) + print([t.pos_ for t in sentence]) + print([t.tag_ for t in sentence]) + print([t.dep_ for t in sentence]) + print([t.ent_type_ for t in sentence]) + print() + print() + input("Continue?") + + print("Testing abbreivation detector...") + abbreviation_nlp = spacy.load("en_core_sci_sm") + abbreviation_nlp.add_pipe("abbreviation_detector") + abbreviation_text = ( + "Spinal and bulbar muscular atrophy (SBMA) is an inherited " + "motor neuron disease caused by the expansion of a polyglutamine " + "tract within the androgen receptor (AR). SBMA can be caused by this easily." + ) + abbreviation_doc = abbreviation_nlp(abbreviation_text) + for abbrevation in abbreviation_doc._.abbreviations: + print( + f"{abbrevation} \t ({abbrevation.start}, {abbrevation.end}) {abbrevation._.long_form}" + ) + print() + input("Continue?") + + print("Testing entity linkers...") + print() + ontology_names = ["umls", "mesh", "rxnorm", "go", "hpo"] + ontology_models = [spacy.load("en_core_sci_sm") for _ in ontology_names] + for ontology_name, ontology_model in tqdm( + zip(ontology_names, ontology_models), desc="Adding entity linker pipes" + ): + ontology_model.add_pipe( + "scispacy_linker", + config={"resolve_abbreviations": False, "linker_name": ontology_name}, + ) + + linking_text = "Diabetes is a disease that affects humans and is treated with aspirin via a metabolic process." 
+ for ontology_name, ontology_model in zip(ontology_names, ontology_models): + print(f"Testing {ontology_name} linker...") + linker_pipe = ontology_model.get_pipe("scispacy_linker") + doc = ontology_model(linking_text) + for entity in doc.ents: + print("Entity name: ", entity) + for ontology_entity in entity._.kb_ents[:1]: + print(linker_pipe.kb.cui_to_entity[ontology_entity[0]]) + print() + input("Continue?") + + +if __name__ == "__main__": + main() diff --git a/scripts/uninstall_local_packages.py b/scripts/uninstall_local_packages.py new file mode 100644 index 0000000..d14d69f --- /dev/null +++ b/scripts/uninstall_local_packages.py @@ -0,0 +1,23 @@ +import os + +from scispacy.version import VERSION + + +def main(): + model_names = [ + "en_core_sci_sm", + "en_core_sci_md", + "en_core_sci_lg", + "en_core_sci_scibert", + "en_ner_bc5cdr_md", + "en_ner_craft_md", + "en_ner_bionlp13cg_md", + "en_ner_jnlpba_md", + ] + + for package_name in model_names: + os.system(f"pip uninstall {package_name}") + + +if __name__ == "__main__": + main() diff --git a/setup.py b/setup.py index c972d5c..91373dc 100644 --- a/setup.py +++ b/setup.py @@ -41,7 +41,7 @@ packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]), license="Apache", install_requires=[ - "spacy>=3.2.0,<3.3.0", + "spacy>=3.4.0,<3.5.0", "requests>=2.0.0,<3.0.0", "conllu", "numpy", diff --git a/tests/test_hyponym_detector.py b/tests/test_hyponym_detector.py index e8ab8f6..f89f8e3 100644 --- a/tests/test_hyponym_detector.py +++ b/tests/test_hyponym_detector.py @@ -20,7 +20,7 @@ def test_sentences(self): ) doc = self.nlp(text) fig_trees = doc[21:23] - plant_species = doc[17:19] + plant_species = doc[16:19] assert doc._.hearst_patterns == [("such_as", plant_species, fig_trees)] doc = self.nlp("SARS, or other coronaviruses, are bad.")