EHR-phenolyzer

EHR-Phenolyzer is a Python pipeline that automatically translates raw clinical notes into meaningfully ranked candidate causal genes. It may greatly shorten the time needed to identify and discover disease-causing genes.

PREREQUISITES

Python 2.7 or Python 3.6
metamap16.BINARY.Linux (2016) (needed only if choosing MetaMap as NLP processor)
NCBO Annotator API KEY (needed only if choosing NCBO annotator as NLP processor)
phenolyzer
Linux environment

INSTALLATION

Install python modules

$ pip install requests

Install MetaMap (needed only if choosing MetaMap as NLP)

register at UMLS Terminology Services and obtain the appropriate license (https://uts.nlm.nih.gov//license.html)
download "MetaMap 2016V2 Linux Version" from https://metamap.nlm.nih.gov/MainDownload.shtml
follow the MetaMap installation instructions (https://metamap.nlm.nih.gov/Installation.shtml)
add the MetaMap binary directory to your Linux system PATH (export PATH="/path/to/public_mm/bin:$PATH")

Get NCBO API Key (needed only if choosing NCBO annotator as NLP)

register a new BioPortal Account (https://bioportal.bioontology.org/accounts/new)
login to your account (https://bioportal.bioontology.org/login)
in the user panel, click your user name at the upper left corner of the banner, and then choose "Account Settings" to view your API Key
create a file named "ncbo.apikey.txt" under the EHR-Phenolyzer lib/ folder (see the example "ncbo.apikey.txt.example"), and then copy your API Key to the first line of this file
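
As a minimal sketch of how a key file laid out this way could be read, the function below returns the first line of the file. The name `read_api_key` and the default path are illustrative assumptions and are not taken from the EHR-Phenolyzer source code.

```python
def read_api_key(path="lib/ncbo.apikey.txt"):
    """Return the API key stored on the first line of `path`.

    Hypothetical helper: the function name and default path are assumptions.
    """
    with open(path) as handle:
        # Only the first line is used; surrounding whitespace is dropped.
        key = handle.readline().strip()
    if not key:
        raise ValueError("no API key found on the first line of %s" % path)
    return key
```

Raising an error on an empty first line makes a missing key visible before any request is sent to the annotator.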

Get MedLEE XML output (needed only if choosing MedLEE as NLP)

obtain an appropriate license to use MedLEE
analyze clinical notes and generate XML output

Install Phenolyzer

download Phenolyzer through "git clone https://github.com/WGLab/phenolyzer"
install dependencies: Bioperl, Bio::OntologyIO and Graph::Directed
add the Phenolyzer directory to your Linux system PATH (export PATH="/path/to/phenolyzer:$PATH")
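
Before running the pipeline, it can help to confirm that the PATH changes above took effect. The sketch below uses `shutil.which` from the standard library (Python 3 only). The executable names in the usage lines are assumptions; replace them with the names your MetaMap and Phenolyzer installations actually provide.

```python
import shutil


def find_missing(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumed executable names, for illustration only.
missing = find_missing(["metamap", "disease_annotation.pl"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required executables were found on PATH.")
```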

Install EHR-Phenolyzer

git clone git@github.com:WGLab/EHR-Phenolyzer.git
cd EHR-Phenolyzer
python ehr_phenolyzer.py --help

TEST

python ehr_phenolyzer.py -i example/Kleyner_ANKRD11.txt -p kleyner -n "metamap" > ehr_phenolyzer.log

For more testing examples, please check and run the bash scripts under test/

USAGE

usage: ehr_phenolyzer.py [-h] -i INPUT [-p PREFIX] [-n NLP] [-d OUTDIR] [-k]
                         [-m OMIM] [-x OBO]

Get ranked gene ids based on EHR medical notes

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        medical note file in txt format (in case of using
                        medlee, the input is medlee xml format)
  -p PREFIX, --prefix PREFIX
                        the prefix for the output file
  -n NLP, --nlp NLP     type of NLP (metamap (default),medlee, NCBOannotator)
  -d OUTDIR, --outdir OUTDIR
                        the path to the output folder
  -k, --keeptmp         keep temporary files
  -m OMIM, --omim OMIM  path to the OMIM txt file
  -x OBO, --obo OBO     path to HPO obo file

One step from EHR records to ranked gene list.Before running, please install
Phenolyzer, and get the NLP tools ready.
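
When the pipeline is called from another script, the options listed above can be assembled into an argument list. This is only a sketch: `build_command` is a hypothetical helper, and it uses only the flags shown in the usage message.

```python
import sys


def build_command(input_path, prefix=None, nlp=None, outdir=None, keep_tmp=False):
    """Assemble an argument list for ehr_phenolyzer.py (hypothetical helper)."""
    cmd = [sys.executable, "ehr_phenolyzer.py", "-i", input_path]
    if prefix:
        cmd += ["-p", prefix]
    if nlp:
        cmd += ["-n", nlp]
    if outdir:
        cmd += ["-d", outdir]
    if keep_tmp:
        cmd.append("-k")
    return cmd


# Mirrors the command in the TEST section; pass the list to
# subprocess.call() from the EHR-Phenolyzer directory to run it.
cmd = build_command("example/Kleyner_ANKRD11.txt", prefix="kleyner", nlp="metamap")
print(" ".join(cmd[1:]))
```

Building a list rather than a single shell string avoids quoting problems when file names contain spaces.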

About Input Data

OMIM data

The source file is available from OMIM as the morbidmap.txt file once you have obtained access to OMIM. Gene names were extracted from this source file, and the alias gene names and official gene names were grouped into one line, separated by ",". This file can be found in the folder "db/".
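
To illustrate the comma-separated grouping described above, the sketch below splits one line into individual gene names. The exact layout of the file in "db/" is not documented here, so this assumes a line holds nothing but gene names; the function name and the sample names are placeholders.

```python
def parse_gene_group(line):
    """Split one comma-separated line into a list of gene names.

    Assumes the line contains only gene names; empty fields are skipped.
    """
    return [name.strip() for name in line.split(",") if name.strip()]


# Placeholder names, not real entries from the OMIM-derived file.
print(parse_gene_group("GENE1, GENE1_ALIAS_A,GENE1_ALIAS_B\n"))
```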

HPO obo format data

The source file was downloaded from http://purl.obolibrary.org/obo/hp.obo. This file can also be found in the folder "db/".

medical notes file

The medical notes file should be in plain text format; example note files can be found in the folder "example/". However, if you use MedLEE as the NLP engine, the input file should be an XML file produced by MedLEE.

License Agreement

By using the software, you acknowledge that you agree to the terms below:

For academic and non-profit use, you are free to fork, download, modify, distribute and use the software without restriction.

For commercial use, you are required to contact Columbia Technology Ventures to discuss licensing options.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
