- Install Node.js, Python 3, and JDK 8.
- Verify that Node.js and Python 3 are installed by running `node -v` and `python3 --version`. Set `JAVA_HOME` in your bash profile and confirm it by running `echo $JAVA_HOME`.
- Go inside Backend, Frontend, and kafka-backend and run `npm install` in each.
- a) Go inside PythonBackend and run `python3 klien-server.py`.

  b) You will get an error such as `klien is missing`.

  c) Run the two commands below:

  - `pip3 install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_ner_bc5cdr_md-0.2.4.tar.gz`
  - `python3 -m spacy download en_core_web_sm`

  d) Run `pip3 install <name_of_the_package>` to install each missing dependency. For example, `pip3 install klien`. Rerun `python3 klien-server.py` until all such errors are resolved. Once you see the server starting, kill it with Ctrl + C.
- a) Unzip `kafka_2.11-1.1.0.zip` and `cd` to kafka_2.11-1.1.0.

  b) Inside kafka_2.11-1.1.0, run `bin/zookeeper-server-start.sh config/zookeeper.properties`.

  c) Open another terminal window and `cd` to kafka_2.11-1.1.0. When inside, run `bin/kafka-server-start.sh config/server.properties`.

  d) Open another terminal window and `cd` to kafka_2.11-1.1.0. When inside, run the following commands one at a time:

  - `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic access`
  - `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic chat`
  - `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic order`
  - `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic post_book`
  - `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic response_topics`

  e) Verify that these five topics have been created by running `bin/kafka-topics.sh --list --zookeeper localhost:2181`.
- a) You should have four terminal windows open, one in each of the following directories: Backend, Frontend, kafka-backend, and PythonBackend.

  b) In the PythonBackend terminal, run `python3 klien-server.py`. In the other three terminals, run `npm start`.

  c) In your browser, go to http://localhost:3000/
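
The five topic-creation commands in the Kafka step differ only in the topic name, so they can be generated in a loop. The following is a minimal sketch, assuming the Kafka directory is named kafka_2.11-1.1.0 and that ZooKeeper and the Kafka broker are already running; the function names `topic_create_command` and `create_topics` are hypothetical helpers, not part of this repository.

```python
import subprocess

# Topic names taken from the setup steps above.
TOPICS = ["access", "chat", "order", "post_book", "response_topics"]


def topic_create_command(topic, zookeeper="localhost:2181"):
    """Build the kafka-topics.sh invocation for one topic as an argument list."""
    return [
        "bin/kafka-topics.sh", "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", "1",
        "--partitions", "1",
        "--topic", topic,
    ]


def create_topics(kafka_dir="kafka_2.11-1.1.0"):
    """Run the create command for every topic from inside the Kafka directory.

    Assumes ZooKeeper and the Kafka broker are already running.
    """
    for topic in TOPICS:
        # check=True stops at the first command that fails.
        subprocess.run(topic_create_command(topic), cwd=kafka_dir, check=True)
```

Calling `create_topics()` from the directory that contains kafka_2.11-1.1.0 should be equivalent to typing the five commands by hand.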
We propose to build a React-based UI where users can upload their medical reports. React Native will be used to implement the solution, which will allow the application to be ported to mobile platforms as well. Tesseract OCR and Aspose APIs will be used to extract data from medical reports as they are scanned. We will store RDF medical data from DBpedia and other datasets in MarkLogic. We will also use NLP/NLG libraries such as Apache OpenNLP and SimpleNLG to generate a human-like response for the user. Our Java-based system will interface with the RDF database and the NLP/NLG libraries to send a response back to the UI.
We will provide support for multiple space-delimited languages, for example, English, Spanish, and all Indian languages. We will not be able to support languages like Chinese, Japanese, Korean, Thai, and Khmer, whose writing systems don't use spaces, since OpenNLP performs space-level token parsing.
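
The limitation can be illustrated with a small sketch. It uses Python's built-in `str.split()` as a stand-in for a space-level tokenizer; it is not OpenNLP itself, and the two sample sentences are our own examples.

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace, as a space-level tokenizer would."""
    return text.split()


english = "Cholesterol is a lipid"
chinese = "胆固醇是一种脂质"  # roughly the same sentence, written without spaces

print(whitespace_tokenize(english))  # ['Cholesterol', 'is', 'a', 'lipid']
print(whitespace_tokenize(chinese))  # ['胆固醇是一种脂质']
```

The English sentence yields one token per word, while the Chinese sentence comes back as a single token, so no individual term can be looked up.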
Limited literacy skills are one of the strongest predictors of poor health outcomes for patients. Studies have shown that when patients have low reading fluency, they know less about their chronic diseases and are worse at managing their care[1]. Overall, studies of patient-accessible medical records suggest modest improvements in doctor-patient communication, adherence, patient empowerment, and patient education[2].
MediReport will solve the following use cases:

Actor | Use case |
---|---|
Patient trying to understand a medical report | Patients often have difficulty understanding the clinical data presented in portals. In response, patients increasingly either ignore their reports or go online[3] to make sense of this data. The medical information found in online forums and discussion groups can cause anxiety and may not always be applicable. MediReport will give patients a one-stop solution for understanding their medical reports, including the meaning and impact of each term (e.g., Bilirubin, Creatinine) as well as ways to manage it. |
Patients trying to understand a doctor's prescription | 40-80% of the medical information provided by healthcare practitioners is forgotten immediately. The greater the amount of information presented, the lower the proportion correctly recalled.[4] Furthermore, almost half of the information that is remembered is incorrect.[5] To help patients recall and understand each prescribed medicine, MediReport will augment the prescription with an explanation, categorization (antibiotic, antibacterial, etc.), side effects, medical usage, mode of action, etc. |
Physicians trying to make a diagnosis | A study by Meyer and Payne[6] suggests that the association between physicians’ diagnostic accuracy and their confidence in that accuracy may be poor and that physicians may not request the required additional resources (i.e., additional tests, second opinions, curbside consultations, referrals, and reference materials) to facilitate diagnosis when they most need it. These mismatched associations might prevent physicians from reexamining difficult cases when their diagnosis is incorrect. Improving these associations and the use of potential resources in handling difficult cases could potentially reduce diagnostic error. MediReport will help reduce diagnostic error by providing the physician with additional resources such as signs and symptoms, virology, pathophysiology, diagnosis, prevention, treatment, management, prognosis, etc. |
Insurance company validating a claim | Insurers know a lot about you, based on claims. They aggregate data, such as imaging, medications, referrals, admissions, and emergency department visits, as well as quality metrics around severity-adjusted episodes of care for specific diagnoses[7]. MediReport will help insurance companies make decisions about pre-existing conditions, valid claims, reporting malpractice, etc. by providing them with a clear understanding of the thousands of medical terms and jargon that can be difficult to remember. It will save insurance companies money by reducing the hours spent understanding medical cases as well as reducing dependence on highly paid consultants. |

Technology | Choice and viability |
---|---|
React Native | React Native is a multi-platform solution developed by Facebook that lets you build mobile apps using JavaScript. These mobile apps are considered multi-platform because they’re written once and deployed across many platforms, like Android, iOS and the web. We will use React Native to create a cross-platform app that can be run on iOS, Android and Web. |
spaCy | spaCy is an open-source software library for advanced natural language processing. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research in 2015 found spaCy's syntactic parser to be the fastest in the world[8]. We will use spaCy to perform most of our non-clinical NLP tasks such as NER, sentence detection, etc. |
medaCy and Apache cTAKES™ | medaCy is a text processing and learning framework built over spaCy to support the lightning-fast prototyping, training, and application of highly predictive medical NLP models. It is designed to streamline researcher workflow by providing utilities for model training, prediction, and organization while ensuring the replicability of systems. Apache cTAKES™ is a natural language processing system for extraction of information from electronic medical record clinical free-text. It can discover codable entities, temporal events, properties, and relations. It can process database or file-stored batches at 50,000 clinical notes per hour and can be scaled up to run on clusters, queue systems, and cloud computing services. We will use a combination of medaCy and Apache cTAKES™ for all clinical NLP such as clinical NER, co-reference resolution, etc. We might use another NLP tool if the need arises. |
Linked data | Linked data (often capitalized as Linked Data) is structured data that is interlinked with other data so it becomes more useful through semantic queries[9]. It builds upon standard Web technologies such as HTTP, RDF, and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. We will use DBpedia, a Linked Data dataset, because it can provide us with virtually all of the information present in Wikipedia and lets us combine various Linked Data sources to generate a comprehensive dataset. |
DBpedia | DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across multiple Wikipedia articles. Data is accessed using an SQL-like query language for RDF called SPARQL. We will use DBpedia as our primary dataset for providing information in more than 100 languages. |
Aspose | Aspose provides a comprehensive set of PDF/Word/Excel manipulation and parsing solutions for developers and end users. Aspose is one of the leaders with a suite of tools for creating PDF/MS Office documents[10]. We will use Aspose to parse and extract information from medical reports in PDF, MS Word, and other supported formats. |
Tesseract | Tesseract is an OCR engine with support for Unicode and the ability to recognize more than 100 languages out of the box. It can be trained to recognize other languages. We will use it to perform OCR on medical reports, so that a user can quickly take a picture of their medical record and have MediReport extract the data from it. |
SimpleNLG | SimpleNLG can be used to help you write a program that generates grammatically correct English sentences. It’s a library (not an application), written in Java, which performs simple and useful tasks that are necessary for natural language generation (NLG). In the final phase, which is not of the highest priority, we will use SimpleNLG to generate a human-like response for the user. |
MarkLogic | MarkLogic Server is a powerful software solution for harnessing your digital content all in a single database. MarkLogic enables you to build complex applications that interact with large volumes of JSON, XML, SGML, HTML, RDF triples, binary files, and other popular content formats. The unique architecture of MarkLogic ensures that your applications are both scalable and high-performance, delivering query results at search-engine speeds while providing transactional integrity over the underlying database[11]. According to the MarkLogic blog, in Gartner’s Magic Quadrant report for Operational Database Management Systems, MarkLogic achieved the highest placement for its “ability to execute” of all the companies in the challengers’ quadrant.[12] |
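
As the DBpedia row notes, data is retrieved with SPARQL. The sketch below shows what a lookup of a term's abstract might look like. It only builds the query and the request URL and sends nothing over the network. It assumes the public endpoint at https://dbpedia.org/sparql and the `dbo:abstract` property; the helper names `build_abstract_query` and `build_request_url` are hypothetical.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed to be the one the project would query).
DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(resource_name, lang="en"):
    """Return a SPARQL query for the abstract of one DBpedia resource.

    resource_name is the last path segment of the resource IRI,
    for example "Cholesterol".
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource_name}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def build_request_url(resource_name, lang="en"):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {
        "query": build_abstract_query(resource_name, lang),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_SPARQL_ENDPOINT + "?" + urlencode(params)


print(build_abstract_query("Cholesterol"))
```

Changing the `lang` argument (for example to `"es"`) requests the abstract in another language, which is how the multi-language support mentioned above could be approached.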
Doctor's prescription - 1 capsule of Advil for 5 days.
- Fetch the user's medical report via the React-based UI/Node.js
- Use Aspose PDF or Tesseract OCR to extract text data from the medical report
- Use medaCy and Apache cTAKES™ for clinical NLP, where we extract entities and relationships
- Enhance the extracted data by providing explanations of each term using DBpedia
- (Low priority) Use NLP/NLG libraries such as Apache OpenNLP and SimpleNLG to generate a human-like response for the user.
{
'entities': {
'T1': ('Cholesterol', 'Cholesterol is a molecule that is found in animal cells and body fluids. Cholesterol is not found in plant sources. It is a type of lipid which is a fat or fat-like molecule. Cholesterol is a soft waxy substance. Cholesterol is a special type of lipid that is called a steroid. Steroids are lipids that have a special chemical structure. This structure is made of four rings of carbon atoms. Cholesterol is found especially in animal fats. Hypercholesterolemia means that cholesterol level is too high in the blood. High cholesterol levels show that heart disease may develop.'),
'T2': ('Drug', 'High-density lipoprotein (HDL) is one of the five major groups of lipoproteins.[1] Lipoproteins are complex particles composed of multiple proteins which transport all fat molecules (lipids) around the body within the water outside cells. Increasing concentrations of HDL particles are strongly associated with decreasing accumulation of atherosclerosis within the walls of arteries. This is important because atherosclerosis eventually results in sudden plaque ruptures, cardiovascular disease, stroke and other vascular diseases. ')
...
}
}
{
'entities': {
'T3': ('Drug', 40, 45, 'Advil'),
'T1': ('Dosage', 27, 28, '1'),
'T2': ('Form', 29, 36, 'capsule'),
'T4': ('Duration', 46, 56, 'for 5 days')
},
'relations': []
}
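
The enhancement step, which turns extracted entities into entities with explanations, could be sketched as follows. This is a minimal illustration under stated assumptions: `enrich_entities` and the `describe` callback are hypothetical names, the output tuple layout `(label, text, description)` is our own choice, and the glossary is a placeholder standing in for a real DBpedia lookup.

```python
def enrich_entities(ner_output, describe):
    """Attach a description to every extracted entity.

    ner_output: dict shaped like the NER example above, mapping entity ids
                to (label, start, end, text) tuples under 'entities'.
    describe:   callable that takes the entity text and returns a description
                (hypothetical hook, for instance a DBpedia lookup).
    """
    enriched = {}
    for entity_id, (label, _start, _end, text) in ner_output["entities"].items():
        enriched[entity_id] = (label, text, describe(text))
    return {"entities": enriched}


# Input copied from the NER example above.
ner_output = {
    "entities": {
        "T3": ("Drug", 40, 45, "Advil"),
        "T1": ("Dosage", 27, 28, "1"),
        "T2": ("Form", 29, 36, "capsule"),
        "T4": ("Duration", 46, 56, "for 5 days"),
    },
    "relations": [],
}

# Placeholder glossary; a real system would query DBpedia instead.
glossary = {"Advil": "Placeholder description for Advil."}

result = enrich_entities(ner_output, lambda text: glossary.get(text, ""))
print(result["entities"]["T3"])
```

Passing the lookup in as a callback keeps the NER output format independent of where the explanations come from.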
I love this idea. Use NLG to generate commentary for users. I would stick to a mobile app, make it useful for underprivileged users in rural areas, and support multiple languages such as Spanish, Hindi, Tamil, Telugu, etc.
1. Graham, S., & Brookey, J. (2008). Do patients understand? The Permanente Journal, 12(3), 67–69. doi:10.7812/tpp/07-144
2. Ross, S. E., & Lin, C. T. (2003). The effects of promoting patient access to medical records: a review. Journal of the American Medical Informatics Association: JAMIA, 10(2), 129–138. doi:10.1197/jamia.m1147
3. Reynolds, T. L., Ali, N., McGregor, E., O'Brien, T., Longhurst, C., Rosenberg, A. L., … Zheng, K. (2018). Understanding Patient Questions about their Medical Records in an Online Health Forum: Opportunity for Patient Portal Design. AMIA ... Annual Symposium proceedings. AMIA Symposium, 2017, 1468–1477.
4. McGuire LC. Remembering what the doctor said: organization and older adults' memory for medical information. Exp Aging Res 1996;22:403-28.
5. Anderson JL, Dodman S, Kopelman M, Fleming A. Patient information recall in a rheumatology clinic. Rheumatol Rehabil 1979;18:245-55.
6. Meyer AND, Payne VL, Meeks DW, Rao R, Singh H. Physicians’ Diagnostic Accuracy, Confidence, and Resource Requests: A Vignette Study. JAMA Intern Med. 2013;173(21):1952–1958. doi:10.1001/jamainternmed.2013.10081
7. Kaufman J. M. (2015). How to work with insurance companies. Neurology. Clinical practice, 5(5), 448–453. doi:10.1212/CPJ.0000000000000179
8. Facts & Figures. https://spacy.io/usage/facts-figures
9. Bizer, Christian; Heath, Tom; Berners-Lee, Tim (2009). "Linked Data – The Story So Far" (PDF). International Journal on Semantic Web and Information Systems. 5 (3). doi:10.4018/jswis.2009081901
10. Create Documents with Aspose.Pdf for .NET. https://visualstudiomagazine.com/articles/2010/09/01/create-documents-with-asposepdf-for-net.aspx
11. Getting Started With MarkLogic Server. https://docs.marklogic.com/guide/getting-started/intro
12. Why MarkLogic Will Lead the Next-Generation of Database Technology. https://www.marklogic.com/blog/marklogic-will-lead-next-generation-multi-model-database/