Lookup dictionary for pretrained embedding #3

victorconan · 2020-10-09T14:19:21Z

Hi Andrew,

Do you have a lookup dictionary for the pretrained embeddings? In the embedding file, the "medical concepts" are in the format "CXXXX", and I am not sure whether they are ICD codes, procedure codes, or something else.

Thanks!

reality · 2020-11-05T17:53:30Z

Hello Victor,

I have been looking into this work recently. I think the CUI mapping files and conversion scripts can be found in the repository for the embeddings: https://github.com/clinicalml/embeddings/tree/master/eval

Cheers

kaushikacharya · 2020-11-06T04:54:34Z

the "medical concepts" are in format of "CXXXX", not sure if they are ICD codes, procedure codes or something else

These are UMLS Concept Unique Identifiers (CUIs).

Examples from https://arxiv.org/pdf/1804.01486.pdf

Primary condition: premature infant (CUI: C0021294)
Comorbidity: bronchopulmonary dysplasia (CUI: C0006287)

UMLS CUIs can be browsed at https://uts.nlm.nih.gov/metathesaurus.html
(N.B. You need to register first.)
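
To make the idea of a lookup dictionary concrete, here is a minimal sketch. It assumes only that a CUI is the letter "C" followed by seven digits. The table `CUI_TO_NAME` and the helpers `is_cui` and `describe` are hypothetical names for illustration; the two entries are the examples quoted from the paper above, and a real table would have to be built from UMLS data.

```python
import re

# A CUI is the letter "C" followed by seven digits, e.g. C0021294.
CUI_PATTERN = re.compile(r"C[0-9]{7}")

# Hypothetical hand-built lookup table. The two entries are the examples
# quoted from the paper; a full table would come from UMLS itself.
CUI_TO_NAME = {
    "C0021294": "premature infant",
    "C0006287": "bronchopulmonary dysplasia",
}


def is_cui(token: str) -> bool:
    """Return True if the token has the shape of a UMLS CUI."""
    return CUI_PATTERN.fullmatch(token) is not None


def describe(cui: str) -> str:
    """Return a readable name, falling back to the raw identifier."""
    return CUI_TO_NAME.get(cui, cui)
```

Falling back to the raw identifier keeps the lookup usable for concepts that are missing from the table.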

KrishnaPG · 2020-12-13T14:21:25Z

I came across this post while looking for information on the meaning of the columns in the cui2vec_pretrained.csv file. The columns are named v1, v2 ... v500. Where can we find information on what these 500 columns stand for?

If we were to load this csv file into a database, what kind of schema should we create? (Or does it even make sense to load this into a database in the first place?) I have read the paper (https://arxiv.org/pdf/1804.01486.pdf) multiple times but could not find any information on the structure of this pretrained csv file. Any help is greatly appreciated.

kaushikacharya · 2020-12-16T05:48:41Z

The columns are named v1, v2 ... v500. Where can we get information on what do these 500 columns stand for?

v1, ..., v500 are the components of the 500-dimensional embedding vector for each CUI.

Quoting the paper from Section 4.1:

The 500-dimensional word2vec style embeddings using the combined data are referred to
as the cui2vec embeddings in all subsequent experiments.
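
On the database question: since the 500 values only make sense together as one vector, one possible layout, if a database is wanted at all, is one row per CUI with the whole vector packed into a single binary column. The sketch below uses SQLite from the Python standard library. The table name `cui2vec` and the helpers `store` and `fetch` are hypothetical names chosen for illustration, and the vector is a made-up 3-dimensional one.

```python
import sqlite3
from array import array

# In-memory database for illustration; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cui2vec (cui TEXT PRIMARY KEY, vector BLOB NOT NULL)"
)


def store(conn, cui, vector):
    """Pack the vector as raw doubles and upsert it under its CUI."""
    blob = array("d", vector).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO cui2vec (cui, vector) VALUES (?, ?)",
        (cui, blob),
    )


def fetch(conn, cui):
    """Return the vector as a list of floats, or None if the CUI is absent."""
    row = conn.execute(
        "SELECT vector FROM cui2vec WHERE cui = ?", (cui,)
    ).fetchone()
    if row is None:
        return None
    vec = array("d")
    vec.frombytes(row[0])
    return vec.tolist()


# Made-up 3-dimensional vector; the real file has 500 values per CUI.
store(conn, "C0021294", [0.5, -0.25, 1.0])
```

A 500-column table would also work, but queries almost never need a single component on its own, so a packed column keeps the schema simple.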

Loading cui2vec:
You can use gensim as explained in piskvorky/gensim-data#25 (comment)
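
If you would rather not depend on gensim, a plain-Python sketch is below. It is not the approach from the linked comment. It assumes the file has one header row, then one row per concept with the CUI in the first column and the vector components in the remaining columns; please check this against your copy of the file. `load_cui2vec` is a hypothetical helper name, and the sample rows are made up with 3 dimensions instead of 500.

```python
import csv
import io


def load_cui2vec(lines):
    """Read CSV rows into a dict mapping CUI -> list of floats.

    Assumes a header row, then the CUI in the first column and the
    vector components in all remaining columns.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row (v1, v2, ...)
    return {row[0]: [float(x) for x in row[1:]] for row in reader if row}


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    '"","V1","V2","V3"\n'
    '"C0021294",0.1,-0.2,0.3\n'
    '"C0006287",0.0,0.5,-0.5\n'
)
vectors = load_cui2vec(sample)

# With the real file, the call would look like this:
# with open("cui2vec_pretrained.csv", newline="") as f:
#     vectors = load_cui2vec(f)
```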

As a prerequisite, you should read about word embeddings such as word2vec.
That will help you understand how text is represented as embedding vectors.
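
As a small illustration of how such vectors are typically used, word2vec-style embeddings are usually compared with cosine similarity: concepts whose vectors point in similar directions are treated as related. This is a minimal sketch, not code from the paper. The CUIs and 3-dimensional vectors in `toy` are made-up placeholders, not values from cui2vec_pretrained.csv.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up placeholder identifiers and vectors, for illustration only.
toy = {
    "C0000001": [1.0, 0.0, 0.0],
    "C0000002": [2.0, 0.0, 0.0],  # same direction as C0000001
    "C0000003": [0.0, 1.0, 0.0],  # orthogonal to C0000001
}

same_direction = cosine_similarity(toy["C0000001"], toy["C0000002"])
orthogonal = cosine_similarity(toy["C0000001"], toy["C0000003"])
```

Because cosine similarity ignores vector length, the first pair scores as identical in direction even though one vector is twice as long.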
