Improve phenotype EFO mappings #30

Open
apriltuesday opened this issue Dec 6, 2023 · 5 comments

Comments

@apriltuesday
Collaborator

Refer to opentargets/issues#3149 for context. Tasks on our side:

  • Look at OnToma’s recipes for automated mappings - documentation
  • Try exact-match search using OLS
  • Possibly set up meeting with SPOT team to discuss how to best use ZOOMA
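
As a rough illustration of the exact-match idea in the task list above, here is a minimal sketch. It assumes the OLS search endpoint lives at the URL shown and accepts `q`, `ontology`, `exact` and `queryFields` parameters, and that each result document carries `iri` and `label` fields; all of this should be checked against the OLS documentation. The helper names are hypothetical and are not taken from our pipeline.

```python
from urllib.parse import urlencode

# Assumed base URL of the OLS search endpoint; verify against the OLS docs.
OLS_SEARCH_URL = "https://www.ebi.ac.uk/ols4/api/search"


def build_exact_query(term, ontology="efo"):
    """Build a search URL asking OLS for exact matches of `term` in one ontology."""
    params = {
        "q": term,
        "ontology": ontology,
        "exact": "true",                 # assumed flag for exact matching
        "queryFields": "label,synonym",  # assumed: search labels and synonyms
    }
    return f"{OLS_SEARCH_URL}?{urlencode(params)}"


def exact_label_matches(term, docs):
    """Return IRIs of result docs whose label equals `term`, ignoring case."""
    wanted = term.strip().lower()
    return [
        doc["iri"]
        for doc in docs
        if doc.get("label", "").strip().lower() == wanted
    ]
```

The second helper re-checks the label on our side, so a mapping is only accepted automatically when the returned term really is a string-for-string match.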
@apriltuesday
Collaborator Author

Note that Zooma's exact ontology matches might not come back as high confidence with our current query. Modifying the query might help us get back more automated mappings for PGKB and perhaps also ClinVar. See Zooma documentation here.
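
One way to act on this, sketched under assumptions: accept a Zooma annotation either when it is reported as high confidence, or when the annotated property value equals our query string exactly. The field names (`confidence`, `annotatedProperty.propertyValue`, `semanticTags`) and the `"HIGH"` label are what I believe the Zooma annotate response uses, but they should be confirmed against the Zooma documentation; `select_mappings` is a hypothetical helper.

```python
def is_exact(annotation, query):
    """True if the annotated property value equals the query, ignoring case."""
    value = annotation.get("annotatedProperty", {}).get("propertyValue", "")
    return value.strip().lower() == query.strip().lower()


def select_mappings(annotations, query):
    """Collect ontology URIs from annotations that are HIGH confidence or exact."""
    selected = []
    for annotation in annotations:
        # Assumption: confidence is a string label such as "HIGH" or "GOOD".
        if annotation.get("confidence") == "HIGH" or is_exact(annotation, query):
            selected.extend(annotation.get("semanticTags", []))
    # Drop duplicate URIs while keeping first-seen order.
    return list(dict.fromkeys(selected))
```

With this rule an exact ontology match that Zooma reports below high confidence would still be picked up, which is the case described above.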

@apriltuesday
Collaborator Author

Ran on the same dataset as the 23.12 submission, using the mentioned PR and the other recent changes.

Total clinical annotations: 5073
        With RS: 4477 (88.25%)
                1. Exploded by allele: 13497 (3.0x)
                2. Exploded by PGx category: 13798 (1.0x)
                3. Exploded by drug: 19238 (1.4x)
                4. Exploded by phenotype: 23576 (1.2x)
Total evidence strings: 25963
        With CHEBI: 21668 (83.46%)
        With EFO phenotype: 10938 (42.13%)
        With functional consequence: 23842 (91.83%)
        With VEP gene: 23842 (91.83%)
Gene comparisons per annotation
        With PGKB genes: 4220 (83.19%)
        With VEP genes: 4097 (80.76%)
        PGKB genes != VEP genes: 772 (15.22%)
Total RS: 2794
        With parsed alleles: 2771 (99.18%)
                With >2 alleles: 31 (1.12%)

EFO coverage is better (33% -> 42%) but still not amazing, though the cystic fibrosis term highlighted in the OT issue is fixed.

I've dumped unmapped phenotype terms in a spreadsheet here. Perhaps we can look at synonyms or terms provided by PGKB, but I'm also wondering whether some of these very generic terms (e.g. "adverse events") appear in evidence that OT filters out anyway.

@apriltuesday
Collaborator Author

cc @M-casado @tcezard

@apriltuesday
Collaborator Author

With the explicit OLS check added, we bump up to 48.57%:

...
Total evidence strings: 25963
        With CHEBI: 21668 (83.46%)
        With EFO phenotype: 12610 (48.57%)
        With functional consequence: 23842 (91.83%)
        With VEP gene: 23842 (91.83%)
...
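
For reference, the EFO percentages in the two runs can be reproduced from the raw counts as below. This is only a small arithmetic sketch; the function name is hypothetical and not part of the pipeline.

```python
def coverage(mapped, total):
    """Percentage of evidence strings with a mapping, rounded to 2 decimals."""
    return round(100 * mapped / total, 2)


total = 25963                      # total evidence strings (same in both runs)
before = coverage(10938, total)    # run without the explicit OLS check
after = coverage(12610, total)     # run with the explicit OLS check

print(before, after)               # 42.13 48.57
print(12610 - 10938)               # 1672 additional evidence strings mapped
```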

I've updated the list of unmapped terms as well. As Tim pointed out in the meeting, many of the more generic terms (e.g. "adverse events") seem to occur in combination with other phenotypes, in which context they might make more sense.
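
To check how often a generic term appears alone versus alongside other phenotypes, something like the following could be run over the annotations. It is a sketch only: the input shape (one list of phenotype strings per annotation) is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def standalone_vs_combined(annotations, generic_terms):
    """Count, per generic term, annotations where it is alone vs. combined.

    `annotations` is assumed to be an iterable of phenotype-string lists,
    one list per clinical annotation.
    """
    alone, combined = Counter(), Counter()
    for phenotypes in annotations:
        unique = set(phenotypes)
        for term in unique & set(generic_terms):
            if len(unique) > 1:
                combined[term] += 1   # appears together with other phenotypes
            else:
                alone[term] += 1      # the only phenotype on this annotation
    return alone, combined
```

If most occurrences of a term fall in the combined counter, the unmapped generic term may matter less, since the same annotation would usually carry a more specific phenotype as well.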

@apriltuesday
Collaborator Author

Also cc @tskir, in case you are interested in the unmapped terms in particular.
