Crummy Enrichments #99

Open
cbizon opened this issue Jan 12, 2022 · 3 comments

cbizon commented Jan 12, 2022

https://arax.ncats.io/?source=ARS&id=e2800952-aae8-4605-97ce-4cfbc596934e

The query is https://github.com/NCATSTranslator/testing/blob/main/ars-requests/not-none/1.2/risk.json

There are numerous results I don't like, such as "disease" and "blood".

Also, it's systematically preferring gene answers to chemical answers. Is that ok? Maybe.

Also, the first hits are things that are near-synonyms of the input. That isn't wrong, but it's not terribly helpful.

cbizon self-assigned this Jan 12, 2022

cbizon commented Jan 12, 2022

Interestingly, this query: https://arax.ncats.io/?source=ARS&id=bf04c388-b4d2-482e-9ddc-abb92c6c81c8

which is the same but uses "ChemicalEntity" instead of "NamedThing", produces much nicer results. I think it's because the original NamedThing sets the denominator of the enrichment to something giant, so even things like disease get linked in. Maybe we need some kind of dynamic denominator.
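To make the denominator effect concrete, here is a minimal sketch. It assumes the enrichment p-value comes from a hypergeometric test, which this thread does not confirm, and the function name and all the counts are hypothetical. The same overlap (3 of 20 input nodes linked to a candidate that has 500 edges) is scored against a small universe and a giant one:

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) under a hypergeometric null (assumed test, not confirmed here).

    N: denominator, i.e. size of the universe the input nodes are drawn from
    K: nodes in that universe linked to the candidate enrichment node
    n: size of the input set
    k: input nodes linked to the candidate
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: only the denominator N differs between the two calls.
p_small = enrichment_p_value(k=3, n=20, K=500, N=5_000)
p_giant = enrichment_p_value(k=3, n=20, K=500, N=1_000_000)
print(f"N=5,000:     p = {p_small:.3g}")
print(f"N=1,000,000: p = {p_giant:.3g}")
```

With the small universe the overlap looks like chance; with the giant one the identical overlap looks highly significant, which would be consistent with generic nodes like "disease" getting through under NamedThing.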


cbizon commented Feb 15, 2022

A similar issue can happen with e.g. chemicals. CHEBI is a subset of chemicals, but it has subclasses in it. If you use "all chemicals" as the denominator size and your input has more CHEBIs than randomly expected (which is reasonable, given that CHEBI contains the 'most interesting' or at least most annotated chemicals), then it will look like you've chosen a meaningful set of chemicals simply because they're all descended from some high-level chemical class.
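A rough worked example of that CHEBI effect, with made-up counts: suppose there are 100,000 chemicals overall, 10,000 of them are CHEBIs, and every CHEBI descends from one high-level class. If all 10 inputs happen to be CHEBIs, the chance of them all sharing that class under random draws depends entirely on which denominator is used. The function name is hypothetical.

```python
from math import comb


def p_all_in_class(n, K, N):
    """Probability that n nodes drawn at random from N all fall in a class of size K."""
    return comb(K, n) / comb(N, n)


# Hypothetical counts, chosen only to illustrate the bias.
ALL_CHEMICALS = 100_000
CHEBI_NODES = 10_000  # every one of these descends from the high-level class

p_vs_all_chemicals = p_all_in_class(n=10, K=CHEBI_NODES, N=ALL_CHEMICALS)
p_vs_chebi_only = p_all_in_class(n=10, K=CHEBI_NODES, N=CHEBI_NODES)
print(f"denominator = all chemicals: p = {p_vs_all_chemicals:.3g}")
print(f"denominator = CHEBI only:    p = {p_vs_chebi_only:.3g}")
```

Against all chemicals the shared ancestor looks wildly significant; against CHEBI alone it carries no information, since every possible input would have it.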


cbizon commented Feb 15, 2022

I'm probably overthinking some of this. Our edges are based on what's in our local graph. So the denominators should be based on that, and we should just ignore edges that don't occur in that graph. There are perhaps other approaches but this is the most straightforward. So the main thing to do is first remove any answers that don't occur in our local graph.
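A minimal sketch of that first step, assuming the local graph can hand back the set of node identifiers it contains. The function name and the identifiers below are hypothetical, not taken from the actual codebase:

```python
def restrict_to_local_graph(answer_ids, local_node_ids):
    """Drop answers absent from the local graph and return the denominator.

    Returns (kept_answers, denominator), where the denominator is the number
    of nodes in the local graph rather than some external universe size.
    """
    local = set(local_node_ids)
    kept = [a for a in answer_ids if a in local]
    return kept, len(local)


# Hypothetical identifiers, for illustration only.
local_nodes = {"CHEBI:1", "CHEBI:2", "CHEBI:3", "MONDO:1"}
answers = ["CHEBI:1", "CHEBI:999", "MONDO:1"]

kept, denominator = restrict_to_local_graph(answers, local_nodes)
print(kept, denominator)
```

Filtering before counting keeps the numerator and denominator on the same footing: both refer only to nodes the local graph could have produced edges for.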

cbizon mentioned this issue Feb 17, 2022