replace 'Multicystic kidney dysplasia' as an autocomplete example #196
Comments
Also related: https://github.com/monarch-initiative/monarch-app/issues/1534 |
Reviewing my solr notes - the query "Multicystic kidney dysplasia" searches our documents for four tokens:
This typically results in the correct hit being ranked highly, but we also end up with many false positives. We see this especially for diseases with "syndrome" or other common words in the name. We can experiment with settings, but I think @jmcmurry and Jeremy did a lot of testing and decided on the current configuration. |
I would say that if we have an exact match with a phrase like "Multicystic kidney dysplasia" then we should only display that (and exact synonyms). It is different if the user is in the process of entering a phrase. It basically just feels like a mistake to me -- can we reconsider or re-discuss this? |
I'm inclined to agree that in this edge case with such a specific three-token query, just an exact match could be returned (with the tokens in any order, but all exactly matched), perhaps with a button to see other results?
I don't feel super strongly
… |
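To make the "all tokens exactly matched, in any order" idea concrete, here is a minimal sketch. It is not how the Monarch search is implemented; the whitespace tokenizer and the sample labels are assumptions chosen only for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real Solr analyzer)."""
    return text.lower().split()


def is_exact_token_match(query, label):
    """True when query and label contain exactly the same tokens, in any order."""
    return sorted(tokenize(query)) == sorted(tokenize(label))


def exact_matches(query, labels):
    """Keep only the labels whose tokens all match the query tokens."""
    return [label for label in labels if is_exact_token_match(query, label)]


# Hypothetical labels, not taken from the real index.
labels = [
    "Multicystic kidney dysplasia",
    "kidney dysplasia multicystic",
    "Polycystic kidney disease",
    "Renal dysplasia",
]

print(exact_matches("kidney Multicystic dysplasia", labels))
```

Labels that share only "kidney" or "dysplasia" with the query are dropped, which is the behavior proposed above for a fully typed phrase.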
Is the confusion that there's a phenotype and a disease with nearly the same label, or that there are too many unrelated results (or both)? |
@pnrobinson - there is a question for you in this ticket. Ping! |
The list of suggestions is weird. Whether or not we want to fix this feature right now, "Multicystic kidney dysplasia" should not be shown as one of our examples on the landing page. "Noonan syndrome" works a lot better, for instance. |
A potential solution here is to search on the entire string only when a user limits a query to phenotypes, and potentially other categories. This would avoid the matches where only "kidney" has matched. Rereading https://github.com/monarch-initiative/monarch-app/issues/1383, I assume the reason we do this is to support queries like "{taxon} {gene_symbol}", for example "Human SHH". Sending "Human" and "SHH" as distinct tokens allows us to match different fields in the Solr doc, in this case the primary label and the taxon_label fields.
I think this could get tricky since the Solr score is calculated on an entire document, rather than on specific fields. |
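A rough sketch of why per-token matching helps the "Human SHH" case while whole-string matching does not. This is plain Python, not Solr; the `label` field name and the two sample docs are assumptions for illustration (only `taxon_label` is named in this thread).

```python
def token_matches_doc(token, doc, fields):
    """True if the token equals a whitespace token in any of the given fields."""
    token = token.lower()
    return any(token in doc.get(field, "").lower().split() for field in fields)


def matches_all_tokens(query, doc, fields=("label", "taxon_label")):
    """Each query token may match a different field of the same doc."""
    return all(token_matches_doc(t, doc, fields) for t in query.split())


def matches_whole_string(query, doc, fields=("label", "taxon_label")):
    """The entire query must equal one single field value."""
    return any(query.lower() == doc.get(field, "").lower() for field in fields)


# Hypothetical documents, not taken from the real index.
docs = [
    {"label": "SHH", "taxon_label": "Human"},
    {"label": "Shh", "taxon_label": "Mouse"},
]

for doc in docs:
    print(doc, matches_all_tokens("Human SHH", doc), matches_whole_string("Human SHH", doc))
```

With distinct tokens, only the human doc matches; with the whole string, neither doc matches, because no single field contains "Human SHH".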
The minimum should match parameter is another option here: https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter |
Testing this out on https://kshefchek.github.io/monarch-ui/, even when requiring 2 out of 3 of [Multicystic, kidney, dysplasia], we still get these extra hits. I think the best approach is to use a different example. |
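A small simulation of the "2 out of 3" observation. It only mimics the counting behind a minimum-should-match threshold in plain Python and says nothing about Solr scoring; the labels are hypothetical.

```python
def count_matching_tokens(query, label):
    """Number of distinct query tokens that appear in the label."""
    label_tokens = set(label.lower().split())
    return sum(1 for token in set(query.lower().split()) if token in label_tokens)


def passes_min_should_match(query, label, mm):
    """True when at least `mm` distinct query tokens are found in the label."""
    return count_matching_tokens(query, label) >= mm


# Hypothetical labels, not taken from the real index.
labels = [
    "Multicystic kidney dysplasia",
    "Kidney dysplasia",
    "Polycystic kidney disease",
    "Renal dysplasia",
]

query = "Multicystic kidney dysplasia"
for mm in (1, 2, 3):
    hits = [label for label in labels if passes_min_should_match(query, label, mm)]
    print(mm, hits)
```

Requiring 2 of 3 tokens removes the single-token hits, but a label such as "Kidney dysplasia" still passes, so the threshold alone does not reduce the list to the exact match.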
@pnrobinson do you have an idea for a better autocomplete example, or is 'Multicystic kidney dysplasia' working well enough? |
@kshefchek My main wish would be that the site shows at least the children of the disease terms. E.g., https://monarchinitiative.org/disease/MONDO:0015231 |
It does! But it's hidden in neighbors (under overview in the vertical nav bar). We should make this a new ticket or discussion (or use this one). For this ticket, do the results for the autocomplete example 'Multicystic kidney dysplasia' look better? |
I see what you mean. |
perfect thanks! |
I find the autocomplete for Multicystic kidney dysplasia to be a little confusing. Too many different diseases get shown.