Back-end improvements to search results (eg. ranking, highlighting support, facet support) #1383

jmcmurry · 2016-10-25T19:10:33Z

Related to #1382, we need to cover the scenarios originally mentioned in #1317. We also need behave tests for each of them.

| Search issue | Query | Top result must be | Results must include | Status |
|---|---|---|---|---|
| disease groups | Charcot-Marie | Charcot-Marie Tooth Disease (DOID:10595) | | Fixed |
| inversions | Marie-Charcot | Charcot-Marie Tooth Disease (DOID:10595) | | Fixed |
| hyphenations | Marie Charcot | Charcot-Marie | | Fixed |
| exact match | Abnormality of the eye | Abnormality of the eye [phenotype] | Abnormality of the eyelid [phenotype] | Fixed |
| highlighting | Abnormality of the eye | | Abnormality of the eyelid [phenotype] | Should be Abnormality of the **eye**lid, not Abnormality of the **eyelid** |
| wildcard stemming | Pax | | Pax6, Pax4, paxillin etc. | [see #1313] working except for highlights |
| punctuation | 15q13.3 | chromosome 15q13.3 microdeletion syndrome | | Fixed |
| Species query special processing | snca frog | snca Xenopus (NCBIGene:100038302) | SNCA human (?) see below | Fixed |
| Rank human genes first (if lexically equivalent) | LHX1 | human LHX1 (NCBIGene:3975) | | Ranked 20th |
| Include gene name, not just symbol | LHX1 | human LHX1 (NCBIGene:3975) | | Result blob for NCBIGene:3975 should include "LIM homeobox 1", not just "LHX1" |
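
Each row of the grid above could be expressed as a small data-driven check that a behave step then calls. The following is only a minimal sketch: the function name `check_ranking` is hypothetical, the result labels in the example are made up for illustration, and fetching the ranked labels from the search endpoint is deliberately left out.

```python
from typing import List, Optional, Sequence


def check_ranking(results: Sequence[str],
                  top: Optional[str] = None,
                  must_include: Sequence[str] = ()) -> List[str]:
    """Compare ranked result labels against one grid row.

    Returns a list of human-readable failures; an empty list means the row passes.
    """
    failures: List[str] = []
    if top is not None:
        if not results:
            failures.append(f"no results, expected top result {top!r}")
        elif results[0] != top:
            failures.append(f"top result is {results[0]!r}, expected {top!r}")
    for label in must_include:
        if label not in results:
            failures.append(f"missing expected result {label!r}")
    return failures


# Illustrative data only (not real search output).
example_results = [
    "Abnormality of the eye [phenotype]",
    "Abnormality of the eyelid [phenotype]",
]
example_failures = check_ranking(
    example_results,
    top="Abnormality of the eye [phenotype]",
    must_include=["Abnormality of the eyelid [phenotype]"],
)
```

A behave step could assert that the returned list is empty and print the failures otherwise, which keeps the grid itself as the single source of expected behavior.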

Results must be returned in a way that supports faceting across all of the results, not just those on the first page. We can't write a behave test as those counts are likely to change based on the data. Perhaps though we could say category results greater than X? @kltm curious what you think.
- Species (lay terms where possible: human, mouse, horse, cattle, chicken, fish, worm, etc.)
- Search results type:
  - Disease
  - Disease group
  - Gene
  - Pathway
  - Gene group [species agnostic pending other improvements]
  - Model [pending indexing of genotypes]
  - Literature [pending indexing of literature]; not even sure about this one, feedback welcome
  - Case [eg. UDN pages etc]; again here, not sure this is a useful search facet. Not sure how often people would be searching for a specific case.

Because the search axes in the facets are disjoint by definition, don't worry about supporting checkbox-style or ternary queries unless it is dead easy. We can revisit.
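
The "category results greater than X" idea above could be checked with a threshold comparison instead of exact counts, so the test survives data updates. This is a rough sketch; the function name, the facet values, and the thresholds are all hypothetical.

```python
from typing import Dict, List


def facets_below_minimum(counts: Dict[str, int],
                         minimums: Dict[str, int]) -> List[str]:
    """Return the facet values whose count is below the required minimum.

    A facet value that is missing from `counts` is treated as a count of 0.
    """
    return sorted(
        name for name, floor in minimums.items()
        if counts.get(name, 0) < floor
    )


# Made-up counts for illustration; real counts would come from the search response.
example_counts = {"Disease": 120, "Gene": 40, "Pathway": 0}
example_minimums = {"Disease": 10, "Gene": 10, "Pathway": 1}
too_small = facets_below_minimum(example_counts, example_minimums)
```

A test would then assert that the returned list is empty, with the minimums chosen loosely enough that ordinary data releases do not break it.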

cmungall · 2016-10-25T20:22:53Z

[moved to https://github.com/monarch-initiative/monarch-app/issues/1314#issuecomment-256715926]

btw, awesome grid, v useful to see this laid out.

mellybelly · 2016-10-25T20:49:56Z

agreed +1 on grid. If we can do all of the above that would be awesome! make it so! [moved to #1314 ]

yuanzhou · 2016-10-26T15:28:59Z

@jnguyenx Is there a way that I can get all search results without using the number of rows to be returned?

https://github.com/monarch-initiative/monarch-app/blob/master/lib/monarch/api.js#L1053

So basically I'm talking about the rows and start arguments. I know this is designed with paging in mind, but I'm not quite sure paging will work out very well with filters, since filters apply to the whole result set.

jnguyenx · 2016-10-26T17:02:56Z

@yuanzhou The line of code you pasted is for an owlsim search, you should rather use that for the search page.

Yes, filters will be computed on the whole result set. I think I know what bugs you:
The client will receive only part of the result set (e.g. the first page to start with). When the user clicks on a filter, another solr query will be made and will return a filtered result set. Same for sorting if we add it some day. You don't have to manage that on the UI side; solr will do the work for you. You just need to display what it gives you and let the data drive that.
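
To make the paging-plus-filter flow described above concrete, here is a minimal sketch of how one request per page or filter click might be assembled. `q`, `start`, `rows`, `fq`, and `wt` are standard Solr request parameters; the helper name and the field names `category` and `taxon_label` are assumptions, not the actual monarch-app schema.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_search_params(query: str,
                        page: int,
                        rows_per_page: int,
                        filters: Dict[str, str]) -> List[Tuple[str, str]]:
    """Build Solr parameters for one page of a (possibly filtered) search.

    `page` is zero-based. Each filter becomes its own `fq` clause, so Solr
    applies it to the whole result set before paging.
    """
    params = [
        ("q", query),
        ("start", str(page * rows_per_page)),
        ("rows", str(rows_per_page)),
        ("wt", "json"),
    ]
    for field, value in sorted(filters.items()):
        # Escape backslashes and double quotes inside the phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return params


# Third page of ten results, filtered to one hypothetical category value.
example_params = build_search_params(
    "pax", page=2, rows_per_page=10, filters={"category": "gene"}
)
example_query_string = urlencode(example_params)
```

Solr reports the total number of matches for the filtered query as `numFound` in the response, independent of `rows`, so the client can show the full count while holding only one page of documents.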

yuanzhou · 2016-10-26T18:16:39Z

@jnguyenx sorry that I pasted the wrong line number. I did use the one you suggested.

My concern is that if we only display part of the result set per page because of pagination, then applying filters will generate a new set of results instead of filtering the current data set. The filters are supposed to narrow down the current data set, not fetch a new one. In that case, the filters should be search options chosen before we hit that "Go" button.

My thought is that if we use species and category filters, they should apply to the current data set in one page (it might be a long page, but that's why we use filters to shorten it) without pagination. This also means the search page should contain all search results once it's loaded, and we just filter the results on the client side.

@jmcmurry @harryhoch any thoughts?

jmcmurry · 2016-10-26T18:25:39Z

To separate concerns, please see my comment in 1383 about the desired user-facing behavior. The back end must accommodate dynamic count updates over the active set of results, be it filtered or unfiltered.

yuanzhou · 2016-10-26T18:26:27Z

Here is the screenshot of my work to make sure I explained my concern clearly.

In this case, the total number of results is 315, and the two filters apply to the whole data set. Imagine we are using pagination: once a user clicks on a filter, we fetch another data set (whose total number of matches must be smaller). This counts as a new search with some criteria rather than filtering.

jmcmurry · 2016-10-26T19:47:05Z

- The number of total search results (at the top) must change dynamically according to the filters
- Selection of a facet in the species column must impact the counts in category facets, but not counts in species facets
- A selection in category facets must impact counts in the species facets but not in category facets
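
The three counting rules above can be illustrated with a small in-memory sketch: each facet axis is counted with every selection applied except its own. The document keys `species` and `category`, the `"unknown"` bucket for missing values, and the function name are assumptions made for illustration; this is not the back-end implementation.

```python
from collections import Counter
from typing import Dict, List, Optional

Doc = Dict[str, str]


def facet_counts(docs: List[Doc],
                 species: Optional[str] = None,
                 category: Optional[str] = None) -> Dict[str, object]:
    """Compute the total and per-axis facet counts for the current selections."""

    def matches(doc: Doc, use_species: bool, use_category: bool) -> bool:
        if use_species and species is not None and doc.get("species") != species:
            return False
        if use_category and category is not None and doc.get("category") != category:
            return False
        return True

    return {
        # The total reflects every active filter.
        "total": sum(1 for d in docs if matches(d, True, True)),
        # Species counts ignore the species selection but respect the category one.
        "species": dict(Counter(d.get("species", "unknown")
                                for d in docs if matches(d, False, True))),
        # Category counts ignore the category selection but respect the species one.
        "category": dict(Counter(d.get("category", "unknown")
                                 for d in docs if matches(d, True, False))),
    }


# Made-up documents; the last one has no species, as can happen with missing taxon info.
example_docs = [
    {"species": "human", "category": "gene"},
    {"species": "mouse", "category": "gene"},
    {"species": "human", "category": "disease"},
    {"category": "disease"},
]
example_counts = facet_counts(example_docs, species="human")
```

On the Solr side, the tag/exclude local parameters (`fq={!tag=...}` together with `facet.field={!ex=...}`) are one way to get this behavior in a single request; whether they fit the current manager layer is left open here.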

pnrobinson · 2016-10-27T15:41:14Z

Could we use indenting to represent some of the semantics? i.e., Fanconi anemia etc is not indented, but fanca is?

I think we need to schedule a long skype or a F2F to discuss the strategies and what exactly we want the website to show -- count me in for it

yuanzhou · 2016-10-31T15:25:17Z

Do we consider missing taxon info of some search results as a data issue?

jmcmurry · 2016-10-31T16:06:11Z

Yes, it is a data issue; however, it is one that is unlikely to get fixed immediately, as it is thorny. Moreover, what we build probably does need to account for missing taxon info anyway, just in case of future data snafus.

jnguyenx · 2016-11-01T00:32:16Z

Some updates on the solr analyzers: exact matches should now be ranked better.

Concerning boosting the human taxon, I don't think it's reasonable to do that. The user should use filters or can type human or any synonyms in the search box. We should rely on solr raw scoring and not try fancy things. Tweaking edge cases like this can break other edge cases, etc.

For highlighting, this is coming from solr, and unfortunately it only highlights the whole token. As far as I know, we can't force it to highlight only the matching characters.

jmcmurry · 2016-11-01T17:28:53Z

hl.fragsize seems to be configurable, no?

jnguyenx · 2016-11-01T17:54:13Z

Yes it's configurable, but that's not what you're looking for in my opinion.
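
For reference, the highlighting knobs under discussion could be set as in the sketch below. `hl`, `hl.fl`, `hl.fragsize`, `hl.simple.pre`, and `hl.simple.post` are Solr highlighting parameters (the last two belong to the original highlighter); the helper name and the field name `label` are assumptions.

```python
from typing import Dict, Sequence


def build_highlight_params(fields: Sequence[str],
                           fragsize: int = 100,
                           pre: str = "<em>",
                           post: str = "</em>") -> Dict[str, str]:
    """Build Solr highlighting parameters for the given fields.

    `fragsize` is the approximate fragment length in characters; 0 asks Solr
    to use the whole field value instead of fragmenting it.
    """
    return {
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.fragsize": str(fragsize),
        "hl.simple.pre": pre,
        "hl.simple.post": post,
    }


# Highlight the whole (hypothetical) label field rather than a fragment of it.
example_hl = build_highlight_params(["label"], fragsize=0)
```

These settings control how much surrounding text is returned and which markup wraps a hit. They do not change which characters inside a matched token get wrapped, which is consistent with the point made above.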

jmcmurry · 2016-11-01T17:57:26Z

As for human/mouse/fish boosting, is your concern that a) boosting will interfere with other ranking methods or b) boosting is unnecessary as filters get you there? If 'a' is your concern, is there a way that the boosting can be applied only within genes of equivalent symbols?
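
One possible reading of "boosting only within genes of equivalent symbols" is a post-processing step that leaves every other result where it is and only reorders the results that share a symbol. The sketch below is purely illustrative and not something the thread adopted; the function name, the keys `symbol` and `taxon`, and the taxon labels are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Result = Dict[str, str]


def rerank_within_equal_symbols(results: List[Result],
                                preferred: Sequence[str] = ("human", "mouse", "fish")
                                ) -> List[Result]:
    """Reorder only the results whose symbols are equal ignoring case.

    The positions occupied by each group stay the same; within a group,
    preferred taxa come first and ties keep their original relative order.
    """
    priority = {taxon: rank for rank, taxon in enumerate(preferred)}
    positions_by_symbol = defaultdict(list)
    for position, result in enumerate(results):
        positions_by_symbol[result["symbol"].casefold()].append(position)

    reranked = list(results)
    for positions in positions_by_symbol.values():
        members = sorted(
            (results[p] for p in positions),
            key=lambda r: priority.get(r.get("taxon", ""), len(priority)),
        )
        for position, member in zip(positions, members):
            reranked[position] = member
    return reranked


# Made-up ranking for illustration.
example_ranked = [
    {"symbol": "LHX1", "taxon": "frog"},
    {"symbol": "lhx1", "taxon": "mouse"},
    {"symbol": "LHX9", "taxon": "human"},
    {"symbol": "LHX1", "taxon": "human"},
]
example_reranked = rerank_within_equal_symbols(example_ranked)
```

Because results with other symbols never move, this kind of step would not interfere with the underlying scoring, although it does add ordering logic outside of solr.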

jnguyenx · 2016-11-01T18:06:07Z

It's actually both, in addition to b), you can also type human or any synonyms in the search box to limit the results.

I prefer to stick to solr's per-field boosting mechanism and rely only on user input, and not try to reorder the ranking manually with tricks. That can go very wrong, in my experience.

kshefchek · 2016-11-22T01:31:25Z

Is there a place where we could log queries which do not return the best results (semi subjectively)? For example:
https://beta.monarchinitiative.org/search/ALS
filter on diseases
ALS (Amyotrophic Lateral Sclerosis) does not appear until the 51st result.

kltm · 2016-11-22T01:36:40Z

@kshefchek Neat, and hard to pin down given the way that the tokenizers are working.
I believe that with the more complicated manager and schema layout in the pipeline (e.g. berkeleybop/bbop-manager-golr#4), there should be a better result (direct boosted match on the ws-tokenized field, less-boosted match on edgengram fields).
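
The layered boosting described above (strong boost on the whitespace-tokenized field, weaker boost on edge n-gram fields) could look roughly like the sketch below. `defType=edismax` and `qf` are standard Solr parameters; the helper names, field names, and boost values are hypothetical and would need to match the real schema.

```python
from typing import Dict


def build_qf(boosts: Dict[str, float]) -> str:
    """Render per-field boosts as a `qf` string, highest boost first."""
    ordered = sorted(boosts.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(f"{field}^{boost:g}" for field, boost in ordered)


def build_boosted_params(query: str, boosts: Dict[str, float]) -> Dict[str, str]:
    """Build edismax parameters that search all boosted fields at once."""
    return {"defType": "edismax", "q": query, "qf": build_qf(boosts)}


# Hypothetical fields: exact-ish matches on *_ws outrank prefix matches on *_edgengram.
example_boosts = {"label_ws": 10, "synonym_ws": 5, "label_edgengram": 2}
example_boosted = build_boosted_params("ALS", example_boosts)
```

With a layout like this, a query such as "ALS" that matches a synonym exactly would score above documents that merely share the prefix, which is the behavior the example above is asking for.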

jnguyenx · 2016-11-22T01:42:22Z

I believe that this particular example is due to ALS being a synonym only.
I hacked around and put in some boosts as a proof of concept, until the issue that Seth mentioned is fixed.

https://github.com/monarch-initiative/monarch-app/blob/master/lib/monarch/api.js#L2788
You can try and play with the boosts locally.

kshefchek · 2016-12-02T19:12:47Z

@jmcmurry @jnguyenx any plans to enable search on our data-graph versus just the ontologies? Searching for genotypes and variants would be especially useful.

jnguyenx · 2016-12-02T19:19:46Z

I need to talk to @cmungall for this, he prevented me from doing it but we have never finished the discussion.

jnguyenx · 2017-04-26T15:59:20Z

We switched to the data graph earlier this month, it looks like it's working.

jmcmurry · 2018-09-10T17:00:48Z

broke out curie-specific concerns here https://github.com/monarch-initiative/monarch-app/issues/1625

jmcmurry assigned jnguyenx Oct 25, 2016

This was referenced Oct 25, 2016

High priority modifications to front-end of search results #1386

Open

Modifications to site search results Ranking, Representation, and Filtering #1317

Closed

jmcmurry added the search label Oct 25, 2016

jmcmurry mentioned this issue Oct 27, 2016

Improve display and navigation of disease groups in search and disease pages #1314

Open

3 tasks

jmcmurry mentioned this issue May 17, 2019

Accommodate search by curie / accession monarch-initiative/monarch-ui#85

Closed

kshefchek mentioned this issue Sep 25, 2019

replace 'Multicystic kidney dysplasia' as an autocomplete example monarch-initiative/monarch-ui#196

Closed

kshefchek mentioned this issue Sep 26, 2019

too many results in search monarch-initiative/monarch-ui#213

Closed
