Back-end improvements to search results (eg. ranking, highlighting support, facet support) #1383
Comments
[moved to https://github.com/monarch-initiative/monarch-app/issues/1314#issuecomment-256715926] btw, awesome grid, v useful to see this laid out. |
agreed +1 on grid. If we can do all of the above that would be awesome! make it so! [moved to #1314 ] |
@jnguyenx Is there a way that I can get all search results without using the number of rows to be returned? https://github.com/monarch-initiative/monarch-app/blob/master/lib/monarch/api.js#L1053 So basically I'm talking about the |
@yuanzhou The line of code you pasted is for an owlsim search, you should rather use that for the search page. Yes, filters will be computed on the whole result set. I think I know what bugs you: |
@jnguyenx sorry, I pasted the wrong line number. I did use the one you suggested. My concern is that if we display only part of the result set per page because of pagination, then applying filters will generate a new set of results instead of filtering the current data set. Filters are supposed to narrow down the current data set, not fetch a new one. If they fetch a new one, they are really search options that should be set before we hit that "Go" button. My thought is that if we use species and category filters, they should apply to the current data set on one page (it might be a long page, but that is why we use filters to shorten it) without pagination. This also means the search page should contain all search results once it's loaded, and we just filter the results on the client side. @jmcmurry @harryhoch any thoughts? |
To separate concerns, please see my comment in 1383 about the desired user-facing behavior. The back end must accommodate dynamic count updates over the active set of results, be it filtered, or unfiltered |
Here is a screenshot of my work to make sure I have explained my concern clearly. In this case, the total number of results is 315, and the two filters apply to the whole data set. Imagine we are using pagination: once a user clicks on a filter, we fetch another data set (whose total number of matches must be smaller). This is effectively a new search with some criteria rather than filtering. |
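For reference, a minimal sketch of how the two positions above can be reconciled on the Solr side: filters are sent as `fq` clauses and facets are requested alongside the paginated query, so the facet counts Solr returns cover the whole active result set rather than just the page being displayed. The helper name `buildSearchParams` and the field names `taxon_label` and `category` are assumptions for illustration, not taken from monarch-app.

```javascript
// Hypothetical helper; field names and the filter shape are assumptions.
// Builds Solr select parameters for a paginated, faceted, filtered search.
function buildSearchParams(term, facetFields, filters, page, rowsPerPage) {
  const params = new URLSearchParams();
  params.append('q', term);
  // Pagination only limits which documents come back, not the facet counts.
  params.append('start', String(page * rowsPerPage));
  params.append('rows', String(rowsPerPage));
  params.append('facet', 'true');
  params.append('facet.mincount', '1');
  for (const field of facetFields) {
    params.append('facet.field', field);
  }
  // Each active filter becomes a filter query that narrows the result set.
  for (const field of Object.keys(filters)) {
    for (const value of filters[field]) {
      const escaped = String(value).replace(/(["\\])/g, '\\$1');
      params.append('fq', `${field}:"${escaped}"`);
    }
  }
  return params;
}

// Example: third page of 25 results for "fanconi", restricted to one taxon.
const example = buildSearchParams(
  'fanconi',
  ['taxon_label', 'category'],
  { taxon_label: ['Homo sapiens'] },
  2,
  25
);
console.log(example.toString());
```

With this shape, clicking a filter does issue a new request, but the counts shown next to each facet value still describe the full filtered set, which is the user-facing behavior asked for above.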
Could we use indenting to represent some of the semantics? i.e., Fanconi anemia etc is not indented, but fanca is? I think we need to schedule a long skype or a F2F to discuss the strategies and what exactly we want the website to show -- count me in for it |
Do we consider missing taxon info of some search results as a data issue? |
Yes, it is a data issue; however one that is likely not to get fixed immediately as it is thorny. Moreover, what we build probably does need to account for missing taxon just in case future data snafus anyway. |
Some updates on the Solr analyzers: exact matches should now be better ranked. Concerning boosting the human taxon, I don't think that's reasonable. The user should use filters or can type For highlighting, this is coming from Solr and it only highlights the whole token, unfortunately; as far as I know we can't force it to highlight only the matching characters. |
hl.fragsize seems to be configurable, no? |
Yes it's configurable, but that's not what you're looking for in my opinion. |
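To make the distinction concrete, a small sketch of the standard Solr highlighting parameters. `hl.fragsize` sets the approximate length (in characters) of each returned snippet; it does not change which part of the text gets wrapped in the highlight markers. The helper name `buildHighlightParams` and the field names in the example are assumptions for illustration.

```javascript
// Hypothetical helper; the field names passed in are assumptions.
// Builds the Solr parameters that turn on highlighting for a request.
function buildHighlightParams(fields, fragsize) {
  const params = new URLSearchParams();
  params.append('hl', 'true');
  // Comma-separated list of fields to generate snippets for.
  params.append('hl.fl', fields.join(','));
  // Approximate snippet length in characters, not the highlighted span.
  params.append('hl.fragsize', String(fragsize));
  // Markers placed around each highlighted term.
  params.append('hl.simple.pre', '<em>');
  params.append('hl.simple.post', '</em>');
  return params;
}

console.log(buildHighlightParams(['label', 'synonym'], 100).toString());
```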
As for human/mouse/fish boosting, is your concern that a) boosting will interfere with other ranking methods or b) boosting is unnecessary as filters get you there? If 'a' is your concern, is there a way that the boosting can be applied only within genes of equivalent symbols? |
It's actually both. In addition to b), you can also type human or any synonyms in the search box to limit the results. I prefer to stick to Solr's per-field boosting mechanism and rely only on user input, and not try to reorder the ranking manually with tricks; that can go very wrong in my experience. |
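As an illustration of per-field boosting, one common way to express it in Solr is the `qf` parameter of the edismax query parser, where each field carries a `^boost` weight. This is a sketch only: whether monarch-app uses edismax, and the field names and weights shown, are assumptions, and `buildBoostParams` is a hypothetical helper.

```javascript
// Hypothetical helper; field names and boost values are assumptions.
// Builds edismax parameters with per-field boosts, e.g. "label^10 synonym^2".
function buildBoostParams(term, fieldBoosts) {
  const params = new URLSearchParams();
  params.append('defType', 'edismax');
  params.append('q', term);
  const qf = Object.entries(fieldBoosts)
    .map(([field, boost]) => `${field}^${boost}`)
    .join(' ');
  params.append('qf', qf);
  return params;
}

// Matches on the primary label outrank matches that hit only a synonym.
console.log(buildBoostParams('ALS', { label: 10, synonym: 2 }).toString());
```

Ranking then depends only on which field matched and on what the user typed, with no taxon-specific reordering applied afterwards.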
Is there a place where we could log queries which do not return the best results (semi subjectively)? For example: |
@kshefchek Neat, and hard to pin down given the way that the tokenizers are working. |
I believe that this particular example is due to ALS being a synonym only. https://github.com/monarch-initiative/monarch-app/blob/master/lib/monarch/api.js#L2788 On Mon, Nov 21, 2016 at 5:36 PM, kltm [email protected] wrote:
|
I need to talk to @cmungall for this, he prevented me from doing it but we have never finished the discussion. |
We switched to the data graph earlier this month, it looks like it's working. |
broke out curie-specific concerns here https://github.com/monarch-initiative/monarch-app/issues/1625 |
Related to #1382, we need to cover the scenarios originally mentioned in #1317. We also need behave tests for each of them.
Because the search axes in the facets are disjoint by definition, don't worry about supporting checkbox-style or ternary queries unless it is dead easy. We can revisit.