General search results rankings (p53 as example) #102

cmungall · 2014-05-02T21:30:38Z

Typing "p53" (no space) in the search box has this gene showing up first:

http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo/gene_product/MGI:MGI:2146005

This is because its full name reflects the function of p53 binding.

Adding the space gives better results (we have to fix this, I'm afraid), but the mouse p53 is nowhere to be found.

Even with the MGI filter turned on with this search
http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo/search/bioentity?q=p53

It's hard to find - I had to go via the panther family, eventually I got it:
http://amigo2.berkeleybop.org/cgi-bin/amigo2/amigo/gene_product/MGI:MGI:98834

It does have p53 as the synonym...

Let's work with others on fixing this.

kltm · 2014-05-02T21:47:55Z

For the middle (bioentity live search) case, considering that synonyms are apparently not in there at all, it's doing a pretty good job. I'll add synonyms to the boost config at 1.0 for starters.

Current bio-config.yaml:
boost_weights: bioentity^2.0 bioentity_label^2.0 bioentity_name^1.0 bioentity_internal_id^1.0 isa_partof_closure_label^1.0 regulates_closure^1.0 regulates_closure_label^1.0 panther_family^1.0 panther_family_label^1.0 taxon_closure_label^1.0
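As a rough illustration of the change described above, here is a minimal sketch that parses a `field^weight` boost string and appends a synonym field at 1.0. The field name `synonym` and the helper names are assumptions made for this sketch; the real field name depends on the loader schema.

```python
def parse_boosts(spec: str) -> dict:
    """Parse 'field^weight field^weight ...' into an insertion-ordered dict."""
    boosts = {}
    for item in spec.split():
        field, _, weight = item.partition("^")
        # A bare field name with no '^weight' is treated as weight 1.0.
        boosts[field] = float(weight) if weight else 1.0
    return boosts


def format_boosts(boosts: dict) -> str:
    """Serialize the dict back into the 'field^weight' form."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


# Shortened copy of the boost string from the config above.
current = "bioentity^2.0 bioentity_label^2.0 bioentity_name^1.0"
boosts = parse_boosts(current)
boosts.setdefault("synonym", 1.0)  # hypothetical field name, added at 1.0 "for starters"
updated = format_boosts(boosts)
print(updated)
```

Using `setdefault` keeps any weight that was already configured for the field rather than overwriting it.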

For the other two, it's a matter of structuring the general search better. Currently, there are three categories: entity (id), entity_label, and a big ball of "stuff"--synonyms are in "stuff", along with everything else. This is done to allow a unioned search across all of the various doc types. We can boost the synonym results either by making another top-level like "important_stuff" that gets weighed higher, or by making synonyms more prevalent in the stuff, by repeating them or something.
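To make the two options concrete, here is a toy sketch (not the actual loader or Solr scoring) that builds a general-search document and scores an exact match per field. The weights, the record, and the function names are all assumptions made for illustration.

```python
# Hypothetical per-field weights for the general search document.
WEIGHTS = {"entity": 2.0, "entity_label": 2.0, "important_stuff": 1.5, "stuff": 1.0}


def to_general_doc(record, split_synonyms=False, repeat=1):
    """Flatten a source record into the general schema.

    split_synonyms=True puts synonyms into a separate 'important_stuff' bag;
    repeat > 1 instead repeats them inside the catch-all 'stuff' bag.
    """
    doc = {
        "entity": [record["id"]],
        "entity_label": [record["label"]],
        "important_stuff": [],
        "stuff": list(record["other"]),
    }
    target = "important_stuff" if split_synonyms else "stuff"
    doc[target].extend(record["synonyms"] * repeat)
    return doc


def score(doc, query):
    """Toy score: field weight times the number of exact value matches."""
    q = query.lower()
    return sum(
        WEIGHTS[field] * sum(1 for value in values if value.lower() == q)
        for field, values in doc.items()
    )


# Made-up record; only the synonym "p53" mirrors the case in this issue.
record = {"id": "EX:1", "label": "example gene", "synonyms": ["p53"], "other": ["some description"]}

print(score(to_general_doc(record), "p53"))                       # synonyms buried in "stuff"
print(score(to_general_doc(record, split_synonyms=True), "p53"))  # separate weighted bag
print(score(to_general_doc(record, repeat=3), "p53"))             # repetition inside "stuff"
```

Both options raise the synonym's contribution; the separate bag makes the weight explicit in one place, while repetition hides it in the data.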

kltm · 2014-05-02T21:57:29Z

Either way, will need to actually change fields or field types in the loader, so we can explore it when we climb into the Java again for 2.2.

monicacecilia · 2014-05-02T22:43:28Z

Oh wow, yes that looks like a mess of search results. -- There is also similarly "funny" behavior when annotators are using the "GO ID" tool on the "Information Editor" in Web Apollo. As I understand, that tool connects to AmiGO -- but I don't know the details of that connection. "This is another story and shall be told another time" (M. Ende). What is relevant to say is that fixing this search will also improve experiences outside AmiGO.

From Seth: "We can boost the synonym results either by making another top-level like "important_stuff" that gets weighed higher, or by making synonyms more prevalent in the stuff, by repeating them or something."

Likely better to create the bag of "important stuff", than repeating the synonyms.

kltm · 2014-05-02T22:51:15Z

Hm. A new "important stuff" bag gets one to consider how important the stuff is; maybe we need an "importanter stuff" bag too? That could get very silly pretty fast. OTOH, gaming the schema at too low a level can get fiddly.

Also to consider are issues like #24 and how they would relate to a general schema. We're going to need to extend it a little no matter what it seems. Perhaps I'll change the item to something like: re-engineer the general schema, with a list of things we want out of it.

kltm · 2014-09-30T22:28:03Z

We've had a similar discussion with @rbalakri about the results with "proximal" and the GO--currently when searching for "proximal", many non-GO terms take priority, which may confuse some users' expectations. (E.g. "proximal rib", etc.)

After a little discussion with @cmungall about things that might be done to improve that, one possibility that we might look at is adding a field to the general search schema (maybe document_relevance_category) that would be strings like "core ontology", "peripheral bioentity"; we could then tweak the search to give greater preference to "core" entities or add a collapsible radio button set under the box that allowed you to goose the search for ontology terms, etc.

Essentially, the search would give preference based on relevance tagging done during the load stage. While this would require some playing with the loader, I feel that this happens in enough of a transparent way that it might be the way forward.
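A minimal sketch of that idea, assuming a load-time tagging step and a query-time boost per category. The category "peripheral ontology", the boost values, the `type` field, the base scores, and the placeholder IDs are all made up for illustration; only `document_relevance_category`, "core ontology", and "peripheral bioentity" come from the discussion above.

```python
# Hypothetical boosts per relevance category.
CATEGORY_BOOST = {
    "core ontology": 3.0,
    "peripheral ontology": 1.0,   # assumed extra category for non-GO ontology terms
    "peripheral bioentity": 1.0,
}


def tag_relevance(doc):
    """Load-stage step: attach a relevance category to each document."""
    if doc["type"] == "ontology_class":
        category = "core ontology" if doc["id"].startswith("GO:") else "peripheral ontology"
    else:
        category = "peripheral bioentity"
    return {**doc, "document_relevance_category": category}


def rerank(hits, prefer_core=True):
    """Query-stage step: multiply the base score by the category boost.

    prefer_core=False stands in for switching the preference off in the UI.
    """
    def final_score(hit):
        boost = CATEGORY_BOOST[hit["document_relevance_category"]] if prefer_core else 1.0
        return hit["base_score"] * boost

    return sorted(hits, key=final_score, reverse=True)


# Placeholder documents and scores for a search on "proximal".
hits = [
    tag_relevance(d)
    for d in [
        {"id": "EX:0000001", "type": "ontology_class", "label": "proximal rib", "base_score": 1.2},
        {"id": "GO:0000000", "type": "ontology_class", "label": "placeholder GO term", "base_score": 1.0},
    ]
]

print([h["id"] for h in rerank(hits)])
print([h["id"] for h in rerank(hits, prefer_core=False)])
```

Because the tagging happens once at load time, the query side only needs to know the category strings, which keeps the behavior easy to inspect.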

rbalakri · 2014-09-30T22:30:02Z

I like this idea.
Can we talk about this at Barcelona?

Rama

kltm · 2014-09-30T22:33:47Z

We can, but this is already scheduled for 2.3, so we'll likely be getting to it post-meeting at some point anyways.

kltm · 2014-11-03T23:39:35Z

This is related to berkeleybop/bbop-js#16.

kltm · 2015-09-01T14:22:05Z

Answering @cmungall on #239.

> Ideally human would be first followed by MODs. This could be a configuration, or alternatively scoring each gene by number of experimental annotations would be a nice generic way to do it. This would be an easy field for @hdietze to add when loading.

What this would boil down to would be two new fields, say: search_bin_priority_one and search_bin_priority_two. Human genes would populate the first one, MODs get the second, everybody else gets none.

The search would then be boosted on those two fields, say: search_bin_priority_one^4.0 search_bin_priority_two^2.0.
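A small sketch of how the two bins might be populated and boosted. What goes into the bin fields (here, a copy of the label), the MOD taxon subset, the placeholder taxon, and the toy scoring function are assumptions; only the field names and the 4.0/2.0 boosts come from the comment above.

```python
HUMAN_TAXON = "NCBITaxon:9606"
# Illustrative subset of MOD organisms (mouse, zebrafish); the real list would be configured.
MOD_TAXA = {"NCBITaxon:10090", "NCBITaxon:7955"}

BOOSTS = {"label": 1.0, "search_bin_priority_one": 4.0, "search_bin_priority_two": 2.0}


def add_priority_bins(doc):
    """Load-time step: copy the label into a priority bin based on taxon."""
    out = dict(doc)
    if doc["taxon"] == HUMAN_TAXON:
        out["search_bin_priority_one"] = doc["label"]
    elif doc["taxon"] in MOD_TAXA:
        out["search_bin_priority_two"] = doc["label"]
    return out  # everybody else gets neither field


def score(doc, query):
    """Toy score: sum the boosts of every field that contains the query."""
    q = query.lower()
    return sum(weight for field, weight in BOOSTS.items() if q in doc.get(field, "").lower())


docs = [
    {"id": "other", "taxon": "EX:taxon", "label": "p53"},  # placeholder taxon
    {"id": "mouse", "taxon": "NCBITaxon:10090", "label": "p53"},
    {"id": "human", "taxon": "NCBITaxon:9606", "label": "p53"},
]
loaded = [add_priority_bins(d) for d in docs]
ranked = sorted(loaded, key=lambda d: score(d, "p53"), reverse=True)
qf = " ".join(f"{field}^{weight}" for field, weight in BOOSTS.items())

print([d["id"] for d in ranked])
print(qf)
```

With identical labels, the only thing separating the three hits is which bin they landed in at load time.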

kltm · 2015-10-25T21:52:25Z

As another case, from http://jira.geneontology.org/browse/GO-1007, it would be nice to have tokenizing more sensitive to common use cases like let-23, where a user might be surprised by the fact that the tokenizer defaults to breaking on the hyphen.
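The effect is easy to reproduce outside the search engine. The sketch below uses two simple regex-based splitters as stand-ins; they are not the actual Solr/Lucene tokenizers, just an approximation of "break on punctuation" versus "break on whitespace only".

```python
import re


def split_on_non_alnum(text):
    """Stand-in for a tokenizer that breaks on any non-alphanumeric character."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text.lower()) if token]


def split_on_whitespace(text):
    """Stand-in for a tokenizer that only breaks on whitespace."""
    return text.lower().split()


print(split_on_non_alnum("let-23"))   # the gene symbol is broken into two tokens
print(split_on_whitespace("let-23"))  # the gene symbol survives as one token
```

With the first splitter, a search for let-23 effectively becomes a search for "let" and "23", which explains why unrelated hits can outrank the gene itself.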

doughowe · 2016-04-15T14:22:38Z

From the Noctua session at the Geneva GO meeting. Seth suggested I post this here:

At ZFIN, for autocompletion in term entry boxes, we use a model that allows "starts with" searching for multiple words. This saves many key strokes.
Example:
Entering "trans fac pol"
would find all the terms containing words that match all three prefixes:
"trans*"
"fac*"
"pol*"

like "transcription bla bla factor bla bla bla polymerase bla bla bla"

We really like that mechanism for term searching in ZFIN...food for thought.
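The multi-word "starts with" matching described above can be sketched in a few lines. This is only an illustration of the idea, not ZFIN's implementation; it assumes word order does not matter and that words are separated by whitespace.

```python
def matches_prefixes(query, term):
    """True if every query token is a prefix of at least one word in the term."""
    words = term.lower().split()
    return all(
        any(word.startswith(prefix) for word in words)
        for prefix in query.lower().split()
    )


# Made-up candidate strings, including the one from the example above.
terms = [
    "transcription bla bla factor bla bla bla polymerase bla bla bla",
    "transcription factor binding",
    "polymerase activity",
]

found = [term for term in terms if matches_prefixes("trans fac pol", term)]
print(found)
```

Only the first candidate contains a word for each of the three prefixes, so it is the only one returned.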

cmungall · 2016-04-15T16:20:38Z

ooh, I like this. @doughowe Is this on user-facing autocompletes as well as curation? I can see this as being massively useful for biocurators (although with lego you tend to go for the subset of classes with fewer words, but not always). I don't have a strong sense of whether the average non-power user would do this much.


kltm · 2017-10-13T20:21:18Z

See @ValWood transport example on #447

ValWood · 2017-10-13T20:36:33Z

If you are using lucene we have fine tuned our search over many iterations. We always find what we type, pretty much. @kimrutherford can point you to our weighting.

It might do what Doug describes above too. I'm not sure but it seems to work well for us. I think it even handles typos....

kltm · 2017-10-13T20:47:14Z

Thank you--more input is always appreciated. That said, we already understand why we have this problem and have implemented an experimental tokenizing/parser fix that solves it (berkeleybop/bbop-manager-golr#4). The issue that we currently have is to roll out the solution and update the software to make use of it.

kimrutherford · 2017-10-13T21:30:18Z

> have implemented an experimental tokenizing/parser fix that solves it (berkeleybop/bbop-manager-golr#4).

That issue mentions EdgeNGramTokenizer, which is what we're using at PomBase.

> It might do what Doug describes above too.

We're doing more or less as Doug describes as well as allowing minor typos. We currently index only the names and synonyms. The synonyms get a lower weighting when we query.
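For readers unfamiliar with edge n-grams, here is a toy in-memory sketch of the general technique: index every leading substring of each word, and weight name matches above synonym matches. It is not the PomBase or GOlr configuration; the weights, length limits, and entries are assumptions, and typo tolerance is left out.

```python
from collections import defaultdict

NAME_WEIGHT, SYNONYM_WEIGHT = 2.0, 1.0  # hypothetical weights


def edge_ngrams(token, min_len=2, max_len=10):
    """All leading substrings of a token, e.g. 'tran' -> ['tr', 'tra', 'tran']."""
    return [token[:n] for n in range(min_len, min(len(token), max_len) + 1)]


def build_index(entries):
    """Map each edge n-gram to {doc_id: best weight seen for that gram}."""
    index = defaultdict(dict)
    for doc_id, name, synonyms in entries:
        weighted = [(name, NAME_WEIGHT)] + [(s, SYNONYM_WEIGHT) for s in synonyms]
        for text, weight in weighted:
            for token in text.lower().split():
                for gram in edge_ngrams(token):
                    index[gram][doc_id] = max(weight, index[gram].get(doc_id, 0.0))
    return index


def search(index, query):
    """Require every query token to match; sum the per-token weights."""
    scores = None
    for token in query.lower().split():
        hits = index.get(token, {})
        if scores is None:
            scores = dict(hits)
        else:
            scores = {doc: scores[doc] + hits[doc] for doc in scores if doc in hits}
    return sorted((scores or {}).items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up entries: (doc id, name, synonyms).
index = build_index([("a", "transport", []), ("b", "carrier", ["transporter"])])
results = search(index, "transp")
print(results)
```

Both entries match the partial query "transp", but the one matching on its name ranks above the one matching only on a synonym. Query tokens longer than `max_len` would not match in this sketch.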

doughowe · 2017-10-18T00:01:57Z

Loooooooong ago @cmungall asked if we use our "multi-word begins with" search mechanism for curators only, or if it is also public facing. I believe it is only for curators. I'm not sure how intuitive or natural it would be for general database users. If you know about it, it whittles down long autosuggest lists quickly, particularly for those pesky long terms you know the name of...sort of.

Actually, I just tried it in our single box search at ZFIN.org and it seems to work there, so that is public facing. It's not hurting anything, and is helpful if you know about it.

kltm changed the title ~~p53 search result ranking~~ Search result rankings (p53 as example) May 2, 2014

kltm changed the title ~~Search result rankings (p53 as example)~~ Search results rankings (p53 as example) May 2, 2014

kltm added enhancement labels May 2, 2014

kltm mentioned this issue May 2, 2014

Synonyms ignored in gene product searches #103

Closed

kltm added this to the 2.2 milestone May 2, 2014

kltm modified the milestones: 2.3, 2.2 May 5, 2014

kltm changed the title ~~Search results rankings (p53 as example)~~ General search results rankings (p53 as example) Sep 30, 2014

kltm modified the milestones: 2.3, 2.4 Aug 14, 2015

kltm mentioned this issue Sep 1, 2015

Boost human followed by MODs in search prioritization #239

Closed

kltm modified the milestones: 2.4, 2.5 Mar 2, 2016

kltm mentioned this issue Jul 26, 2016

Modifications to site search results Ranking, Representation, and Filtering monarch-initiative/monarch-legacy#1317

Closed

11 tasks

kltm added a commit to berkeleybop/bbop-manager-golr that referenced this issue Aug 19, 2016

assuming a functional golr (3.x) on jetty somewhere, load enough data…

28b86e0

… to cause a failure in searching similar to geneontology/amigo#102

kltm mentioned this issue Dec 7, 2016

search behaviour #410

Closed

kltm mentioned this issue Jan 30, 2017

search weights PATO terms higher than GO terms #420

Closed

kltm mentioned this issue Oct 13, 2017

amigo search weighting for exact matches #447

Closed

kltm mentioned this issue Nov 22, 2017

Finicky autocomplete behavior: Possible to autocomplete on partial matches? geneontology/noctua#525

Closed

kltm mentioned this issue Oct 22, 2024

Question: can't locate complexes specified in our GPI geneontology/noctua#910

Open
