Paginating through hits is a problem when hits are broken up by other tags. #527

Open
jan-niestadt opened this issue Aug 26, 2024 · 1 comment

@jan-niestadt
Member

If a hit begins in one sentence and ends in another, for example, BLS' /docs/DOCID/contents will highlight it in two parts, and the frontend will treat those parts as two separate hits.

For example, if we highlight a hit spanning from pancakes to They're (inclusive), BLS will return:

<s>I like <hl>pancakes.</hl></s>
<s><hl>They're</hl> delicious.</s>

So the individual hits cannot always be recovered from BLS' highlighted document contents. The frontend's pagination, however, treats each <hl> tag as an individual hit, which breaks in this case: the two parts of one hit are shown as if they were two separate hits.
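
A minimal sketch of the miscount, in Python for illustration only (the actual frontend is not written in Python). The `<doc>` wrapper is an assumption added here so the two-sentence fragment parses as one XML document.

```python
import xml.etree.ElementTree as ET

# Highlighted contents from the example above, wrapped in a hypothetical
# <doc> root element so the fragment is well-formed XML.
content = (
    "<doc>"
    "<s>I like <hl>pancakes.</hl></s>"
    "<s><hl>They're</hl> delicious.</s>"
    "</doc>"
)

root = ET.fromstring(content)

# Naive approach: treat every <hl> element as one hit.
naive_hit_count = len(root.findall(".//hl"))

print(naive_hit_count)  # 2, although only one hit was highlighted
```

Nothing in the markup tells the two fragments of one hit apart from two adjacent hits, so counting tags cannot be made reliable without extra information from BLS.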

Another problem is overlapping hits.

Both BLS and the frontend need to be changed to address these problems.
BLS should add a hit index attribute to every <hl> tag, so the frontend knows which tags belong together, e.g.:

<s>I like <hl n="1">pancakes.</hl></s>
<s><hl n="1">They're</hl> delicious.</s>

For two overlapping hits "The fox jumps" and "jumps over the dog":

<s><hl n="1">The fox <hl n="2">jumps</hl></hl><hl n="2"> over the dog.</hl></s>

The frontend would then use these indexes to identify and highlight whole hits at a time.
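
A rough sketch of how a consumer could group fragments by the proposed `n` attribute, in Python for illustration only. The function name `fragments_by_hit` and the `<doc>` wrapper are assumptions made for this example, not part of BLS or the frontend.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def fragments_by_hit(xml_text: str) -> Dict[str, List[str]]:
    """Group the text of <hl> elements by their hit index (n attribute)."""
    root = ET.fromstring(xml_text)
    groups: Dict[str, List[str]] = defaultdict(list)
    for hl in root.iter("hl"):  # visits elements in document order
        # itertext() also yields the text of nested <hl> elements; anything
        # nested inside a fragment of a hit is part of that hit as well.
        groups[hl.get("n", "")].append("".join(hl.itertext()))
    return dict(groups)


# Hypothetical <doc> root added so each fragment parses on its own.
split_hit = (
    "<doc>"
    '<s>I like <hl n="1">pancakes.</hl></s>'
    '<s><hl n="1">They\'re</hl> delicious.</s>'
    "</doc>"
)
overlapping = (
    "<doc><s>"
    '<hl n="1">The fox <hl n="2">jumps</hl></hl>'
    '<hl n="2"> over the dog.</hl>'
    "</s></doc>"
)

print(fragments_by_hit(split_hit))
# {'1': ["pancakes.", "They're"]}
print(fragments_by_hit(overlapping))
# {'1': ['The fox jumps'], '2': ['jumps', ' over the dog.']}
```

With the index available, pagination can step through the keys of this mapping instead of through individual `<hl>` elements, so a split hit counts once and overlapping hits stay distinct.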

@KCMertens
Member

Adding the hit start as an attribute would also be useful. There are cases where we need to jump to a specific hit without necessarily knowing its index, for example when opening the document through the expanded snippet.
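
A sketch of such a lookup, in Python for illustration only. Everything specific here is an assumption: the attribute name `start`, the idea that every fragment of a hit carries that hit's start, the example values (taken as token positions), and the helper name `hit_index_for_start`. None of these are defined in this issue.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def hit_index_for_start(xml_text: str, start: int) -> Optional[str]:
    """Return the n attribute of the first <hl> whose start matches."""
    root = ET.fromstring(xml_text)
    for hl in root.iter("hl"):
        # "start" is a hypothetical attribute name used for this sketch.
        if hl.get("start") == str(start):
            return hl.get("n")
    return None


# Hypothetical markup: <doc> wrapper and start values are invented here.
doc = (
    "<doc>"
    '<s>I like <hl n="1" start="2">pancakes.</hl></s>'
    '<s><hl n="1" start="2">They\'re</hl> delicious.</s>'
    "</doc>"
)

print(hit_index_for_start(doc, 2))  # 1
print(hit_index_for_start(doc, 7))  # None
```

Resolving a known start to a hit index this way would let the frontend jump to the right hit even when it only knows where the hit begins.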
