Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent processing of duplicate parts #4

Open
AndreasGassmann opened this issue Mar 15, 2021 · 8 comments
Open

Prevent processing of duplicate parts #4

AndreasGassmann opened this issue Mar 15, 2021 · 8 comments

Comments

@AndreasGassmann
Copy link
Contributor

When the same part is "received" multiple times, this library will process each of the requests. This will result in the progress being higher than it should be. I now catch this outside the library, but maybe it would make sense to include it (basically just ignoring duplicates).

@xardass
Copy link
Contributor

xardass commented Jun 23, 2021

Thanks, we are looking in this

@antonispoulakis
Copy link
Collaborator

Hey @AndreasGassmann how are you trying to get the progress?

The library ignores the duplicate parts but keeps track of every processed part, even if duplicate.

There are 2 functions you can use to get a report on progress: estimatedPercentComplete and getProgress. The accurate one is the getProgress() one, as it uses the unique parts divided by the expected parts, so it returns the accurate value.

The estimatedPercentComplete will always keep incrementing as long as the user keeps scanning, so it can maybe lead to better UX, as the getProgress will probably remain static if a few QR codes are missed and the sequence is long enough.

My guess is that you're using estimatedPercentComplete but expecting the result that getProgress would give.

@AndreasGassmann
Copy link
Contributor Author

The library ignores the duplicate parts but keeps track of every processed part, even if duplicate.

I'm assuming you are keeping track of every processed part to estimate the completion. But shouldn't duplicates on the "page" level be ignored? So if you scan ur:bytes/1-5/... 20 times, shouldn't it be ignored after being processed once?

The estimatedPercentComplete will always keep incrementing as long as the user keeps scanning, so it can maybe lead to better UX, as the getProgress will probably remain static if a few QR codes are missed and the sequence is long enough.

This was the reason why we ended up using estimatedPercentComplete over getProgress.

I understand that estimatedPercentComplete is not really "accurate", but without ignoring identical QRs, the progress can be WAY higher than I expect it to be. I don't remember exactly anmore, but during testing I displayed each QR for like 1 second and the scanner was scanning more than 10 times per second. Within a second, the estimatedPercentComplete was at 100% because of it. If duplicates are ignored (which we do outside the library now), the estimation works quite well.

It's not really an issue for us because it can easily be handled, but I feel like it's something other people might run into as well.

@antonispoulakis
Copy link
Collaborator

The library ignores the duplicate parts but keeps track of every processed part, even if duplicate.

I'm assuming you are keeping track of every processed part to estimate the completion. But shouldn't duplicates on the "page" level be ignored? So if you scan ur:bytes/1-5/... 20 times, shouldn't it be ignored after being processed once?

The "keeping track" part is just a number that gets incremented every time a part is received. That doesn't mean that it's duplicated internally.

The estimatedPercentComplete will always keep incrementing as long as the user keeps scanning, so it can maybe lead to better UX, as the getProgress will probably remain static if a few QR codes are missed and the sequence is long enough.

This was the reason why we ended up using estimatedPercentComplete over getProgress.

I understand that estimatedPercentComplete is not really "accurate", but without ignoring identical QRs, the progress can be WAY higher than I expect it to be. I don't remember exactly anmore, but during testing I displayed each QR for like 1 second and the scanner was scanning more than 10 times per second. Within a second, the estimatedPercentComplete was at 100% because of it.

In my opinion, estimatedPercentComplete is not that useful. The main reason I've kept it is for compatibility with the C++ implementation, in case someone is using both.

If duplicates are ignored (which we do outside the library now), the estimation works quite well.

It's not really an issue for us because it can easily be handled, but I feel like it's something other people might run into as well.

Can you elaborate a bit on the way you're handling it? From what I understand, if you're ignoring the duplicate parts, you should be getting the same values as getProgress().
Are you keeping track of the 1-5 part and only incrementing the progress on every new index? If so, how do you handle the parts that are mixed (6-5, 7-5, etc.)?

@AndreasGassmann
Copy link
Contributor Author

We match the whole string we scan, something like this:

// const qrs = new Set()
// const qr: string = 'ur:bytes/1-10/...'

if (!qrs.has(qr)) {
  ur.receive(qr)
  qrs.add(qr)
}

@antonispoulakis
Copy link
Collaborator

Are you getting the progress by doing something like this?

// from the `ur:bytes/1-10/...` you can get that
const expectedLength = 10
const progress = qrs.size() / expectedLength

This will exactly match the result of getProgress(), at least for the first 10 qr codes.

If you're using something like the snippet above, you can also end up overshooting if the scanning device randomly misses parts and ends up having to scan the mixed parts (11-10...), which can end up needing more than 10 total scans.

The way I've handled that in an app is by doing:

ur.receivePart(data);

// it's complete, parse data, use them, stop scanner
if (ur.isComplete()) {}

// update progress
setProgress(urDecoder.getProgress());

@AndreasGassmann
Copy link
Contributor Author

AndreasGassmann commented Jun 28, 2021

Sorry if I was unclear, but we use estimatedPercentComplete(). We don't do any progress calculation ourselves. The only issue I wanted to bring up here is the following:

Assuming we have a payload that will result in 2 UR parts:

ur:bytes/1-2/aaaa
ur:bytes/2-2/bbbb
ur:bytes/3-2/cccc
...

If those are displayed 1 second each, but the scanner scans 5 times per second, it will register some of them multiple times.

// QR scanner
ur:bytes/1-2/aaaa (scanned after 0s)
ur:bytes/1-2/aaaa (scanned after 0.2s)
ur:bytes/1-2/aaaa (scanned after 0.4s)
ur:bytes/1-2/aaaa (scanned after 0.6s)
ur:bytes/1-2/aaaa (scanned after 0.8s)
ur:bytes/2-2/bbbb (scanned after 1s)
ur:bytes/2-2/bbbb (scanned after 1.2s)
ur:bytes/2-2/bbbb (scanned after 1.4s)
ur:bytes/2-2/bbbb (scanned after 1.6s)
ur:bytes/2-2/bbbb (scanned after 1.8s)
...

If I call ur.receive() with every string that the scanner returns (which potentially gives us the exact same string multiple times), then estimatedPercentComplete() will go towards 100% very quickly. If I simply keep track of all the strings that we scanned and don't "receive" them a second time (see snippet above), then everything works as expected. So my only suggestion was to do this in the library itself. Basically if the exact string was handled before, then just ignore it in the future.

@antonispoulakis
Copy link
Collaborator

If I now call ur.receive() with every string that the scanner returns (which potentially give us the exact same string multiple times), then estimatedPercentComplete() will go towards 100% very quickly.

That is the expected result of estimatedPercentComplete. It will return the number of times that receive has been called.

If I simply keep track of all the strings that we scanned (see snippet above), then everything works as expected. So my only suggestion was to do this in the library itself. Basically if this exact string was handled before, then just ignore it.

You can see here that if the fragment/part already exists, it will ignore it, so only unique parts are saved. There's no need to store and compare raw strings.

processSimplePart(): {
    if (this.receivedPartIndexes.includes(fragmentIndex)) {
      return;
    }

    this.simpleParts.push({ key: part.indexes, value: part });
    this.receivedPartIndexes.push(fragmentIndex);
   ...
}

The getProcess function returns these unique received indexes.

So, you can safely call ur.receive on every new scan, not store/compare the raw strings and use getProgress instead of estimatedPercentComplete. Have you tried using getProgress without the string comparisons?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants