Prevent processing of duplicate parts #4

AndreasGassmann · 2021-03-15T17:38:56Z

When the same part is "received" multiple times, this library will process each of the requests. This will result in the progress being higher than it should be. I now catch this outside the library, but maybe it would make sense to include it (basically just ignoring duplicates).

xardass · 2021-06-23T19:48:06Z

Thanks, we are looking in this

antonispoulakis · 2021-06-26T19:55:19Z

Hey @AndreasGassmann how are you trying to get the progress?

The library ignores the duplicate parts but keeps track of every processed part, even if duplicate.

There are 2 functions you can use to get a report on progress: estimatedPercentComplete and getProgress. The accurate one is the getProgress() one, as it uses the unique parts divided by the expected parts, so it returns the accurate value.

The estimatedPercentComplete will always keep incrementing as long as the user keeps scanning, so it can maybe lead to better UX, as the getProgress will probably remain static if a few QR codes are missed and the sequence is long enough.

My guess is that you're using estimatedPercentComplete but expecting the result that getProgress would give.

AndreasGassmann · 2021-06-28T08:53:18Z

The library ignores the duplicate parts but keeps track of every processed part, even if duplicate.

I'm assuming you are keeping track of every processed part to estimate the completion. But shouldn't duplicates on the "page" level be ignored? So if you scan ur:bytes/1-5/... 20 times, shouldn't it be ignored after being processed once?

The estimatedPercentComplete will always keep incrementing as long as the user keeps scanning, so it can maybe lead to better UX, as the getProgress will probably remain static if a few QR codes are missed and the sequence is long enough.

This was the reason why we ended up using estimatedPercentComplete over getProgress.

I understand that estimatedPercentComplete is not really "accurate", but without ignoring identical QRs, the progress can be WAY higher than I expect it to be. I don't remember exactly anmore, but during testing I displayed each QR for like 1 second and the scanner was scanning more than 10 times per second. Within a second, the estimatedPercentComplete was at 100% because of it. If duplicates are ignored (which we do outside the library now), the estimation works quite well.

It's not really an issue for us because it can easily be handled, but I feel like it's something other people might run into as well.

antonispoulakis · 2021-06-28T11:20:36Z

The library ignores the duplicate parts but keeps track of every processed part, even if duplicate.

I'm assuming you are keeping track of every processed part to estimate the completion. But shouldn't duplicates on the "page" level be ignored? So if you scan ur:bytes/1-5/... 20 times, shouldn't it be ignored after being processed once?

The "keeping track" part is just a number that gets incremented every time a part is received. That doesn't mean that it's duplicated internally.

The estimatedPercentComplete will always keep incrementing as long as the user keeps scanning, so it can maybe lead to better UX, as the getProgress will probably remain static if a few QR codes are missed and the sequence is long enough.

This was the reason why we ended up using estimatedPercentComplete over getProgress.

I understand that estimatedPercentComplete is not really "accurate", but without ignoring identical QRs, the progress can be WAY higher than I expect it to be. I don't remember exactly anmore, but during testing I displayed each QR for like 1 second and the scanner was scanning more than 10 times per second. Within a second, the estimatedPercentComplete was at 100% because of it.

In my opinion, estimatedPercentComplete is not that useful. The main reason I've kept it is for compatibility with the C++ implementation, in case someone is using both.

If duplicates are ignored (which we do outside the library now), the estimation works quite well.

It's not really an issue for us because it can easily be handled, but I feel like it's something other people might run into as well.

Can you elaborate a bit on the way you're handling it? From what I understand, if you're ignoring the duplicate parts, you should be getting the same values as getProgress().
Are you keeping track of the 1-5 part and only incrementing the progress on every new index? If so, how do you handle the parts that are mixed (6-5, 7-5, etc.)?

AndreasGassmann · 2021-06-28T11:30:10Z

We match the whole string we scan, something like this:

// const qrs = new Set()
// const qr: string = 'ur:bytes/1-10/...'

if (!qrs.has(qr)) {
  ur.receive(qr)
  qrs.add(qr)
}

antonispoulakis · 2021-06-28T13:39:24Z

Are you getting the progress by doing something like this?

// from the `ur:bytes/1-10/...` you can get that
const expectedLength = 10
const progress = qrs.size() / expectedLength

This will exactly match the result of getProgress(), at least for the first 10 qr codes.

If you're using something like the snippet above, you can also end up overshooting if the scanning device randomly misses parts and ends up having to scan the mixed parts (11-10...), which can end up needing more than 10 total scans.

The way I've handled that in an app is by doing:

ur.receivePart(data);

// it's complete, parse data, use them, stop scanner
if (ur.isComplete()) {}

// update progress
setProgress(urDecoder.getProgress());

AndreasGassmann · 2021-06-28T15:15:36Z

Sorry if I was unclear, but we use estimatedPercentComplete(). We don't do any progress calculation ourselves. The only issue I wanted to bring up here is the following:

Assuming we have a payload that will result in 2 UR parts:

ur:bytes/1-2/aaaa
ur:bytes/2-2/bbbb
ur:bytes/3-2/cccc
...

If those are displayed 1 second each, but the scanner scans 5 times per second, it will register some of them multiple times.

// QR scanner
ur:bytes/1-2/aaaa (scanned after 0s)
ur:bytes/1-2/aaaa (scanned after 0.2s)
ur:bytes/1-2/aaaa (scanned after 0.4s)
ur:bytes/1-2/aaaa (scanned after 0.6s)
ur:bytes/1-2/aaaa (scanned after 0.8s)
ur:bytes/2-2/bbbb (scanned after 1s)
ur:bytes/2-2/bbbb (scanned after 1.2s)
ur:bytes/2-2/bbbb (scanned after 1.4s)
ur:bytes/2-2/bbbb (scanned after 1.6s)
ur:bytes/2-2/bbbb (scanned after 1.8s)
...

If I call ur.receive() with every string that the scanner returns (which potentially gives us the exact same string multiple times), then estimatedPercentComplete() will go towards 100% very quickly. If I simply keep track of all the strings that we scanned and don't "receive" them a second time (see snippet above), then everything works as expected. So my only suggestion was to do this in the library itself. Basically if the exact string was handled before, then just ignore it in the future.

antonispoulakis · 2021-06-28T15:44:44Z

If I now call ur.receive() with every string that the scanner returns (which potentially give us the exact same string multiple times), then estimatedPercentComplete() will go towards 100% very quickly.

That is the expected result of estimatedPercentComplete. It will return the number of times that receive has been called.

If I simply keep track of all the strings that we scanned (see snippet above), then everything works as expected. So my only suggestion was to do this in the library itself. Basically if this exact string was handled before, then just ignore it.

You can see here that if the fragment/part already exists, it will ignore it, so only unique parts are saved. There's no need to store and compare raw strings.

processSimplePart(): {
    if (this.receivedPartIndexes.includes(fragmentIndex)) {
      return;
    }

    this.simpleParts.push({ key: part.indexes, value: part });
    this.receivedPartIndexes.push(fragmentIndex);
   ...
}

The getProcess function returns these unique received indexes.

So, you can safely call ur.receive on every new scan, not store/compare the raw strings and use getProgress instead of estimatedPercentComplete. Have you tried using getProgress without the string comparisons?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent processing of duplicate parts #4

Prevent processing of duplicate parts #4

AndreasGassmann commented Mar 15, 2021

xardass commented Jun 23, 2021

antonispoulakis commented Jun 26, 2021

AndreasGassmann commented Jun 28, 2021

antonispoulakis commented Jun 28, 2021

AndreasGassmann commented Jun 28, 2021

antonispoulakis commented Jun 28, 2021

AndreasGassmann commented Jun 28, 2021 •

edited

Loading

antonispoulakis commented Jun 28, 2021

Prevent processing of duplicate parts #4

Prevent processing of duplicate parts #4

Comments

AndreasGassmann commented Mar 15, 2021

xardass commented Jun 23, 2021

antonispoulakis commented Jun 26, 2021

AndreasGassmann commented Jun 28, 2021

antonispoulakis commented Jun 28, 2021

AndreasGassmann commented Jun 28, 2021

antonispoulakis commented Jun 28, 2021

AndreasGassmann commented Jun 28, 2021 • edited Loading

antonispoulakis commented Jun 28, 2021

AndreasGassmann commented Jun 28, 2021 •

edited

Loading