[Bug] Archiver working on Segments (day/week/month/range/year) on every run, even with parameter --skip-segments-today #22868

Open
4 tasks done
peterbo opened this issue Dec 16, 2024 · 4 comments
Labels
Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. triaged Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users.

Comments

@peterbo
Contributor

peterbo commented Dec 16, 2024

What happened?

Two examples:

INFO [2024-12-16 17:02:50] 45194 Archived website id 1, period = month, date = 2024-12-01, segment = 'referrerType==campaign', 562709 visits found. Time elapsed: 11.308s
INFO [2024-12-16 17:06:42] 45194 Archived website id 1, period = year, date = 2024-01-01, segment = 'dimension1==logged-in;dimension2==consent_given', 1559273 visits found. Time elapsed: 34.031s

Next run:
INFO [2024-12-16 17:13:08] 74184 Archived website id 1, period = month, date = 2024-12-01, segment = 'referrerType==campaign', 562709 visits found. Time elapsed: 5.748s
INFO [2024-12-16 17:13:42] 74184 Archived website id 1, period = year, date = 2024-01-01, segment = 'dimension1==logged-in;dimension2==consent_given', 1559273 visits found. Time elapsed: 21.569s

Archiver is called like this:
./console core:archive --no-ansi --skip-segments-today

Additionally, the archiver seems to be ignoring time_before_today_archive_considered_outdated, which is set to 3000 (the archiving runs above are only 10 minutes apart).
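To check for this kind of repeated work in a longer log, the "Archived website id ..." lines can be grouped by site, period, date and segment. The following is only a minimal sketch: the regular expression is modeled on the four lines quoted above and is an assumption about the log format, and `find_rearchived` is a hypothetical helper, not part of Matomo.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern modeled on the "Archived website id ..." lines quoted in this report
# (an assumption about the format, not an official specification).
ARCHIVED_RE = re.compile(
    r"INFO \[(?P<ts>[^\]]+)\] (?P<pid>\d+) Archived website id (?P<site>\d+), "
    r"period = (?P<period>\w+), date = (?P<date>[\d,-]+), "
    r"segment = '(?P<segment>[^']*)'"
)

def find_rearchived(lines):
    """Return {(site, period, date, segment): [timestamps]} for archives
    that were written by more than one archiver process (distinct PIDs)."""
    runs = defaultdict(dict)
    for line in lines:
        m = ARCHIVED_RE.search(line)
        if m is None:
            continue  # not an "Archived website" line
        key = (int(m["site"]), m["period"], m["date"], m["segment"])
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        runs[key].setdefault(m["pid"], ts)  # keep the first line per process
    return {k: sorted(v.values()) for k, v in runs.items() if len(v) > 1}

# The four log lines from this report.
LOG = [
    "INFO [2024-12-16 17:02:50] 45194 Archived website id 1, period = month, date = 2024-12-01, segment = 'referrerType==campaign', 562709 visits found. Time elapsed: 11.308s",
    "INFO [2024-12-16 17:06:42] 45194 Archived website id 1, period = year, date = 2024-01-01, segment = 'dimension1==logged-in;dimension2==consent_given', 1559273 visits found. Time elapsed: 34.031s",
    "INFO [2024-12-16 17:13:08] 74184 Archived website id 1, period = month, date = 2024-12-01, segment = 'referrerType==campaign', 562709 visits found. Time elapsed: 5.748s",
    "INFO [2024-12-16 17:13:42] 74184 Archived website id 1, period = year, date = 2024-01-01, segment = 'dimension1==logged-in;dimension2==consent_given', 1559273 visits found. Time elapsed: 21.569s",
]

for key, stamps in find_rearchived(LOG).items():
    gap = (stamps[-1] - stamps[0]).total_seconds()
    print(key, f"re-archived after {gap:.0f}s")
```

For the lines above, both segment archives are reported as written by two different processes, with the month archive repeated after roughly ten minutes.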

What should happen?

When the archiver is started with "--skip-segments-today", Segments should be processed only once, not on every run. The exception is a previous period that was invalidated by new data, which is not the case here.

How can this be reproduced?

In a Matomo instance that contains Segments, call the archiver and view the processing information.

Matomo version

5.1.2

PHP version

8.3

Server operating system

Debian

What browsers are you seeing the problem on?

Not applicable (e.g. an API call etc.)

Computer operating system

No response

Relevant log output

No response

Validations

@peterbo peterbo added Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. To Triage An issue awaiting triage by a Matomo core team member labels Dec 16, 2024
@peterbo peterbo changed the title from "[Bug] Archiver working on Segments (day/week/month/range/year) on every run, even if archiver started with --skip-segments-today" to "[Bug] Archiver working on Segments (day/week/month/range/year) on every run, even with parameter --skip-segments-today" Dec 16, 2024
@mneudert
Member

Hi @peterbo,

thank you for raising this issue. We also became aware of it some time ago, and included a fix in 5.2.0 that should address most of the problems you see: #22546

There is still a gap, though: if archiving does not run on the first day of a period, it still runs too often. For example, the week period for a segment is skipped on a Monday but archived multiple times on a Tuesday, even though "--skip-segments-today" implies it should only take the Monday data into account and run exactly once on Tuesday. A fix for this behaviour is already planned.
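The distinction between "period starts today" and "period started earlier" can be illustrated with a small date calculation. This is only a sketch of the case described above, not Matomo's implementation: `period_start` and `starts_today` are hypothetical helpers, and Monday-based weeks are assumed because the week archives in this thread are dated on a Monday.

```python
from datetime import date, timedelta

def period_start(day: date, period: str) -> date:
    """First day of the day/week/month/year period that contains `day`."""
    if period == "day":
        return day
    if period == "week":
        # weekday() is 0 for Monday, so this steps back to the Monday
        return day - timedelta(days=day.weekday())
    if period == "month":
        return day.replace(day=1)
    if period == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported period: {period}")

def starts_today(today: date, period: str) -> bool:
    """True if the current period of this type begins on `today`."""
    return period_start(today, period) == today

monday = date(2024, 12, 16)
tuesday = date(2024, 12, 17)
print(starts_today(monday, "week"))   # week begins today: nothing before today to archive
print(starts_today(tuesday, "week"))  # week began earlier: Monday's data is already complete
```

On the Tuesday the week contains one finished day, which is the situation where the segment archive is expected to be built once and then left alone for the rest of the day.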

Can you give 5.2.0 a try and check whether --skip-segments-today behaves better? Please also let us know which reports are still being archived too often (e.g. ignoring the "outdated" configuration) so we can include them in the upcoming fix.

@peterbo
Contributor Author

peterbo commented Dec 17, 2024

Thank you, @mneudert. I haven't updated many instances to 5.2.0 yet, but I'll give it a shot!

@randy-innocraft randy-innocraft added triaged and removed To Triage An issue awaiting triage by a Matomo core team member labels Dec 17, 2024
@ronak-innocraft ronak-innocraft added the Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users. label Dec 18, 2024
@peterbo
Contributor Author

peterbo commented Dec 18, 2024

Hi @mneudert

I can confirm that this still exists with 5.2.0 (as you already mentioned):

Even though archiving is running once an hour, it invalidates archives for today and yesterday:
INFO [2024-12-18 15:58:49] 569943 Will invalidate archived reports for today in site ID = 1's timezone (2024-12-18 00:00:00).
INFO [2024-12-18 15:58:50] 569943 Will invalidate archived reports for yesterday in site ID = 1's timezone (2024-12-17 00:00:00).

It works on the previous 30 days range 18.11 until yesterday (no new data invalidated this archive):
INFO [2024-12-18 15:59:49] 569943 Archived website id 1, period = range, date = 2024-11-18,2024-12-17, segment = '', 3438813 visits found. Time elapsed: 17.585s
INFO [2024-12-18 16:04:30] 569943 Archived website id 1, period = range, date = 2024-11-18,2024-12-17, segment = 'visitDuration<=14', 1119057 visits found. Time elapsed: 12.757s

And also on week/month/year/range archives (with and without segments):
INFO [2024-12-18 15:59:58] 569943 Archived website id 1, period = week, date = 2024-12-16, segment = '', 303756 visits found. Time elapsed: 8.096s
INFO [2024-12-18 16:00:03] 569943 Archived website id 1, period = week, date = 2024-12-16, segment = 'dimension1==logged-in;dimension2==consent_given', 11723 visits found. Time elapsed: 4.696s

Also, it is ignoring the time_before_today_archive_considered_outdated setting, which is set to 3000:
Archiving was last executed without error 57 min 24s ago (I have to test this again, since 3000 seconds is only 50 minutes)

The instance was updated to 5.2.0 yesterday around noon, so the new invalidation behaviour from #22546 is already live.

@mneudert
Member

Thank you for giving the new release a try @peterbo.

> Even though archiving is running once an hour, it invalidates archives for today and yesterday:

That message is displayed in almost all cases, even if nothing is actually invalidated in the end. If you run archiving with verbose logging (core:archive -v), you should see more details:

DEBUG     [2024-12-19 15:42:53] 6800  Checking for queued invalidations...
INFO      [2024-12-19 15:42:53] 6800    Will invalidate archived reports for today in site ID = 1's timezone (2024-12-19 00:00:00).
DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-19,2024-12-19, segment = , plugin = , report = ], skipping invalidation.
DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-19,2024-12-19, segment = visitDuration<30, plugin = , report = ], skipping invalidation.
INFO      [2024-12-19 15:42:53] 6800    Will invalidate archived reports for yesterday in site ID = 1's timezone (2024-12-18 00:00:00).
DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-18,2024-12-18, segment = , plugin = , report = ], skipping invalidation.
DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-18,2024-12-18, segment = visitDuration<30, plugin = , report = ], skipping invalidation.
DEBUG     [2024-12-19 15:42:53] 6800  Done invalidating

In this case it skipped both the "All Visits" (segment = '') and the custom segment, resulting in no invalidations (and no archiving) even though the output said "will invalidate".
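To see at a glance how many of the announced invalidations were actually skipped, the verbose output can be summarized with a few lines of Python. This is a rough sketch: the pattern is derived only from the DEBUG lines shown above and is an assumption about their format, and `skipped_invalidations` is a hypothetical helper.

```python
import re

# Pattern derived from the "Found usable archive" DEBUG lines shown above.
USABLE_RE = re.compile(
    r"Found usable archive for \[idSite = (?P<site>\d+), "
    r"period = (?P<period>\w+) (?P<dates>[\d,-]+), "
    r"segment = (?P<segment>.*?), plugin = "
)

def skipped_invalidations(lines):
    """Return (site, period, dates, segment) for every skipped invalidation."""
    skipped = []
    for line in lines:
        m = USABLE_RE.search(line)
        if m and line.rstrip().endswith("skipping invalidation."):
            skipped.append((int(m["site"]), m["period"], m["dates"], m["segment"]))
    return skipped

# The verbose output quoted above.
VERBOSE_LOG = [
    "DEBUG     [2024-12-19 15:42:53] 6800  Checking for queued invalidations...",
    "INFO      [2024-12-19 15:42:53] 6800    Will invalidate archived reports for today in site ID = 1's timezone (2024-12-19 00:00:00).",
    "DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-19,2024-12-19, segment = , plugin = , report = ], skipping invalidation.",
    "DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-19,2024-12-19, segment = visitDuration<30, plugin = , report = ], skipping invalidation.",
    "INFO      [2024-12-19 15:42:53] 6800    Will invalidate archived reports for yesterday in site ID = 1's timezone (2024-12-18 00:00:00).",
    "DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-18,2024-12-18, segment = , plugin = , report = ], skipping invalidation.",
    "DEBUG     [2024-12-19 15:42:53] 6800    Found usable archive for [idSite = 1, period = day 2024-12-18,2024-12-18, segment = visitDuration<30, plugin = , report = ], skipping invalidation.",
    "DEBUG     [2024-12-19 15:42:53] 6800  Done invalidating",
]

announced = sum("Will invalidate archived reports" in line for line in VERBOSE_LOG)
skipped = skipped_invalidations(VERBOSE_LOG)
print(f"{announced} invalidations announced, {len(skipped)} archives kept")
```

An empty segment in the result stands for "All Visits", matching the `segment = ,` form in the DEBUG lines.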

> It works on the previous 30 days range 18.11 until yesterday (no new data invalidated this archive):

If you have "previous30" configured to be archived, either by configuring it in archiving_custom_ranges or because one of your users has selected it as the default report date, it will indeed be archived more often than expected at the moment.

The range will currently always be subject to time_before_today_archive_considered_outdated (or time_before_range_archive_considered_outdated if set) and re-archived after that.
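The choice between the two settings can be written down as a simple fallback. This sketch only restates the sentence above: `range_archive_ttl` is a hypothetical helper, and representing "not set" as `None` is an assumption made for illustration, not a statement about how Matomo stores its configuration.

```python
from typing import Optional

def range_archive_ttl(today_ttl: int, range_ttl: Optional[int] = None) -> int:
    """Seconds after which a range archive is re-archived.

    Uses the range-specific setting when one is given and otherwise falls
    back to time_before_today_archive_considered_outdated.
    """
    return today_ttl if range_ttl is None else range_ttl

# With only the "today" setting from this report (3000 seconds) configured:
print(range_archive_ttl(3000))
# With a hypothetical range-specific value of 86400 seconds:
print(range_archive_ttl(3000, 86400))
```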

I expect this will be declared as a bug, and planned for a fix.

> And also on week/month/year/range archives (with and without segments):

As you have noticed, with time_before_today_archive_considered_outdated = 3000, archives containing today (e.g. today, current week, current month) should be re-run once around 50 minutes have passed. If your archiving output stated 57 minutes ago, everything should be in order for the "All Visits" segment.
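The arithmetic behind this can be checked directly. The sketch below parses only the "N min Ns" form that appears in this thread; `elapsed_seconds` is a hypothetical helper, and treating an archive as outdated once its age exceeds the configured value is an assumption about the comparison.

```python
import re

TTL_SECONDS = 3000  # time_before_today_archive_considered_outdated in this report

def elapsed_seconds(message: str) -> int:
    """Extract an elapsed time written as 'N min Ns' and return it in seconds."""
    m = re.search(r"(\d+) min (\d+)s", message)
    if m is None:
        raise ValueError("no 'N min Ns' duration found")
    return int(m.group(1)) * 60 + int(m.group(2))

elapsed = elapsed_seconds("Archiving was last executed without error 57 min 24s ago")
print(f"TTL: {TTL_SECONDS / 60:.0f} min, last run {elapsed} s ago, "
      f"outdated: {elapsed > TTL_SECONDS}")
```

Since 57 min 24 s is 3444 seconds, the previous archive is older than the 3000-second limit, so a re-run of periods containing today is the expected result here.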

For your custom dimension segment, this should match the bug described earlier around periods not starting today, for which a fix is planned.
