Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sync status logging #3529

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

eemeli
Copy link
Member

@eemeli eemeli commented Jan 16, 2025

See also #2023

As is, the logging of sync status is indexed primarily by a sync "run" that touches all projects, secondarily by project, and tertiarily by repository. The status for each project is also gathered implicitly into three states: in progress, skipped, and synced. This no longer matches how sync is actually run, and is hard to handle in practice.

Here, sync status logging is collapsed into a single table of events, which is written to only from sync_project_task(), with the following states:

  • in_progress
  • done - sync complete, with some changes to DB or repo
  • no_changes - sync complete, with no changed to DB or repo
  • no_commit - commit=False was set, preventing a repo write
  • prev_busy - skipped due to previous sync still running
  • fail - an exception was raised

For fail, an error message is retained in the error field.

The logs are made available via the following views:

  • /sync/log - latest sync status and start time for each project
  • /sync/log/<project> - sync events for each project
  • /sync/errors - all anomalous sync events with a status other than done or no_changes

Old sync events are ported to the new format, setting the status as fail for all that were "skipped".


Some questions for potential futher changes:

  • Are there additional sync states that should be explicitly identified?
  • Is there additional information that should be retained in the sync log?
  • Are there other views into the data that should be presented?
  • The per-project and anomaly listings show the error message, if one is available. Should access to that be limited to some subset of users?

@eemeli eemeli requested review from mathjazz and flodolo January 16, 2025 14:18
@mathjazz
Copy link
Collaborator

Deployed to stage:
https://mozilla-pontoon-staging.herokuapp.com/sync/

Migration 0004 is slow (took about 10 minutes).

no_commit - commit=False was set, preventing a repo write

Is this one always no_commit, even if running, fails, done with changes or not? If yes, I'd probably track this information separate from the sync job state.

/sync/errors - all anomalous sync events with a status other than done or no_changes

Even in_progress?

Are there additional sync states that should be explicitly identified?

Not state per se, but since we treat --no-commit as state here, we should also track --force. Not necessarily as state, though.

Is there additional information that should be retained in the sync log?

These would be great (taken from #2023), can be done in a follow-up:

  • What entities/locales were synced
  • For each entity, what action did the handle_entity end up taking?
  • What resources were added or removed?

Are there other views into the data that should be presented?

Need to have a closer look...

The per-project and anomaly listings show the error message, if one is available. Should access to that be limited to some subset of users?

Probably. To admins. Maybe all /sync pages?

@mathjazz
Copy link
Collaborator

/sync/errors - all anomalous sync events with a status other than done or no_changes

Could this be /sync/log/errors?

Old sync events are ported to the new format, setting the status as fail for all that were "skipped".

Why not no_changes?

@mathjazz
Copy link
Collaborator

mathjazz commented Jan 16, 2025

Early observations:

  1. Thanks for working on this. This PR makes things much simpler, but still adds features. Nice work!
  2. The project sync view (e.g.) would be nice to be linked to from the project dashboard. Could be a new tab?
  3. If you add Last duration to https://mozilla-pontoon-staging.herokuapp.com/sync/log/, I don't think I'll miss https://pontoon.mozilla.org/sync/log/ anymore.

@eemeli
Copy link
Member Author

eemeli commented Jan 16, 2025

Migration 0004 is slow (took about 10 minutes).

It was maybe a minute for me locally, and I didn't spend much time optimizing it -- the data needs to be read from the project & repo sync logs & be rewritten in the new format.

Easiest way to speed it up would be to e.g. only port the last week or so of events?

no_commit - commit=False was set, preventing a repo write

Is this one always no_commit, even if running, fails, done with changes or not? If yes, I'd probably track this information separate from the sync job state.

It should only be written if the status would be done otherwise. Probably ought to limit it further to only getting picked if there would be any repo changes.

/sync/errors - all anomalous sync events with a status other than done or no_changes

Even in_progress?

Yes. That's anomalous if e.g. there's a crash mid-sync and we don't get to write a fail.

Are there additional sync states that should be explicitly identified?

Not state per se, but since we treat --no-commit as state here, we should also track --force. Not necessarily as state, though.

Yeah, that and the sync source (automated vs. manual CLI vs. admin UI) should be tracked.

Is there additional information that should be retained in the sync log?

These would be great (taken from #2023), can be done in a follow-up:

  • What entities/locales were synced
  • For each entity, what action did the handle_entity end up taking?
  • What resources were added or removed?

Yeah, that should be a follow-up.

The per-project and anomaly listings show the error message, if one is available. Should access to that be limited to some subset of users?

Probably. To admins. Maybe all /sync pages?

The overview should be pretty safe to keep public, yes? It doesn't include error logs.

/sync/errors - all anomalous sync events with a status other than done or no_changes

Could this be /sync/log/errors?

That would conflict with a project using errors as its slug, though.

Old sync events are ported to the new format, setting the status as fail for all that were "skipped".

Why not no_changes?

Because we can't tell the difference, both are logged the same way. But I should probably set it to some other value instead.

The project sync view (e.g.) would be nice to be linked to from the project dashboard. Could be a new tab?

Potentially, yes.

If you add Last duration to https://mozilla-pontoon-staging.herokuapp.com/sync/log/, I don't think I'll miss https://pontoon.mozilla.org/sync/log/ anymore.

I shall take that under advisement.

@eemeli
Copy link
Member Author

eemeli commented Jan 16, 2025

Made the following changes:

  • Limited no_commit to only when repo changes would otherwise have been made.
  • Require can_manage_project permission for project & error logs
  • Migrate previous "skipped" syncs to status=99, which gets rendered as ??? (99)
  • Added a "Duration" column to the top-level listing
  • By default, hide "no changes" syncs in top-level listing, with toggle for showing them

@mathjazz
Copy link
Collaborator

Limited no_commit to only when repo changes would otherwise have been made.

I wouldn't mix sync type (regular, no commit, no pull, force) with status.

Migrate previous "skipped" syncs to status=99, which gets rendered as ??? (99)

I don't think it's obvious what this means. I suggested no_changes, because it would be accurate for a vast majority of cases.

By default, hide "no changes" syncs in top-level listing, with toggle for showing them

I'm not sure I like that to be the default, because I regularly check https://pontoon.mozilla.org/sync/log/ before deployment to see if all sync jobs are done. 😊 But maybe we can have an overall "sync in-progress" indicator in the header?

Should we also remove the DB projects from the list?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants