Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Raise warning on staging server when we cannot delete variables used in charts #3716

Merged
merged 2 commits into from
Dec 13, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 16 additions & 3 deletions etl/grapher_import.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,9 @@
from apps.backport.datasync import data_metadata as dm
from apps.backport.datasync.datasync import upload_gzip_string
from apps.chart_sync.admin_api import AdminAPI
from apps.wizard.app_pages.chart_diff.chart_diff import ChartDiffsLoader
from etl import config
from etl.db import get_engine
from etl.db import get_engine, production_or_master_engine

from . import grapher_helpers as gh
from . import grapher_model as gm
Expand Down Expand Up @@ -496,8 +497,7 @@ def cleanup_ghost_variables(engine: Engine, dataset_id: int, upserted_variable_i
if rows:
rows = pd.DataFrame(rows, columns=["chartId", "variableId"])

# raise an error if on staging server
if config.ENV == "staging":
if _raise_error_for_deleted_variables(rows):
raise ValueError(f"Variables used in charts will not be deleted automatically:\n{rows}")
else:
# otherwise show a warning
Expand Down Expand Up @@ -575,6 +575,19 @@ def cleanup_ghost_variables(engine: Engine, dataset_id: int, upserted_variable_i
return True


def _raise_error_for_deleted_variables(rows: pd.DataFrame) -> bool:
"""If we run into ghost variables that are still used in charts, should we raise an error?"""
# raise an error if on staging server
if config.ENV == "staging":
# It's possible that we merged changes to ETL, but the staging server still uses old charts. In
# that case, we first check that the charts were really modified on our staging server.
modified_charts = ChartDiffsLoader(config.OWID_ENV.get_engine(), production_or_master_engine()).df
return bool(set(modified_charts.index) & set(rows.chartId))
Copy link
Member

@lucasrodes lucasrodes Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried following the logic here, but didn't really come to an understanding. Could you help me understand it?

So, my logic is:

  1. Before we had something like: "raise error if there are ghost variables (i.e. rows)"

  2. Now, you are proposing to evaluate bool(set(modified_charts.index)) in addition. This should return True whenever modified_charts is not empty, right? (In that case wouldn't it be enough to just use not modified_charts.empty?).

  3. Still, if modified_charts is not empty, why is that a potential sign to raise an error? Isn't modified_charts containing a row per each chart that has been modified (regardless of it being approved, etc.)?

I think I'm missing something here. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are proposing to evaluate bool(set(modified_charts.index)) in addition. This should return True whenever modified_charts is not empty, right? (In that case wouldn't it be enough to just use not modified_charts.empty?).

set(modified_charts.index) contains chart ids that have been modified on staging (either via config, data or metadata) and set(rows.chartId) contains problematic chart ids. Previously, we raised an error if set(rows.chartId) wasn't empty, but the problem was that sometimes it was empty because of race conditions between staging & prod.

bool(set(modified_charts.index) & set(rows.chartId)) returns True if there are any problematic charts AND those charts have been also modified on the staging server.

I could have also written it as:

if rows and any(row.chartId in modified_charts.index):
  return True

Perhaps it's me who's missing something and I misunderstood your question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my god. I miss-read one of the parents. I read bool(set(modified_charts.index)) & set(rows.chartId).

NVM, I got it, makes total sense.

# if not on staging, always raise an error
else:
return True


def cleanup_ghost_sources(engine: Engine, dataset_id: int, dataset_upserted_source_ids: List[int]) -> None:
"""Remove all leftover sources that didn't get upserted into DB during grapher step.
This could happen when you rename or delete sources.
Expand Down
Loading