-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ID Consistency in Fixtures / Snapshots. #1025
Improve ID Consistency in Fixtures / Snapshots. #1025
Conversation
4cf39db
to
0ec9b86
Compare
7dad28b
to
d9613a0
Compare
d9613a0
to
118b647
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- I can't load the snapshots except the fop few. Those look fine, so I'm trusting your assertion about the remainder being limited to number changes is correct.
- I do not see how this solves the problem you were having, in part because I'm not clear on where you were encountering inconsistencies.
- We might need to do even more here because xdist doesn't do test-suite-execution-level fixtures
All that said, the general cleanup in here is worth a lot - the fixtures are cleaner and better organized, so we should merge this and see if we still need to do more later.
data_sets: OrderedDict[str, SemanticModelDataSet] | ||
) -> OrderedDict[str, ReadSqlSourceNode]: | ||
"""Return a mapping from the name of the semantic model to the dataflow plan node that reads from it.""" | ||
# Moved from model_fixtures.py. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the note, please do clean these up before merging.
return data_sets | ||
|
||
|
||
@pytest.fixture(scope="session") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To resolve this, we create a single session-level fixture that creates all affected objects in a consistent order.
Note: the "session" scope marker works on a single-instance local run. The problem we might run into here is xdist spawns multiple sessions, each with its own fixture loadout, so there's no guarantee that there will be exactly 1 of these fixtures. Relevant issues: pytest-dev/pytest-xdist#783 pytest-dev/pytest-xdist#271
Now, this might not make a difference in practice - xdist has some concrete splitting logic for tests, so all tests within a module are likely to be grouped into the same process, which means the ID generators are initialized consistently for those blocks of tests. If that logic changes, or if the number of process instances changes such that tests are split into more (or less) granular blocks, the IDs might change as well.
The thing I'm not certain of is if any of this really matters to us - due to the changes in #1015 every time we create a query execution plan we re-set the IDs regardless, so I just don't know if there's any practical impact on IDs here. I assume there is, otherwise you wouldn't bother with this, but if there was you should know that this might not be a long-term robust solution.
Also, since this is apparently not part of the stack with #1015 (or maybe that one was moved on top of this somewhere?) it's kind of hard for me to reason about how those two things will work together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied as a top-level comment.
Also, since this is apparently not part of the stack with #1015 (or maybe that one was moved on top of this somewhere?) it's kind of hard for me to reason about how those two things will work together.
Should have outlined this better, but yes, this PR is part of the same stack and is under #1015.
semantic_manifest_lookup=cyclic_join_semantic_manifest_lookup, | ||
time_spine_source_node=consistent_id_object_repository.cyclic_join_time_spine_source_node, | ||
) | ||
return mf_engine_test_fixture_mapping[SemanticManifestName.CYCLIC_JOIN_MANIFEST].dataflow_plan_builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is much nicer, thank you.
@@ -3,7 +3,8 @@ | |||
import logging | |||
from dataclasses import dataclass | |||
from enum import Enum | |||
from typing import Dict, Mapping, OrderedDict, Sequence, Tuple | |||
from typing import Dict, Mapping, OrderedDict, Sequence | |||
from typing import Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, but whatever....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using the IDE to add imports, so maybe a bug there? I'll update.
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
@dataclass(frozen=True) | ||
class ConsistentIdObjectRepository: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huzzah!
@@ -19,7 +19,7 @@ class IdNumberSpace: | |||
CONSISTENT_ID_REPOSITORY = 10000 | |||
|
|||
|
|||
@pytest.fixture(autouse=True) | |||
@pytest.fixture(autouse=True, scope="function") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, the function-scope here may be what imposes id number consistency, but this shouldn't be a change in behavior since scope defaults to function so I'm not really sure what you were running into or what this fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was left over from some testing iteration - I'll remove.
@@ -45,19 +46,21 @@ def query_parser( # noqa: D | |||
def extended_date_dataflow_plan_builder( # noqa: D | |||
mf_engine_test_fixture_mapping: Mapping[SemanticManifestName, MetricFlowEngineTestFixture] | |||
) -> DataflowPlanBuilder: | |||
# Scope needs to be function as the DataflowPlanBuilder contains state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we explicitly tag these with "function" then?
start_value: int | ||
|
||
@staticmethod | ||
def for_test_start() -> IdNumberSpace: | ||
"""Before each test run.""" | ||
return IdNumberSpace(0) | ||
|
||
@staticmethod | ||
def for_block(block_index: int, block_size: int = 2000, first_block_offset: int = 10000) -> IdNumberSpace: | ||
"""Used to define fixed-size blocks of ID number spaces. | ||
|
||
This is useful for creating unique / non-overlapping IDs for different groups of objects. | ||
""" | ||
if not block_index >= 0: | ||
raise RuntimeError(f"block_index should be >= 0. Got: {block_index}") | ||
if not block_size > 1: | ||
raise RuntimeError(f"block_size should be > 1. Got: {block_size}") | ||
if not first_block_offset >= 0: | ||
raise RuntimeError(f"first_block_offset should be >= 0. Got: {first_block_offset}") | ||
return IdNumberSpace(first_block_offset + block_size * block_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In future, can we please avoid mixing large mechanical changes like renames with structural or functional changes (or even other renames)?
It's hard to convey just how much harder it is to review this commit as a result, and in other contexts this kind of thing leads to preventable incidents because reviewers get confused about the intersection of these changes and fail to recognize when some small thing is out of place in the mass of mechanical updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've been meaning to avoid such cases as I do get that it makes for a more challenging review.
AMBIGUOUS_RESOLUTION_MANIFEST = SemanticManifestSetupPropertySet( | ||
semantic_manifest_name="ambiguous_resolution_manifest", | ||
id_number_space=IdNumberSpace.for_block(0), | ||
) | ||
# Not including CONFIG_LINTER_MANIFEST as it has intentional errors for running validations. | ||
CYCLIC_JOIN_MANIFEST = SemanticManifestSetupPropertySet( | ||
semantic_manifest_name="cyclic_join_manifest", id_number_space=IdNumberSpace.for_block(1) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this refines the consistency further, but I'm still not sure if this solves the consistency problem altogether.
The issue I saw was similar to the following:
There shouldn't need to be anything special done for |
There was an earlier rename from model -> semantic manifest.
…pace segmentation.
118b647
to
71326ca
Compare
Description
I found some cases where there were run-to-run ID inconsistencies in some fixture objects. Previously, the test fixtures included a
ConsistentIdObjectRepository
that handled this, but there have since been a few additions. This PR adds / migrates existing fixtures to usemf_engine_test_fixture_mapping
(may be renamed) that is a more comprehensive version ofConsistentIdObjectRepository
. This will make snapshots generated using those fixtures (added in later PRs) consistent.To recap: