Begin warning people about spaces in model names #9886

QMalcolm · 2024-04-10T06:48:42Z

resolves #9397

Problem

We don't support models with spaces in their names, but we haven't actually been enforcing this. If a person has spaces in their model names, it causes issues when using selectors. Additionally, depending on a person's operating system, spaces in model names cause other edge-case problems.

Solution

Begin warning people about spaces in their model names. How much information we give depends on some flags. If debug mode is not on, we log the first offending model name and a count of how many offending model names there are in total. If debug mode is on, we log every offending model name. By default these logs are currently warnings and won't stop dbt from running. However, if one sets allow_spaces_in_model_names to False in their dbt_project.yml, the logs become errors and any offending model names will stop dbt from running.
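
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `core/dbt/parser/manifest.py`: the function name `check_for_spaces`, its parameters, the message wording, and the `SpacesInModelNamesError` exception are all hypothetical stand-ins, and plain strings stand in for dbt's events.

```python
from typing import List


class SpacesInModelNamesError(Exception):
    """Hypothetical stand-in for the exception that stops the run."""


def check_for_spaces(
    model_names: List[str],
    debug: bool = False,
    allow_spaces: bool = True,
) -> List[str]:
    """Return the log lines that would be emitted for offending model names."""
    offenders = [name for name in model_names if " " in name]
    if not offenders:
        return []

    # Warnings by default; errors once spaces are no longer allowed.
    level = "WARNING" if allow_spaces else "ERROR"
    # In debug mode every offender is reported; otherwise only the first one.
    reported = offenders if debug else offenders[:1]
    lines = [f"[{level}]: Model `{name}` has spaces in its name" for name in reported]
    lines.append(f"[{level}]: Found {len(offenders)} model name(s) with spaces")

    if not allow_spaces:
        # With the flag set to False, offending names stop the run.
        raise SpacesInModelNamesError("\n".join(lines))
    return lines


logs = check_for_spaces(["good_model", "bad model", "worse model"])
```

With the defaults, `logs` holds one line for the first offender plus the total count; passing `debug=True` adds a line per offender, and `allow_spaces=False` raises instead of returning.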

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
This PR includes type annotations for new and modified functions

codecov · 2024-04-10T06:53:17Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.13%. Comparing base (95581cc) to head (9cdecaa).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9886      +/-   ##
==========================================
- Coverage   88.14%   88.13%   -0.02%     
==========================================
  Files         178      178              
  Lines       22459    22507      +48     
==========================================
+ Hits        19797    19837      +40     
- Misses       2662     2670       +8

| Flag | Coverage Δ | |
|---|---|---|
| integration | `85.57% <100.00%> (-0.02%)` | ⬇️ |
| unit | `61.82% <73.17%> (+<0.01%)` | ⬆️ |


core/dbt/events/types.py

core/dbt/parser/manifest.py

tests/functional/manifest_validations/test_check_for_spaces_in_model_names.py

.changes/unreleased/Features-20240409-233347.yaml

QMalcolm · 2024-04-10T19:35:51Z

Interesting: the failing unit tests are also failing locally, but they are failing differently 🤔 In the GitHub Action they are:

FAILED tests/unit/test_graph.py::GraphTest::test__dependency_list - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__model_incremental - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__model_materializations - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__partial_parse - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__single_model - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_package_ref - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_simple_ref - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'

But locally the errors are:

FAILED tests/unit/test_graph.py::GraphTest::test__dependency_list - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__model_incremental - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__model_materializations - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__partial_parse - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__single_model - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_package_ref - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_simple_ref - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'

They're failing on different flag attributes being accessed, and in different parts of the codebase... weird

QMalcolm · 2024-04-10T19:37:23Z

I just checked out main and am still running into the AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS' error in those tests. So something with my local environment is screwy and not getting to the error we're seeing in the GHA. Gonna nuke my tox cache locally and see if that fixes things.

QMalcolm · 2024-04-10T19:45:43Z

After busting my local tox cache, the tests all pass when run via `make test`. However, if I use pytest to run the failing graph tests directly (`pytest tests/unit/test_graph.py`), they fail with the AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'.

QMalcolm · 2024-04-11T00:23:30Z

Rebased off of main and tracked down the issue 🙂

QMalcolm · 2024-04-11T22:34:11Z

We had a fix on main for the failing integration tests in test_pp_vars.py. Rebasing to bring in those changes.

For projects with a lot of models that have spaces in their names, the warning about this deprecation would be incredibly annoying. Now we instead only log the first model name issue and then a count of how many models have the issue, unless `--debug` is specified.

We want to be able to catch more than just `SpacesInModelNameDeprecation` events, and in the next commit we will alter our tests to do so. Thus instead of writing a new catcher for each event type, a slight modification to the existing `EventCatcher` makes this much easier.
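
A catcher that takes the event type as a parameter might look roughly like this. It is only a sketch: the two event classes are empty stand-ins for dbt's real events, and the field names and the `isinstance` check are assumptions rather than the actual `EventCatcher` implementation, which may match events differently.

```python
from dataclasses import dataclass, field
from typing import Any, List


# Hypothetical stand-ins for dbt event types, used only for illustration.
class SpacesInModelNameDeprecation:
    pass


class Note:
    pass


@dataclass
class EventCatcher:
    # The event class this catcher should record.
    event_to_catch: Any
    caught_events: List[Any] = field(default_factory=list)

    def catch(self, event: Any) -> None:
        # Record only events of the configured type; ignore everything else.
        if isinstance(event, self.event_to_catch):
            self.caught_events.append(event)


deprecation_catcher = EventCatcher(SpacesInModelNameDeprecation)
note_catcher = EventCatcher(Note)

for event in [Note(), SpacesInModelNameDeprecation(), Note()]:
    deprecation_catcher.catch(event)
    note_catcher.catch(event)
```

One class then serves every event type a test cares about; each test just instantiates it with the type it wants to observe.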

… events

…`False`

Co-authored-by: Emily Rockman <[email protected]>

…config Previously in our `test_graph.py` unit tests we were setting the flags global, but not actually adding them to the config. The config, without the flags, would then get passed to the manifest loader and other things. Those downstream classes/functions would then look to the config for the flags and not find them. This change ensures that the flags get applied to the config that is being used in downstream operations during our unit tests.
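
The failure mode can be reproduced in miniature. In this sketch `argparse.Namespace` stands in for the config object, and `allow_spaces`, `apply_flags`, and `FLAG_DEFAULTS` are hypothetical names invented for the illustration; none of them are dbt APIs.

```python
from argparse import Namespace
from typing import Any, Dict

# Hypothetical subset of flags, for illustration only.
FLAG_DEFAULTS: Dict[str, Any] = {"ALLOW_SPACES_IN_MODEL_NAMES": True}


def allow_spaces(config: Namespace) -> bool:
    # Downstream code reads the flag from the config it was handed,
    # not from any module-level global.
    return config.ALLOW_SPACES_IN_MODEL_NAMES


def apply_flags(config: Namespace, flags: Dict[str, Any]) -> Namespace:
    # Copy each flag onto the config so downstream lookups succeed.
    for name, value in flags.items():
        setattr(config, name, value)
    return config


# A config built without the flags fails as soon as a flag is read.
bare_config = Namespace(project_name="test")
try:
    allow_spaces(bare_config)
    lookup_failed = False
except AttributeError:
    lookup_failed = True

# Applying the flags to the config first makes the same lookup work.
fixed_config = apply_flags(Namespace(project_name="test"), FLAG_DEFAULTS)
```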

Using `Note` events was causing test flakiness when run in a multi-worker environment using `pytest -nauto`. This is because the event manager is currently a global. So in a situation where test `A` starts and test `tests_debug_when_spaces_in_name` starts shortly thereafter, the event manager for both tests will have the callbacks set in `tests_debug_when_spaces_in_name`. Then if something in test `A` fired a `Note` event, this would affect the count of `Note` events that `tests_debug_when_spaces_in_name` sees, causing assertion failures. By creating a custom event, `TotalModelNamesWithSpacesDeprecation`, we limit the possible flakiness to only tests that fire the custom event. Thus we didn't _eliminate_ all possibility of flakiness, but realistically only the tests in `test_check_for_spaces_in_model_names.py` can now interfere with each other. That still isn't great, but to fully resolve the problem we need to work on how the event manager is handled (preferably not globally).
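
The interference is easy to see with a toy model. In the sketch below a module-level list stands in for the global event manager, and `fire_event` and `make_counter` are hypothetical helpers written for this example, not dbt functions; events are represented by their names only.

```python
from typing import Callable, List, Tuple

# A module-level callback list, standing in for a global event manager.
CALLBACKS: List[Callable[[str], None]] = []


def fire_event(name: str) -> None:
    # Every registered callback sees every event, whoever fired it.
    for callback in CALLBACKS:
        callback(name)


def make_counter(event_name: str) -> Tuple[List[str], Callable[[str], None]]:
    seen: List[str] = []

    def callback(name: str) -> None:
        if name == event_name:
            seen.append(name)

    return seen, callback


note_seen, note_cb = make_counter("Note")
custom_seen, custom_cb = make_counter("TotalModelNamesWithSpacesDeprecation")
CALLBACKS.extend([note_cb, custom_cb])

# The test under scrutiny fires one of each event...
fire_event("Note")
fire_event("TotalModelNamesWithSpacesDeprecation")
# ...while an unrelated test sharing the same global fires its own Note.
fire_event("Note")
```

The `Note` counter ends up with two events even though the test itself fired one, while the counter for the custom event still sees exactly one.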

Previously we only logged the count of invalid model names if there were two or more invalid names (and not in debug mode). However, this message is important even if there is only one invalid model name, and regardless of whether you are running in debug mode. That is because automated tools might be looking for the event type to track whether anything is wrong. A related change in this commit is that we now only output the debug hint if dbt wasn't run in debug mode. The idea is that if they are already running in debug mode, the hint could come across as somewhat patronizing.
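
As a rough illustration of that message logic, assuming a hypothetical helper `total_names_message` and made-up wording (the real event's text is not shown in this PR conversation):

```python
def total_names_message(count: int, debug: bool) -> str:
    # Pick singular or plural wording based on the count.
    noun = "model name" if count == 1 else "model names"
    message = f"Found {count} {noun} with spaces"
    # Only hint at --debug when the user is not already running with it.
    if not debug:
        message += "; run with --debug to see every offending name"
    return message


single = total_names_message(1, debug=True)
several = total_names_message(3, debug=False)
```

The message is built for any count of one or more, so tooling that watches for the event sees it even when a single model is affected.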

We want people running dbt to be able to see, at a glance, the warnings/errors from running their project. In this case we are focused specifically on errors/warnings regarding model names containing spaces. Previously we were only ever emitting the `warning_tag` in the message, even if the event itself was being emitted at an `ERROR` level. We now properly have `[ERROR]` or `[WARNING]` in the message depending on the level. Unfortunately we couldn't just look at what level the event was being fired at, because that information doesn't exist on the event itself. Additionally, we're using events that are based on `DynamicEvents`, which is unfortunately hard-coded to `DEBUG`. Changing this would involve still having a `level` property on the definition in `core_types.proto` and then having `DynamicEvent`s look to `self.level` in the `level_tag` method. Then we could change how firing events works based on an event's `level_tag` return value. This all sounds like a bit of tech debt suited for its own PR, possibly multiple, and thus is not being done here.
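
A minimal sketch of choosing the tag from the level, under the assumption that the level is passed in explicitly because the event does not carry it. The `Level` enum, `_color`, and `level_tagged` are hypothetical names for this example, and raw ANSI escape codes stand in for dbt's color helpers.

```python
from enum import Enum


class Level(Enum):
    # Hypothetical two-level enum; dbt's real event levels are richer.
    WARN = "warn"
    ERROR = "error"


def _color(text: str, ansi_code: int) -> str:
    # Wrap text in an ANSI color escape sequence and reset afterwards.
    return f"\033[{ansi_code}m{text}\033[0m"


def level_tagged(level: Level, msg: str) -> str:
    # Red [ERROR] for errors, yellow [WARNING] for everything else.
    if level is Level.ERROR:
        return f"[{_color('ERROR', 31)}]: {msg}"
    return f"[{_color('WARNING', 33)}]: {msg}"


error_line = level_tagged(Level.ERROR, "Model `bad model` has spaces in its name")
warning_line = level_tagged(Level.WARN, "Model `bad model` has spaces in its name")
```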

emmyoop · 2024-04-12T12:48:29Z

core/dbt/events/types.py

+# TODO Move this to dbt_common.ui
+def _error_tag(msg: str) -> str:
+    return f'[{red("ERROR")}]: {msg}'


Are you going to do this TODO before merging?

I am not. That would entail opening a PR in dbt-common and also getting it released, and I didn't want that to block this PR.

Follow up tasks assigned to me:

Drop custom _error_tag and use error_tag provided by dbt-common #9914

Add error_tag helper to ui.py dbt-common#107

emmyoop · 2024-04-12T12:54:51Z

tests/functional/manifest_validations/test_check_for_spaces_in_model_names.py

+
+
+@dataclass
+class EventCatcher:


core/dbt/events/types.py

…ar and plural

… main

emmyoop

Looks good!

QMalcolm requested a review from a team as a code owner April 10, 2024 06:48

cla-bot bot added the cla:yes label Apr 10, 2024

emmyoop requested changes Apr 10, 2024

QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from 41e5b5d to 7b60ab0 Compare April 10, 2024 19:26

QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from 678462d to 958d5ba Compare April 10, 2024 23:53

QMalcolm requested a review from emmyoop April 11, 2024 00:23

QMalcolm and others added 13 commits April 11, 2024 15:34

Add event type for deprecation of spaces in model names

0ac4b3f

Begin emitting deprecation warning for spaces in model names

a20ef5a

Update tests_debug_when_spaces_in_name to check for relevant Note…

fb2547a

… events

Add project flag to control whether spaces are allowed in model names

3665fcc

Log errors and raise exception when allow_spaces_in_model_names is …

41ef3f7

…`False`

Add changie log for new manifest validation of spaces in model names

cd62234

Fix capitalization in Note event message about improper model names

117b6d4

Co-authored-by: Emily Rockman <[email protected]>

Update test to check Note event contents for total bad model names

0352bf5

Alter SpacesInModelNameDeprecation to inherit from DynamicLevel

f51d65a

Fixup changelog to be a fix not a feature

92fc679

QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch 4 times, most recently from 18fb56c to 9865f4a Compare April 12, 2024 04:11

QMalcolm added 3 commits April 11, 2024 21:29

Reduce duplicate if logic in check_for_spaces_in_model_names

c509d75

QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from 9865f4a to c678cbc Compare April 12, 2024 04:31

emmyoop requested changes Apr 12, 2024

QMalcolm mentioned this pull request Apr 12, 2024

Add error_tag helper to ui.py dbt-labs/dbt-common#107

Closed

1 task

Alter TotalModelNamesWithSpacesDeprecation message to handle singul…

a2d609f

…ar and plural

QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from c9f83b9 to a2d609f Compare April 12, 2024 17:33

This was referenced Apr 12, 2024

Drop custom _error_tag and use error_tag provided by dbt-common #9914

Closed

Add error_tag util to the ui module dbt-labs/dbt-common#108

Merged

QMalcolm added 2 commits April 12, 2024 11:06

Merge branch 'main' into qmalcolm--9397-deprecate-spaces-in-model-names

515b61b

Remove duplicate import in test_graph.py introduced from merging in…

9cdecaa

… main

QMalcolm requested a review from emmyoop April 12, 2024 18:36

QMalcolm mentioned this pull request Apr 12, 2024

fully remove support - we should not support spaces in model names #9834

Closed

1 task

emmyoop approved these changes Apr 12, 2024

QMalcolm merged commit f15e128 into main Apr 12, 2024
62 checks passed

QMalcolm deleted the qmalcolm--9397-deprecate-spaces-in-model-names branch April 12, 2024 20:25

QMalcolm mentioned this pull request Apr 12, 2024

Migrate from custom _error_tag to dbt-common defined error_tag #9927

Merged

5 tasks

This was referenced Apr 24, 2024

[Core] Upgrade guide for model names that contain spaces dbt-labs/docs.getdbt.com#5347

Closed

[Core] Spaces in dbt model names for v1.8 upgrade guide dbt-labs/docs.getdbt.com#5352

Open

jtcohen6 mentioned this pull request Apr 29, 2024

[Flags] Rename behavior change flags + add consistent deprecation warnings #10062

Closed

6 tasks
