feat: lint sources #78

Merged: 41 commits, Nov 12, 2024
41 commits
769dc07
add sources to manifest.json fixture
otosky Oct 10, 2024
40d3e2e
add minimal test assertion for parsing sources
otosky Oct 10, 2024
1ee0efc
fmt json
otosky Oct 11, 2024
e33d8b8
add Source model and parse into ManifestLoader
otosky Oct 15, 2024
480b41e
add fixtures for sources
otosky Oct 15, 2024
2fa3d18
infer resource_type from Rule.evaluate type annotation
otosky Oct 15, 2024
141aa3c
evaluate rules for both sources and models
otosky Oct 15, 2024
17ef4be
allow sources to be filtered
otosky Oct 17, 2024
a85ad3a
coverage
otosky Oct 17, 2024
1a50d6b
no cover overload
otosky Oct 17, 2024
4666419
replace occurrences of "model" with "evaluable"
otosky Oct 19, 2024
213f065
refactor model_filter as rule_filter
otosky Oct 19, 2024
713bdee
fmt
otosky Oct 19, 2024
aa9371d
update dbt_ls cmd to include sources
otosky Oct 19, 2024
bb2ff9b
add newline to pyproject.toml
otosky Oct 22, 2024
68453dc
fix docstring
otosky Oct 22, 2024
feedc76
update some docstrings for naming changes
otosky Oct 23, 2024
91fe1ec
update pyproject description
otosky Oct 23, 2024
0fff2fc
update manifest filter
otosky Oct 23, 2024
69c3df9
update _reindex_tests for sources
otosky Oct 23, 2024
fe1dd7f
update more model_filter renames in comments
otosky Oct 23, 2024
e35176b
rename config to fail_any_item_under
otosky Oct 24, 2024
beac2e4
update human_readable_formatter
otosky Oct 24, 2024
95da946
fmt
otosky Oct 24, 2024
e095ffc
move check for resource_type match to `should_evaluate` method
otosky Oct 24, 2024
836bb68
update docs
otosky Oct 24, 2024
fea456b
rename test_model_filter -> test_rule_filter
otosky Oct 24, 2024
ea7691f
add newline to pyproject.toml
otosky Oct 28, 2024
1122273
validate that filters match the resource type of the rules they attac…
otosky Oct 29, 2024
eca0b1e
Final renaming of models to include sources
jochemvandooren Oct 31, 2024
cc40c46
fix line lengths
otosky Oct 31, 2024
a5b2ac1
update changelog
otosky Oct 31, 2024
8ca822f
remove hard dep on more-itertools by vendoring first_true
otosky Oct 31, 2024
30900c4
actually commit more_itertools replacement
otosky Oct 31, 2024
6204bc4
add newline
otosky Nov 1, 2024
7ecc1b2
remove breaking notice
otosky Nov 1, 2024
cc6b707
fix manifest_formatter for source scores
otosky Nov 1, 2024
ba16e64
run prettier on changelog
otosky Nov 2, 2024
5ef7cae
fix import
otosky Nov 2, 2024
e19badc
address mypy errors
otosky Nov 12, 2024
0732e15
mypy ignore Liskov violations on class-based rule/filter
otosky Nov 12, 2024
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,16 @@ and this project adheres to

## [Unreleased]

- Support linting of sources.
- **Breaking**: Renamed modules: `dbt_score.model_filter` becomes
`dbt_score.rule_filter`
- **Breaking**: Renamed filter class and decorator: `@model_filter` becomes
`@rule_filter` and `ModelFilter` becomes `RuleFilter`.
- **Breaking**: Config option `model_filter_names` becomes `rule_filter_names`.
- **Breaking**: CLI flag naming fixes: `--fail_any_model_under` becomes
`--fail-any-item-under` and `--fail_project_under` becomes
`--fail-project-under`.

## [0.7.1] - 2024-11-01

- Fix mkdocs.
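
For projects upgrading across these renames, a minimal sketch of rewriting old option names in an already-parsed configuration table. The helper `migrate_options` and the rule name `my_rules.some_rule` are hypothetical and not part of `dbt-score`; the key mapping comes from the changelog entries above and the configuration docs changed in this PR.

```python
# Hypothetical migration helper; not part of dbt-score.
# Old-to-new option names, as renamed in this PR.
RENAMED_OPTIONS = {
    "fail_any_model_under": "fail_any_item_under",
    "model_filter_names": "rule_filter_names",
}


def migrate_options(options: dict) -> dict:
    """Return a copy of `options` with renamed keys, recursing into sub-tables."""
    migrated = {}
    for key, value in options.items():
        new_key = RENAMED_OPTIONS.get(key, key)
        migrated[new_key] = (
            migrate_options(value) if isinstance(value, dict) else value
        )
    return migrated


# Example input shaped like a parsed [tool.dbt-score] table (assumed layout).
old = {
    "fail_any_model_under": 8.0,
    "rules": {"my_rules.some_rule": {"model_filter_names": ["only_schema_x"]}},
}
new = migrate_options(old)
```

The function returns a new dictionary and leaves the input untouched, so the old and new tables can be compared before anything is written back.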
2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@

## What is `dbt-score`?

`dbt-score` is a linter for dbt model metadata.
`dbt-score` is a linter for dbt metadata.

[dbt][dbt] (Data Build Tool) is a great framework for creating, building,
organizing, testing and documenting _data models_, i.e. data sets living in a
10 changes: 5 additions & 5 deletions docs/configuration.md
@@ -18,7 +18,7 @@ rule_namespaces = ["dbt_score.rules", "dbt_score_rules", "custom_rules"]
disabled_rules = ["dbt_score.rules.generic.columns_have_description"]
inject_cwd_in_python_path = true
fail_project_under = 7.5
fail_any_model_under = 8.0
fail_any_item_under = 8.0

[tool.dbt-score.badges]
first.threshold = 10.0
@@ -51,8 +51,8 @@ The following options can be set in the `pyproject.toml` file:
- `disabled_rules`: A list of rules to disable.
- `fail_project_under` (default: `5.0`): If the project score is below this
value the command will fail with return code 1.
- `fail_any_model_under` (default: `5.0`): If any model scores below this value
the command will fail with return code 1.
- `fail_any_item_under` (default: `5.0`): If any model or source scores below
this value the command will fail with return code 1.

#### Badges configuration

@@ -70,7 +70,7 @@ All badges except `wip` can be configured with the following option:

- `threshold`: The threshold for the badge. A decimal number between `0.0` and
`10.0` that will be used to compare to the score. The threshold is the minimum
score required for a model to be rewarded with a certain badge.
score required for a model or source to be rewarded with a certain badge.

The default values can be found in the
[BadgeConfig](reference/config.md#dbt_score.config.BadgeConfig).
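
A minimal sketch of how a threshold acts as the minimum score for a badge. Only `first` (threshold `10.0`) and `wip` appear in the configuration shown here; the `second` and `third` names and their values are assumptions for illustration, and `badge_for` is a hypothetical helper rather than the library's implementation.

```python
# Illustrative thresholds. "first" = 10.0 comes from the example config;
# "second" and "third" and their values are assumptions.
BADGE_THRESHOLDS = [("first", 10.0), ("second", 8.0), ("third", 6.0)]


def badge_for(score: float) -> str:
    """Return the highest badge whose threshold the score reaches."""
    ordered = sorted(BADGE_THRESHOLDS, key=lambda badge: badge[1], reverse=True)
    for name, threshold in ordered:
        if score >= threshold:
            return name
    # No threshold reached: fall back to the badge that has no threshold.
    return "wip"
```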
@@ -86,7 +86,7 @@ Every rule can be configured with the following option:
- `severity`: The severity of the rule. Rules have a default severity and can be
overridden. It's an integer with a minimum value of 1 and a maximum value
of 4.
- `model_filter_names`: Filters used by the rule. Takes a list of names that can
- `rule_filter_names`: Filters used by the rule. Takes a list of names that can
be found in the same namespace as the rules (see
[Package rules](package_rules.md)).

71 changes: 56 additions & 15 deletions docs/create_rules.md
@@ -1,9 +1,9 @@
# Create rules

In order to lint and score models, `dbt-score` uses a set of rules that are
applied to each model. A rule can pass or fail when it is run. Based on the
severity of the rule, models are scored with the weighted average of the rules
results. Note that `dbt-score` comes bundled with a
In order to lint and score models or sources, `dbt-score` uses a set of rules
that are applied to each item. A rule can pass or fail when it is run. Based on
the severity of the rule, items are scored with the weighted average of the
rules results. Note that `dbt-score` comes bundled with a
[set of default rules](rules/generic.md).
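
The scoring formula itself is not part of this diff. As one plausible reading of "weighted average", the sketch below weights each rule result by its severity (an integer from 1 to 4, per the configuration docs) and scales the result to a 0 to 10 range. The function name, the scaling, and the empty-input behaviour are all assumptions.

```python
def weighted_score(results: list[tuple[int, bool]]) -> float:
    """Score an item from (severity, passed) pairs on an assumed 0-10 scale.

    Assumption: each rule contributes its severity as weight, and a passing
    rule earns its full weight. This is a sketch, not dbt-score's scorer.
    """
    total_weight = sum(severity for severity, _ in results)
    if total_weight == 0:
        # Assumption: an item with no applicable rules gets the top score.
        return 10.0
    earned = sum(severity for severity, passed in results if passed)
    return 10.0 * earned / total_weight
```

With one passing rule of severity 1 and one failing rule of severity 3, this reading gives a score of 2.5: the failing rule dominates because it carries three quarters of the total weight.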

On top of the generic rules, it's possible to add your own rules. Two ways exist
@@ -21,7 +21,7 @@ The `@rule` decorator can be used to easily create a new rule:
from dbt_score import Model, rule, RuleViolation

@rule
def has_description(model: Model) -> RuleViolation | None:
def model_has_description(model: Model) -> RuleViolation | None:
"""A model should have a description."""
if not model.description:
return RuleViolation(message="Model lacks a description.")
@@ -31,6 +31,21 @@ The name of the function is the name of the rule and the docstring of the
function is its description. Therefore, it is important to use a
self-explanatory name for the function and document it well.

The type annotation for the rule's argument dictates whether the rule should be
applied to dbt models or sources.

Here is the same example rule, applied to sources:

```python
from dbt_score import rule, RuleViolation, Source

@rule
def source_has_description(source: Source) -> RuleViolation | None:
    """A source should have a description."""
    if not source.description:
        return RuleViolation(message="Source lacks a description.")
```
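
A minimal sketch of how a resource type could be read from a function's type annotation, in the spirit of the commit "infer resource_type from Rule.evaluate type annotation". The `Model` and `Source` classes below are empty stand-ins, and `infer_resource_type` is a hypothetical helper; the actual implementation in this PR is not shown in this excerpt.

```python
import inspect
import typing


class Model:
    """Stand-in for dbt_score.Model."""


class Source:
    """Stand-in for dbt_score.Source."""


def infer_resource_type(func: typing.Callable) -> type:
    """Return Model or Source from the first non-self parameter's annotation."""
    hints = typing.get_type_hints(func)
    params = [name for name in inspect.signature(func).parameters if name != "self"]
    if not params:
        raise TypeError("Function takes no argument to infer a resource type from.")
    annotation = hints.get(params[0])
    if annotation not in (Model, Source):
        raise TypeError(f"First argument must be annotated as Model or Source, got {annotation!r}.")
    return annotation


def model_has_description(model: Model) -> None:
    """Example rule function targeting models."""


def source_has_description(source: Source) -> None:
    """Example rule function targeting sources."""
```

Raising on a missing or unexpected annotation is a design choice of this sketch: it surfaces a misdeclared rule at definition time instead of silently skipping it at evaluation time.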

The severity of a rule can be set using the `severity` argument:

```python
@@ -45,15 +60,23 @@ For more advanced use cases, a rule can be created by inheriting from the `Rule`
class:

```python
from dbt_score import Model, Rule, RuleViolation
from dbt_score import Model, Rule, RuleViolation, Source

class HasDescription(Rule):
class ModelHasDescription(Rule):
description = "A model should have a description."

def evaluate(self, model: Model) -> RuleViolation | None:
"""Evaluate the rule."""
if not model.description:
return RuleViolation(message="Model lacks a description.")

class SourceHasDescription(Rule):
description = "A source should have a description."

def evaluate(self, source: Source) -> RuleViolation | None:
"""Evaluate the rule."""
if not source.description:
return RuleViolation(message="Source lacks a description.")
```

### Rules location
@@ -91,30 +114,48 @@ def sql_has_reasonable_number_of_lines(model: Model, max_lines: int = 200) -> Ru
)
```

### Filtering models
### Filtering rules

Custom and standard rules can be configured to have model filters. Filters allow
models to be ignored by one or multiple rules.
Custom and standard rules can be configured to have filters. Filters allow
models or sources to be ignored by one or multiple rules if the item doesn't
satisfy the filter criteria.

Filters are created using the same discovery mechanism and interface as custom
rules, except they do not accept parameters. Similar to Python's built-in
`filter` function, when the filter evaluation returns `True` the model should be
`filter` function, when the filter evaluation returns `True` the item should be
evaluated, otherwise it should be ignored.

```python
from dbt_score import ModelFilter, model_filter
from dbt_score import Model, RuleFilter, rule_filter

@model_filter
@rule_filter
def only_schema_x(model: Model) -> bool:
"""Only applies a rule to schema X."""
return model.schema.lower() == 'x'

class SkipSchemaY(ModelFilter):
class SkipSchemaY(RuleFilter):
description = "Applies a rule to every schema but Y."
def evaluate(self, model: Model) -> bool:
return model.schema.lower() != 'y'
```

Filters also rely on type-annotations to dictate whether they apply to models or
sources:

```python
from dbt_score import RuleFilter, rule_filter, Source

@rule_filter
def only_from_source_a(source: Source) -> bool:
    """Only applies a rule to source tables from source A."""
    return source.source_name.lower() == 'a'

class SkipSourceDatabaseB(RuleFilter):
    description = "Applies a rule to every source except Database B."
    def evaluate(self, source: Source) -> bool:
        return source.database.lower() != 'b'
```
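
A minimal sketch of the filter semantics described above: an item is evaluated only if its resource type matches the rule and every attached filter returns `True`, and a filter whose resource type differs from the rule's is rejected up front. `SketchRule` and the stand-in `Model` and `Source` dataclasses are hypothetical, not the classes shipped by `dbt-score`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Model:
    """Stand-in for dbt_score.Model."""
    name: str
    schema: str


@dataclass
class Source:
    """Stand-in for dbt_score.Source."""
    name: str
    source_name: str


@dataclass
class SketchRule:
    """Hypothetical rule holding (resource_type, predicate) filter pairs."""
    resource_type: type
    filters: list[tuple[type, Callable[[object], bool]]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject filters written for a different resource type than the rule.
        for filter_type, _ in self.filters:
            if filter_type is not self.resource_type:
                raise TypeError("Filter resource type does not match the rule.")

    def should_evaluate(self, item: object) -> bool:
        """True if the item matches the rule's type and passes every filter."""
        if not isinstance(item, self.resource_type):
            return False
        return all(predicate(item) for _, predicate in self.filters)


only_schema_x = (Model, lambda model: model.schema.lower() == "x")
rule_for_x = SketchRule(resource_type=Model, filters=[only_schema_x])
```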

Similar to setting a rule severity, standard rules can have filters set in the
[configuration file](configuration.md/#tooldbt-scorerulesrule_namespacerule_name),
while custom rules accept the configuration file or a decorator parameter.
@@ -123,7 +164,7 @@ while custom rules accept the configuration file or a decorator parameter.
from dbt_score import Model, rule, RuleViolation
from my_project import only_schema_x

@rule(model_filters={only_schema_x()})
@rule(rule_filters={only_schema_x()})
def models_in_x_follow_naming_standard(model: Model) -> RuleViolation | None:
"""Models in schema X must follow the naming standard."""
if some_regex_fails(model.name):
4 changes: 2 additions & 2 deletions docs/get_started.md
@@ -40,8 +40,8 @@ It's also possible to automatically run `dbt parse`, to generate the
dbt-score lint --run-dbt-parse
```

To lint only a selection of models, the argument `--select` can be used. It
accepts any
To lint only a selection of models or sources, the argument `--select` can be
used. It accepts any
[dbt node selection syntax](https://docs.getdbt.com/reference/node-selection/syntax):

```shell
21 changes: 11 additions & 10 deletions docs/index.md
@@ -2,8 +2,9 @@

`dbt-score` is a linter for [dbt](https://www.getdbt.com/) metadata.

dbt allows data practitioners to organize their data in to _models_. Those
models have metadata associated with them: documentation, tests, types, etc.
dbt allows data practitioners to organize their data in to _models_ and
_sources_. Those models and sources have metadata associated with them:
documentation, tests, types, etc.

`dbt-score` allows to lint and score this metadata, in order to enforce (or
encourage) good practices.
@@ -12,7 +13,7 @@ encourage) good practices.

```
> dbt-score lint
🥇 customers (score: 10.0)
🥇 M: customers (score: 10.0)
OK dbt_score.rules.generic.has_description
OK dbt_score.rules.generic.has_owner
OK dbt_score.rules.generic.sql_has_reasonable_number_of_lines
@@ -25,17 +26,17 @@ score.

## Philosophy

dbt models are often used as metadata containers: either in YAML files or
through the use of `{{ config() }}` blocks, they are associated with a lot of
dbt models/sources are often used as metadata containers: either in YAML files
or through the use of `{{ config() }}` blocks, they are associated with a lot of
information. At scale, it becomes tedious to enforce good practices in large
data teams dealing with many models.
data teams dealing with many models/sources.

To that end, `dbt-score` has 2 main features:

- It runs rules on models, and displays rule violations. Those can be used in
interactive environments or in CI.
- Using those run results, it scores models, as to give them a measure of their
maturity. This score can help gamify model metadata improvements, and be
- It runs rules on dbt models and sources, and displays any rule violations.
These can be used in interactive environments or in CI.
- Using those run results, it scores items, to ascribe them a measure of their
maturity. This score can help gamify metadata improvements/coverage, and be
reflected in data catalogs.

`dbt-score` aims to:
6 changes: 3 additions & 3 deletions docs/programmatic_invocations.md
@@ -61,9 +61,9 @@ When `dbt-score` terminates, it exists with one of the following exit codes:
project being linted either doesn't raise any warning, or the warnings are
small enough to be above the thresholds. This generally means "successful
linting".
- `1` in case of linting errors. This is the unhappy case: some models in the
project raise enough warnings to have a score below the defined thresholds.
This generally means "linting doesn't pass".
- `1` in case of linting errors. This is the unhappy case: some models or
sources in the project raise enough warnings to have a score below the defined
thresholds. This generally means "linting doesn't pass".
- `2` in case of an unexpected error. This happens for example if something is
misconfigured (for example a faulty dbt project), or the wrong parameters are
given to the CLI. This generally means "setup needs to be fixed".
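
A minimal sketch of how the linting exit code follows from the two thresholds, mirroring the check in `src/dbt_score/cli.py` in this PR. The function `lint_exit_code` is a hypothetical helper; the `5.0` defaults are the documented defaults, and exit code `2` (unexpected errors) is deliberately not modelled.

```python
def lint_exit_code(
    item_scores: dict[str, float],
    project_score: float,
    fail_any_item_under: float = 5.0,
    fail_project_under: float = 5.0,
) -> int:
    """Return 1 if any model/source or the project scores under its threshold."""
    if any(score < fail_any_item_under for score in item_scores.values()):
        return 1
    if project_score < fail_project_under:
        return 1
    return 0
```

The comparison is strict, so a score exactly equal to a threshold still passes.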
7 changes: 6 additions & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "pdm.backend"
name = "dbt-score"
dynamic = ["version"]

description = "Linter for dbt model metadata."
description = "Linter for dbt metadata."
authors = [
{name = "Picnic Analyst Development Platform", email = "[email protected]"}
]
@@ -101,6 +101,7 @@ max-args = 9
[tool.ruff.lint.per-file-ignores]
"tests/**/*.py" = [
"PLR2004", # Magic value comparisons
"PLR0913", # Too many args in func def
]

### Coverage ###
@@ -114,3 +115,7 @@ source = [
[tool.coverage.report]
show_missing = true
fail_under = 80
exclude_also = [
"@overload"
]

9 changes: 5 additions & 4 deletions src/dbt_score/__init__.py
@@ -1,15 +1,16 @@
"""Init dbt_score package."""

from dbt_score.model_filter import ModelFilter, model_filter
from dbt_score.models import Model
from dbt_score.models import Model, Source
from dbt_score.rule import Rule, RuleViolation, Severity, rule
from dbt_score.rule_filter import RuleFilter, rule_filter

__all__ = [
"Model",
"ModelFilter",
"Source",
"RuleFilter",
"Rule",
"RuleViolation",
"Severity",
"model_filter",
"rule_filter",
"rule",
]
16 changes: 8 additions & 8 deletions src/dbt_score/cli.py
@@ -81,15 +81,15 @@ def cli() -> None:
default=False,
)
@click.option(
"--fail_project_under",
"--fail-project-under",
help="Fail if the project score is under this value.",
type=float,
is_flag=False,
default=None,
)
@click.option(
"--fail_any_model_under",
help="Fail if any model is under this value.",
"--fail-any-item-under",
help="Fail if any evaluable item is under this value.",
type=float,
is_flag=False,
default=None,
@@ -104,9 +104,9 @@ def lint(
manifest: Path,
run_dbt_parse: bool,
fail_project_under: float,
fail_any_model_under: float,
fail_any_item_under: float,
) -> None:
"""Lint dbt models metadata."""
"""Lint dbt metadata."""
manifest_provided = (
click.get_current_context().get_parameter_source("manifest")
!= ParameterSource.DEFAULT
@@ -122,8 +122,8 @@ def lint(
config.overload({"disabled_rules": disabled_rule})
if fail_project_under:
config.overload({"fail_project_under": fail_project_under})
if fail_any_model_under:
config.overload({"fail_any_model_under": fail_any_model_under})
if fail_any_item_under:
config.overload({"fail_any_item_under": fail_any_item_under})

try:
if run_dbt_parse:
@@ -148,7 +148,7 @@ def lint(
ctx.exit(2)

if (
any(x.value < config.fail_any_model_under for x in evaluation.scores.values())
any(x.value < config.fail_any_item_under for x in evaluation.scores.values())
or evaluation.project_score.value < config.fail_project_under
):
ctx.exit(1)
Expand Down
4 changes: 2 additions & 2 deletions src/dbt_score/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ class Config:
"disabled_rules",
"inject_cwd_in_python_path",
"fail_project_under",
"fail_any_model_under",
"fail_any_item_under",
]
_rules_section: Final[str] = "rules"
_badges_section: Final[str] = "badges"
@@ -70,7 +70,7 @@ def __init__(self) -> None:
self.config_file: Path | None = None
self.badge_config: BadgeConfig = BadgeConfig()
self.fail_project_under: float = 5.0
self.fail_any_model_under: float = 5.0
self.fail_any_item_under: float = 5.0

def set_option(self, option: str, value: Any) -> None:
"""Set an option in the config."""
2 changes: 1 addition & 1 deletion src/dbt_score/dbt_utils.py
@@ -69,7 +69,7 @@ def dbt_parse() -> "dbtRunnerResult":
@dbt_required
def dbt_ls(select: Iterable[str] | None) -> Iterable[str]:
"""Run dbt ls."""
cmd = ["ls", "--resource-type", "model", "--output", "name"]
cmd = ["ls", "--resource-types", "model", "source", "--output", "name"]
if select:
cmd += ["--select", *select]
