Merge pull request #94 from bqbooster/fix-docs
Fix documentation and especially configuration one
Kayrnt authored Jan 1, 2025
2 parents eee30a2 + dd9e3c8 commit 89f14d4
Showing 6 changed files with 134 additions and 122 deletions.
@@ -1,5 +1,5 @@
---
sidebar_position: 5
sidebar_position: 4.1
slug: /audit-logs-vs-information-schema
---

29 changes: 29 additions & 0 deletions docs/configuration/audit-logs.md
@@ -0,0 +1,29 @@
---
sidebar_position: 4.2
slug: /configuration/audit-logs
---

# GCP BigQuery audit logs

In this mode, the package will monitor all the jobs written to a GCP BigQuery audit logs table instead of the `INFORMATION_SCHEMA.JOBS` one.

:::tip

To get the best out of this mode, you should enable the `should_combine_audit_logs_and_information_schema` setting to combine both sources.
More details are available on [the related page](/audit-logs-vs-information-schema).

:::

To enable the "cloud audit logs mode", you'll need to explicitly define the following mandatory settings in the `dbt_project.yml` file:

```yml
vars:
  enable_gcp_bigquery_audit_logs: true
  gcp_bigquery_audit_logs_storage_project: 'my-gcp-project'
  gcp_bigquery_audit_logs_dataset: 'my_dataset'
  gcp_bigquery_audit_logs_table: 'my_table'
  # should_combine_audit_logs_and_information_schema: true # Optional, defaults to false; you might want to combine both sources
```

[You can use environment variables as well](/configuration/package-settings).
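
For reference, a hedged sketch of the environment-variable form — the variable names below are hypothetical (they follow the package's `DBT_BQ_MONITORING_*` naming pattern), so check the audit logs configuration table on the [package settings page](/configuration/package-settings) for the exact names:

```bash
# Hypothetical variable names following the DBT_BQ_MONITORING_* pattern —
# verify the exact names in the package settings tables
export DBT_BQ_MONITORING_GCP_BIGQUERY_AUDIT_LOGS_STORAGE_PROJECT='my-gcp-project'
export DBT_BQ_MONITORING_GCP_BIGQUERY_AUDIT_LOGS_DATASET='my_dataset'
export DBT_BQ_MONITORING_GCP_BIGQUERY_AUDIT_LOGS_TABLE='my_table'
```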

66 changes: 66 additions & 0 deletions docs/configuration/configuration.md
@@ -0,0 +1,66 @@
---
sidebar_position: 4
slug: /configuration
---

# Configuration

Settings have default values that can be overridden using:

- dbt project variables (and therefore also by CLI variable override)
- environment variables
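
For example, a minimal sketch of both override paths, using the documented `bq_region` variable and its `DBT_BQ_MONITORING_REGION` environment counterpart:

```bash
# Override a setting for a single invocation with a CLI variable…
dbt build --vars '{"bq_region": "eu"}'

# …or set it through the corresponding environment variable
export DBT_BQ_MONITORING_REGION='eu'
dbt build
```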

Please note that the default region is `us`. At the time of writing, there is no way to query cross-region tables, but you can run this project in each region you want to monitor and [then replicate the tables to a central region](https://cloud.google.com/bigquery/docs/data-replication) to build an aggregated view.

To find the region associated with a job in the BigQuery UI, open the `Job history` panel (at the bottom), click on a job, and check its `Location` field. You can also find the region of a dataset or table by opening its details panel and checking the `Data location` field.
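
If you prefer SQL over the UI, the `INFORMATION_SCHEMA.SCHEMATA` view exposes each dataset's location — a minimal sketch, reusing the `my-gcp-project` placeholder from this documentation:

```sql
-- List each dataset in the project along with its data location
SELECT schema_name, location
FROM `my-gcp-project`.INFORMATION_SCHEMA.SCHEMATA;
```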

:::tip

To get the best out of this package, you should probably configure all data sources and settings:
- Choose the [Baseline mode](#modes) that fits your GCP setup
- [Add metadata to queries](#add-metadata-to-queries-recommended-but-optional)
- [GCP BigQuery Audit logs](/configuration/audit-logs)
- [GCP Billing export](/configuration/gcp-billing)
- [Settings](/configuration/package-settings) (especially the pricing ones)

:::


## Modes

### Region mode (default)

In this mode, the package will monitor all the GCP projects in the region specified in the `dbt_project.yml` file.

```yml
vars:
  # dbt bigquery monitoring vars
  bq_region: 'us'
```
**Requirements**
- The execution project needs to be the same as the storage project; otherwise you'll need to use the project mode.
- If you have multiple GCP projects in the same region, you should use the "project mode" (with the `input_gcp_projects` setting to specify them), otherwise you will run into errors such as: `Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS"`.

### Project mode

To enable the "project mode", you'll need to explicitly define one mandatory setting in the `dbt_project.yml` file:

```yml
vars:
  # dbt bigquery monitoring vars
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
```

## Add metadata to queries (Recommended but optional)

To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverages "dbt query comments" (the header set at the top of each query).
To configure the query comments, add the following config to `dbt_project.yml`.

```yaml
query-comment:
  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
  job-label: True # Use query comment JSON as job labels
```
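
To check that the labels actually land on your jobs, you can inspect `INFORMATION_SCHEMA.JOBS` — a sketch assuming the default `us` region:

```sql
-- Inspect labels attached to jobs from the last day (us region assumed)
SELECT job_id, l.key, l.value
FROM `region-us`.INFORMATION_SCHEMA.JOBS, UNNEST(labels) AS l
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
```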

19 changes: 19 additions & 0 deletions docs/configuration/gcp-billing.md
@@ -0,0 +1,19 @@
---
sidebar_position: 4.3
slug: /configuration/gcp-billing
---

# GCP Billing export
GCP Billing export is a feature that allows you to export your billing data to BigQuery. It lets the package track the real cost of your queries and storage over time.

To enable it on the GCP side, follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.

Then, to enable GCP billing export monitoring in the package, define the following settings in the `dbt_project.yml` file:

```yml
vars:
  enable_gcp_billing_export: true
  gcp_billing_export_storage_project: 'my-gcp-project'
  gcp_billing_export_dataset: 'my_dataset'
  gcp_billing_export_table: 'my_table'
```
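
Once the export starts producing data, a quick aggregation can confirm the table is populated — a sketch against the placeholder table above, using standard billing export columns:

```sql
-- Sanity check: total cost per service over the last 30 days
SELECT service.description AS service_name, SUM(cost) AS total_cost
FROM `my-gcp-project.my_dataset.my_table`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY 2 DESC;
```
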
118 changes: 13 additions & 105 deletions docs/configuration.md → docs/configuration/package-settings.md
@@ -1,117 +1,21 @@
---
sidebar_position: 4
slug: /configuration
sidebar_position: 4.4
slug: /configuration/package-settings
---

# Configuration

Settings have default values that can be overridden using:

- dbt project variables (and therefore also by CLI variable override)
- environment variables

Please note that the default region is `us`. At the time of writing, there is no way to query cross-region tables, but you can run this project in each region you want to monitor and [then replicate the tables to a central region](https://cloud.google.com/bigquery/docs/data-replication) to build an aggregated view.

To find the region associated with a job in the BigQuery UI, open the `Job history` panel (at the bottom), click on a job, and check its `Location` field. You can also find the region of a dataset or table by opening its details panel and checking the `Data location` field.

## Modes

### Region mode (default)

In this mode, the package will monitor all the GCP projects in the region specified in the `dbt_project.yml` file.

```yml
vars:
  # dbt bigquery monitoring vars
  bq_region: 'us'
```
**Requirements**
- The execution project needs to be the same as the storage project; otherwise you'll need to use the project mode.
- If you have multiple GCP projects in the same region, you should use the "project mode" (with the `input_gcp_projects` setting to specify them), otherwise you will run into errors such as: `Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS"`.

### Project mode

To enable the "project mode", you'll need to explicitly define one mandatory setting in the `dbt_project.yml` file:

```yml
vars:
  # dbt bigquery monitoring vars
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
```

##### GCP Billing export
GCP Billing export is a feature that allows you to export your billing data to BigQuery. It lets the package track the real cost of your queries and storage over time.
To enable it on the GCP side, follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
Then, to enable GCP billing export monitoring in the package, define the following settings in the `dbt_project.yml` file:

```yml
vars:
  enable_gcp_bigquery_audit_logs: true
  gcp_bigquery_audit_logs_storage_project: 'my-gcp-project'
  gcp_bigquery_audit_logs_dataset: 'my_dataset'
  gcp_bigquery_audit_logs_table: 'my_table'
```



### BigQuery audit logs mode

In this mode, the package will monitor all the jobs written to a GCP BigQuery audit logs table instead of the `INFORMATION_SCHEMA.JOBS` one.

To enable the "cloud audit logs mode", you'll need to explicitly define one mandatory setting in the `dbt_project.yml` file:

```yml
vars:
  # dbt bigquery monitoring vars
  bq_region: 'us'
  cloud_audit_logs_table: 'my-gcp-project.my_dataset.my_table'
```

[You can use environment variables as well](#gcp-bigquery-audit-logs-configuration).

### GCP Billing export

GCP Billing export is a feature that allows you to export your billing data to BigQuery. It lets the package track the real cost of your queries and storage over time.

To enable it on the GCP side, follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.

Then, to enable GCP billing export monitoring in the package, define the following settings in the `dbt_project.yml` file:

```yml
vars:
  # dbt bigquery monitoring vars
  enable_gcp_billing_export: true
  gcp_billing_export_storage_project: 'my-gcp-project'
  gcp_billing_export_dataset: 'my_dataset'
  gcp_billing_export_table: 'my_table'
```

## Add metadata to queries (Recommended but optional)

To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverages "dbt query comments" (the header set at the top of each query).
To configure the query comments, add the following config to `dbt_project.yml`.

```yaml
query-comment:
  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
  job-label: True # Use query comment JSON as job labels
```

## Customizing the package configuration
# Customizing the package settings

The following settings can be overridden to customize the package configuration.
To do so, set the following variables in your `dbt_project.yml` file or use environment variables.

### Environment
## Environment

| Variable | Environment Variable | Description | Default |
|----------|-------------------|-------------|---------|
| `input_gcp_projects` | `DBT_BQ_MONITORING_GCP_PROJECTS` | List of GCP projects to monitor | `[]` |
| `bq_region` | `DBT_BQ_MONITORING_REGION` | Region where the monitored projects are located | `us` |

### Pricing
## Pricing

| Variable | Environment Variable | Description | Default |
|----------|-------------------|-------------|---------|
@@ -126,7 +30,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
| `bi_engine_gb_hourly_price` | `DBT_BQ_MONITORING_BI_ENGINE_GB_HOURLY_PRICE` | Hourly price in US dollars per BI engine GB of memory | `0.0416` |
| `free_storage_gb_per_month` | `DBT_BQ_MONITORING_FREE_STORAGE_GB_PER_MONTH` | Free storage GB per month | `10` |

### Package
## Package

These settings are used to configure how dbt will run and materialize the models.

| Variable | Environment Variable | Description | Default |
|----------|-------------------|-------------|---------|
@@ -136,7 +42,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
| `output_partition_expiration_days` | `DBT_BQ_MONITORING_OUTPUT_LIMIT_SIZE` | Default table expiration in days for incremental models | `365` days |
| `use_copy_partitions` | `DBT_BQ_MONITORING_USE_COPY_PARTITIONS` | Whether to use copy partitions or not | `true` |

#### GCP Billing export configuration
### GCP Billing export configuration

See [GCP Billing export](/configuration/gcp-billing) for more information.

| Variable | Environment Variable | Description | Default |
|----------|-------------------|-------------|---------|
@@ -145,9 +53,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
| `gcp_billing_export_dataset` | `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET` | The dataset for GCP billing export data | `'placeholder'` if enabled, `None` otherwise |
| `gcp_billing_export_table` | `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE` | The table for GCP billing export data | `'placeholder'` if enabled, `None` otherwise |

#### GCP BigQuery Audit logs configuration
### GCP BigQuery Audit logs configuration

See [GCP BigQuery Audit logs](#bigquery-audit-logs-mode) for more information.
See [GCP BigQuery Audit logs](/configuration/audit-logs) for more information.

| Variable | Environment Variable | Description | Default |
|----------|-------------------|-------------|---------|
22 changes: 6 additions & 16 deletions docs/contributing.md
@@ -10,8 +10,6 @@ slug: /contributing
You're free to use the environment management tools you prefer, but if you're familiar with them, you can use the following:

- pipx (to isolate the global tools from your local environment)
- tox (to run the tests)
- pre-commit (to run the linter)
- SQLFluff (to lint SQL)
- changie (to generate CHANGELOG entries)

@@ -27,20 +25,12 @@ pipx ensurepath
Then you'll be able to install tox, pre-commit and sqlfluff with pipx:

```bash
pipx install tox
pipx install pre-commit
pipx install sqlfluff
```

To install changie, there are a few options depending on your OS.
See the [installation guide](https://changie.dev/guide/installation/) for more details.

To configure pre-commit hooks:

```bash
pre-commit install
```

To configure your dbt profile, run the following command and follow the prompts:

```bash
dbt init
```
- Fork the repo
- Create a branch from `main`
- Make your changes
- Run `tox` to run the tests
- Run the tests
- Create your changelog entry with `changie new` (don't edit the CHANGELOG.md directly)
- Commit your changes (it will run the linter through pre-commit)
- Push your branch and open a PR on the repository
@@ -71,27 +61,27 @@ We use SQLFluff to keep SQL style consistent. By installing `pre-commit` per the

Lint all models in the /models directory:
```bash
tox -e lint_all
sqlfluff lint
```

Fix all models in the /models directory:
```bash
tox -e fix_all
sqlfluff fix
```

Lint (or substitute `fix` for `lint`) a specific model:
```bash
tox -e lint -- models/path/to/model.sql
sqlfluff lint -- models/path/to/model.sql
```

Lint (or substitute `fix` for `lint`) a specific directory:
```bash
tox -e lint -- models/path/to/directory
sqlfluff lint -- models/path/to/directory
```

#### Rules

Enforced rules are defined within `tox.ini`. To view the full list of available rules and their configuration, see the [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/rules.html).
Enforced rules are defined within `.sqlfluff`. To view the full list of available rules and their configuration, see the [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/rules.html).
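
For orientation, a minimal sketch of the kind of configuration such a file holds — an illustration, not the repository's actual contents:

```ini
[sqlfluff]
dialect = bigquery
templater = dbt

[sqlfluff:templater:dbt]
project_dir = .
```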

## Generation of dbt base google models

