Add the installation and configuration documentation for the Huawei GaussDB and GaussDB(DWS) adapter dbt-gaussdbdws #6619

Open
wants to merge 4 commits into
base: current
Changes from 3 commits
2 changes: 2 additions & 0 deletions .gitignore
@@ -22,3 +22,5 @@ website/i18n/*

# Local Vercel folder
.vercel

gitpush.sh
150 changes: 150 additions & 0 deletions website/docs/docs/core/connect-data-platform/gaussdbdws-setup.md
@@ -0,0 +1,150 @@
---
title: "GaussDB(DWS) setup"
description: "Read this guide to learn about the GaussDB(DWS) warehouse setup in dbt."
id: "gaussdbdws-setup"
meta:
  maintained_by: dbt Labs
Collaborator:
please update this - this should be changed over to you

  authors: 'core dbt maintainers'
Collaborator:
not true anymore

  github_repo: 'n/a'
Collaborator:
can you link to the github repo?

  pypi_package: 'dbt-gaussdbdws'
  min_core_version: 'v0.4.0'
Collaborator:
I'm pretty sure this is incorrect. What is the earliest version you support?

  cloud_support: Not supported
  min_supported_version: 'n/a'
Collaborator:
Are there versions of GaussDB that are not supported?

  slack_channel_name: 'n/a'
  slack_channel_link: 'n/a'
  platform_name: 'GaussDB(DWS)'
  config_page: '/reference/resource-configs/gaussdbdws-configs'
---

<Snippet path="warehouse-setups-cloud-callout" />

import SetUpPages from '/snippets/_setup-pages-intro.md';

<SetUpPages meta={frontMatter.meta} />


## Profile Configuration

Set up GaussDB(DWS) targets with the following configuration in your `profiles.yml` file.

<File name='~/.dbt/profiles.yml'>

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: gaussdbdws
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name] # or database instead of dbname
      schema: [dbt schema]
      threads: [optional, 1 or more]
      [keepalives_idle](#keepalives_idle): 0 # default 0, indicating the system default. See below
Collaborator:
Suggested change
[keepalives_idle](#keepalives_idle): 0 # default 0, indicating the system default. See below
keepalives_idle: 0 # default 0, indicating the system default. See below

      connect_timeout: 10 # default 10 seconds
      [retries](#retries): 1 # default 1 retry on error/timeout when opening connections
Collaborator:
Suggested change
[retries](#retries): 1 # default 1 retry on error/timeout when opening connections
retries: 1 # default 1 retry on error/timeout when opening connections

Collaborator:
Could you please take a look at your profiles.yml sample file? I think your fields are coming out a bit funny here. Thank you!

Author:
like this:
jaffle_shop:
  target: dev_dws
  outputs:
    dev_dws:
      type: gaussdbdws
      host: 121.xxx.xxx.199
      user: xxx
      password: xxx@123
      port: 8000
      dbname: gaussdb
      schema: xxx
      threads: 1

      [search_path](#search_path): [optional, override the default gaussdbdws search_path]
      [role](#role): [optional, set the role dbt assumes when executing queries]
      [sslmode](#sslmode): [optional, set the sslmode used to connect to the database]
      [sslcert](#sslcert): [optional, set the sslcert to control the certificate file location]
      [sslkey](#sslkey): [optional, set the sslkey to control the location of the private key]
      [sslrootcert](#sslrootcert): [optional, set the sslrootcert config value to a new file path in order to customize the file location that contains root certificates]

```

</File>

### Configurations

#### search_path

The `search_path` config controls the GaussDB(DWS) "search path" that dbt configures when opening new connections to the database. By default, the GaussDB(DWS) search path is `"$user, public"`, meaning that unqualified <Term id="table" /> names are looked up first in a schema with the same name as the logged-in user, and then in the `public` schema. **Note:** Setting the `search_path` to a custom value is not necessary or recommended for typical usage of dbt.

#### role

The `role` config controls the GaussDB(DWS) role that dbt assumes when opening new connections to the database.

#### sslmode

The `sslmode` config controls how dbt connects to GaussDB(DWS) databases using SSL. See [the GaussDB(DWS) docs](https://support.huaweicloud.com/tg-dws/dws_gsql_011.html) on `sslmode` for usage information. When unset, dbt will connect to databases using the GaussDB(DWS) default, `prefer`, as the `sslmode`.


#### sslcert

The `sslcert` config controls the location of the certificate file used to connect to GaussDB(DWS) when using client SSL connections. To use a certificate file that is not in the default location, set that file path using this value. Without this config set, dbt uses the GaussDB(DWS) default locations. See [Client Certificates](https://support.huaweicloud.com/tg-dws/dws_gsql_011.html) in the GaussDB(DWS) SSL docs for the default paths.

#### sslkey

The `sslkey` config controls the location of the private key for connecting to GaussDB(DWS) using client SSL connections. If this config is omitted, dbt uses the default key location for GaussDB(DWS). See [Client Certificates](https://support.huaweicloud.com/tg-dws/dws_gsql_011.html) in the GaussDB(DWS) SSL docs for the default locations.

#### sslrootcert

When connecting to a GaussDB(DWS) server using a client SSL connection, dbt verifies that the server provides an SSL certificate signed by a trusted root certificate. These root certificates are in the `/home/dbadmin/dws_ssl/sslcert/certca.pem` file by default. To customize the location of this file, set the `sslrootcert` config value to a new file path.

#### keepalives_idle

If the database closes its connection while dbt is waiting for data, you may see the error `SSL SYSCALL error: EOF detected`. Lowering the [`keepalives_idle` value](https://www.postgresql.org/docs/9.3/libpq-connect.html) may prevent this, because a keepalive ping is then sent more frequently to keep the connection active.

[dbt's default setting](https://github.com/dbt-labs/dbt-core/blob/main/plugins/gaussdbdws/dbt/adapters/gaussdbdws/connections.py#L28) is 0 (the server's default value), but it can be set to a shorter interval (perhaps 120 or 60 seconds), at the cost of a chattier network connection.
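
As a rough illustration of how the profile fields above could map onto libpq connection keywords, here is a minimal sketch. The helper name `build_connect_kwargs` is hypothetical and is not taken from the adapter's source. The sketch assumes that a `keepalives_idle` of 0 is simply omitted so that the system default applies, and that the optional SSL fields are passed through only when set. It builds a dictionary and does not open a connection.

```python
from typing import Any, Dict

# Optional profile keys that are also libpq connection keywords.
OPTIONAL_LIBPQ_KEYS = ("sslmode", "sslcert", "sslkey", "sslrootcert")


def build_connect_kwargs(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate one profile output into libpq keywords."""
    kwargs: Dict[str, Any] = {
        "host": profile["host"],
        "user": profile["user"],
        "password": profile["password"],
        "port": profile["port"],
        # The profile accepts either `dbname` or `database`.
        "dbname": profile.get("dbname", profile.get("database")),
        "connect_timeout": profile.get("connect_timeout", 10),
    }
    # Assumption: 0 means "use the system default", so the keyword is left out.
    keepalives_idle = profile.get("keepalives_idle", 0)
    if keepalives_idle:
        kwargs["keepalives_idle"] = keepalives_idle
    # Pass SSL settings through only when the profile sets them.
    for key in OPTIONAL_LIBPQ_KEYS:
        if profile.get(key) is not None:
            kwargs[key] = profile[key]
    return kwargs


base_profile = {
    "host": "example-host",
    "user": "dbt_user",
    "password": "secret",
    "port": 8000,
    "dbname": "gaussdb",
}
default_kwargs = build_connect_kwargs(base_profile)
tuned_kwargs = build_connect_kwargs(
    {**base_profile, "keepalives_idle": 60, "sslmode": "require"}
)
```

With the default profile no `keepalives_idle` keyword is produced; with `keepalives_idle: 60` the keyword is passed along together with `sslmode`.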


#### retries

If `dbt-gaussdbdws` encounters an operational error or timeout when opening a new connection, it will retry up to the number of times configured by `retries`. The default value is 1 retry. If set to 2+ retries, dbt will wait 1 second before retrying. If set to 0, dbt will not retry at all.
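
The retry behaviour described above can be sketched as follows. This is a simplified illustration, not the adapter's implementation: the function name `connect_with_retries` is hypothetical, the built-in `ConnectionError` and `TimeoutError` stand in for the driver's operational errors, and the sketch waits a fixed interval before every retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    retries: int = 1,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect`, retrying up to `retries` times on connection errors."""
    if retries < 0:
        raise ValueError("retries must be 0 or greater")
    # `remaining` counts how many retries are still allowed after this attempt.
    for remaining in range(retries, -1, -1):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if remaining == 0:
                # Out of retries: surface the last error to the caller.
                raise
            sleep(wait_seconds)
```

With `retries=0` the first failure is raised immediately; with `retries=2` the connection is attempted at most three times.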


### `psycopg2-binary` vs. `psycopg2`

By default, `dbt-gaussdbdws` installs `psycopg2-binary`. This is great for development, and even testing, as it does not require any OS dependencies; it's a pre-built wheel. However, building `psycopg2` from source will grant performance improvements that are desired in a production environment. In order to install `psycopg2`, use the following steps:

```bash
if [[ $(pip show psycopg2-binary) ]]; then
    # Remember the installed version so the source build matches it
    PSYCOPG2_VERSION=$(pip show psycopg2-binary | grep Version | cut -d " " -f 2)
    pip uninstall -y psycopg2-binary
    pip install "psycopg2==${PSYCOPG2_VERSION}"
fi
```

This ensures the version of `psycopg2` will match that of `psycopg2-binary`.

**Note:** The native PostgreSQL driver cannot connect to GaussDB(DWS) directly. To use the native PostgreSQL driver, you must first set `password_encryption_type: 1` (a compatibility mode that supports both MD5 and SHA256).
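
To check which of the two distributions is currently installed before or after swapping them, a small sketch like the following can help. The function name `installed_psycopg2` is hypothetical; it relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_psycopg2() -> Optional[Tuple[str, str]]:
    """Return (distribution name, version) for the installed psycopg2 package, if any."""
    for name in ("psycopg2", "psycopg2-binary"):
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed; try the next candidate.
            continue
    return None


print(installed_psycopg2())
```

The function returns `None` when neither distribution has installation metadata in the current environment.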

### `GaussDB psycopg2`

GaussDB uses SHA256 as the default encryption method for user passwords, while the PostgreSQL native driver defaults to MD5. For that reason, the recommended approach is to use the psycopg2 driver shipped with GaussDB. Follow the steps below to prepare the required driver and dependencies and to load the driver.

1. Obtain the driver package from the release bundle. The package is named `GaussDB-Kernel_<database_version>_<OS_version>_64bit_Python.tar.gz` and, once extracted, contains:
   - `psycopg2`: the psycopg2 library files.
   - `lib`: the shared library files that the driver depends on.

2. Follow the steps below to load the driver:
```bash
# Extract the driver package, for example: GaussDB-Kernel_xxx.x.x_Hce_64bit_Python.tar.gz
tar -zxvf GaussDB-Kernel_xxx.x.x_Hce_64bit_Python.tar.gz

# Uninstall psycopg2-binary
pip uninstall -y psycopg2-binary

# Install psycopg2 by copying it to the site-packages directory of the Python installation using the root user
cp psycopg2 $(python3 -c 'import site; print(site.getsitepackages()[0])') -r

# Grant permissions
chmod 755 $(python3 -c 'import site; print(site.getsitepackages()[0])')/psycopg2 -R

# Verify the existence of the psycopg2 directory
ls -ltr $(python3 -c 'import site; print(site.getsitepackages()[0])') | grep psycopg2

# To add the psycopg2 directory to the $PYTHONPATH environment variable and make it effective
export PYTHONPATH=$(python3 -c 'import site; print(site.getsitepackages()[0])'):$PYTHONPATH

# For non-database users, you need to add the extracted lib directory to the LD_LIBRARY_PATH environment variable
export LD_LIBRARY_PATH=/root/lib:$LD_LIBRARY_PATH

# To verify that the configuration is correct and there are no errors
(.venv) [root@ecs-euleros-dev ~]# python3
Python 3.9.9 (main, Jun 19 2024, 02:50:21)
[GCC 10.3.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psycopg2
>>> exit()
```
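
The same check can be scripted. The sketch below is a hypothetical helper (`psycopg2_status` is not part of any package) that reports the site-packages directory used in the steps above, whether a `psycopg2` directory exists there, and whether Python can locate the module. It only locates the module and does not load it, so a missing shared library would still surface only on an actual `import psycopg2`.

```python
import importlib.util
import site
from pathlib import Path


def psycopg2_status() -> dict:
    """Report where psycopg2 would be copied and whether Python can find it."""
    # Same directory the shell steps compute with site.getsitepackages()[0].
    site_packages = Path(site.getsitepackages()[0])
    spec = importlib.util.find_spec("psycopg2")
    return {
        "site_packages": str(site_packages),
        "copied_dir_exists": (site_packages / "psycopg2").is_dir(),
        "importable": spec is not None,
        "location": spec.origin if spec else None,
    }


print(psycopg2_status())
```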
220 changes: 220 additions & 0 deletions website/docs/reference/resource-configs/gaussdbdws-configs.md
@@ -0,0 +1,220 @@
---
title: "GaussDB(DWS) configurations"
description: "GaussDB(DWS) Configurations - Read this in-depth guide to learn about configurations in dbt."
id: "gaussdbdws-configs"
---

## Incremental materialization strategies

In dbt-gaussdbdws, the following incremental materialization strategies are supported:

- `append` (default when `unique_key` is not defined)
- `merge`
- `delete+insert` (default when `unique_key` is defined)
- [`microbatch`](/docs/build/incremental-microbatch)
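
To make the two defaults concrete, here is a minimal sketch of what `append` and `delete+insert` do to a target table. It uses an in-memory SQLite database purely as a stand-in for the warehouse; the table name `orders`, the helper names, and the hard-coded `id` key column are assumptions for illustration and do not reflect the SQL that the adapter generates.

```python
import sqlite3


def default_strategy(unique_key):
    """Mirror the defaults listed above."""
    return "delete+insert" if unique_key else "append"


def load_increment(conn, new_rows, unique_key=None):
    """Apply new (id, amount) rows to `orders` using the default strategy."""
    strategy = default_strategy(unique_key)
    if strategy == "delete+insert":
        # Remove existing rows that share a key with the incoming batch.
        # The key column is hard-coded to `id` to keep the sketch short.
        conn.executemany(
            "delete from orders where id = ?", [(row[0],) for row in new_rows]
        )
    # Both strategies finish by inserting the new batch.
    conn.executemany("insert into orders (id, amount) values (?, ?)", new_rows)
    return strategy


conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, amount integer)")
conn.execute("insert into orders values (1, 10)")

# With a unique key: row 1 is replaced, row 2 is added.
load_increment(conn, [(1, 15), (2, 20)], unique_key="id")
after_delete_insert = conn.execute(
    "select id, amount from orders order by id, amount"
).fetchall()

# Without a unique key: rows are appended, so id 2 now appears twice.
load_increment(conn, [(2, 25)])
after_append = conn.execute(
    "select id, amount from orders order by id, amount"
).fetchall()
```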

## Performance optimizations

### Unlogged

If the `unlogged` config is enabled, the created table will be an unlogged table. Data written to an unlogged table is not recorded in the write-ahead log (WAL), which makes writes significantly faster than for regular tables. However, unlogged tables are automatically truncated in the event of conflicts, operating system reboots, database restarts, primary-secondary failovers, power interruptions, or unexpected shutdowns, so there is a risk of data loss. In addition, the contents of unlogged tables are not replicated to standby servers, and indexes created on unlogged tables are not automatically logged either.

#### Use Case

Unlogged tables cannot guarantee data safety. Users should use them only after ensuring data backups are in place. For example, they can be used to back up data during system upgrades.

#### Failure Handling

In the event of unexpected shutdowns or similar operations leading to data loss in indexes on unlogged tables, users should rebuild the affected indexes.

See the [GaussDB docs](https://support.huaweicloud.com/distributed-devg-v8-gaussdb/gaussdb-12-0567.html) and the [GaussDB(DWS) docs](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0177.html) for details.

<File name='my_table.sql'>

```sql
{{ config(materialized='table', unlogged=True) }}

select ...
```

</File>

<File name='dbt_project.yml'>

```yaml
models:
  +unlogged: true
```

</File>
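
As a rough picture of the effect of this config, the sketch below renders a `create table ... as` statement with or without the `unlogged` keyword. The helper `create_table_as` is hypothetical and the exact statement the adapter emits is an assumption; only the presence of the keyword is the point.

```python
def create_table_as(relation: str, select_sql: str, unlogged: bool = False) -> str:
    """Hypothetical renderer: add the `unlogged` keyword when the config is set."""
    keyword = "unlogged " if unlogged else ""
    return f"create {keyword}table {relation} as (\n{select_sql}\n);"


logged_ddl = create_table_as('"my_schema"."my_table"', "select 1 as id")
unlogged_ddl = create_table_as('"my_schema"."my_table"', "select 1 as id", unlogged=True)
print(unlogged_ddl)
```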

### Indexes

Indexes can improve query performance, but improper use can degrade database performance. Create an index only when at least one of the following applies:

- The field is queried frequently.
- The field appears in a join condition. For queries that join on multiple columns, create a composite index on those columns. For example, for the query `SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b`, you can create a composite index on columns `a` and `b` of table `t1`.
- The field is used as a filtering condition in the `WHERE` clause (especially a range condition).
- The field often appears after `ORDER BY`, `GROUP BY`, or `DISTINCT`.

For point query scenarios, a `B-tree` index is recommended.

The syntax for creating indexes on partitioned tables differs from that for regular tables. Note that partitioned tables do not support parallel index creation, partial indexes, or the `NULLS FIRST` feature.

Table models, incremental models, seeds, snapshots, and materialized views may have a list of `indexes` defined. Each GaussDB(DWS) index can have three components:
- `columns` (list, required): one or more columns on which the index is defined
- `unique` (boolean, optional): whether the index should be [declared unique](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0165.html)
- `type` (string, optional): a supported [index type](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0165.html) (B-tree, Hash, GIN, etc)

<File name='my_table.sql'>

```sql
{{ config(
    materialized = 'table',
    indexes=[
      {'columns': ['column_a'], 'type': 'hash'},
      {'columns': ['column_a', 'column_b'], 'unique': True},
    ]
)}}

select ...
```

</File>

If one or more indexes are configured on a resource, dbt will run `create index` <Term id="ddl" /> statement(s) as part of that resource's <Term id="materialization" />, within the same transaction as its main `create` statement. For the index's name, dbt uses a hash of its properties and the current timestamp, in order to guarantee uniqueness and avoid namespace conflict with other indexes.

```sql
create index if not exists
"7f8e3c2b0a4e9176d82b5c913f4a621c"
on "my_target_database"."my_target_schema"."indexed_model"
using hash
(column_a);

create unique index if not exists
"bf1348a72e56dc9f08c43a15d0a1e759"
on "my_target_database"."my_target_schema"."indexed_model"
(column_a, column_b);
```
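
The naming scheme described above (a hash of the index properties plus the current timestamp) can be sketched as follows. The function `index_name` is hypothetical, and the use of MD5 is an assumption based on the 32-character hexadecimal names in the example; the adapter's actual inputs and hash function may differ.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional, Sequence


def index_name(
    relation: str,
    columns: Sequence[str],
    unique: bool = False,
    index_type: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Derive an index name from the index properties and a timestamp."""
    now = now or datetime.now(timezone.utc)
    parts = [*columns, relation, str(unique), str(index_type), now.isoformat()]
    # Assumption: MD5 is used only to get a short, fixed-length identifier.
    return hashlib.md5("_".join(parts).encode("utf-8")).hexdigest()


name = index_name("my_target_schema.indexed_model", ["column_a"], index_type="hash")
print(name)
```

Because the timestamp is part of the input, two runs produce different names for the same index definition, which is what avoids namespace conflicts.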

You can also configure indexes for a number of resources at once:

<File name='dbt_project.yml'>

```yaml
models:
  project_name:
    subdirectory:
      +indexes:
        - columns: ['column_a']
          type: hash
```

</File>

## Materialized views

The GaussDB(DWS) adapter supports materialized views.

**Notes**:

- The base tables for materialized views can be row-store tables, column-store tables, hstore tables, partitioned tables (or specific partitions), external tables, or other materialized views. Cold-hot tables are supported in version 910.200 and above. Temporary tables (including global temporary tables, volatile temporary tables, and regular temporary tables) and automatic partition tables with specified partitions are not supported.
- Materialized views prohibit `INSERT`, `UPDATE`, `MERGE INTO`, and `DELETE` operations for data modification.
- Materialized views execute once and store the results, ensuring consistent query results. After `BUILD IMMEDIATE` or `REFRESH`, materialized views provide accurate results.
- Materialized views cannot specify a Node Group via syntax. Base tables of materialized views can specify a Node Group during creation, and materialized views will inherit the Node Group information from the base table. The Node Groups for multiple base tables must be the same.
- Creating a materialized view requires `CREATE` permissions on the schema and `SELECT` permissions on the base table or columns.
- Querying a materialized view requires `SELECT` permissions on the materialized view.
- Refreshing a materialized view requires `INSERT` permissions on the materialized view and `SELECT` permissions on the base table or columns.
- Materialized views support fine-grained permissions like `ANALYZE`, `VACUUM`, `ALTER`, and `DROP`.
- Materialized views support permission delegation through `WITH GRANT OPTION`.
- Materialized views do not support advanced security controls. If the base table has row-level security (RLS), data masking policies, or its owner is a private user with restricted `SELECT` permissions, creating a materialized view is prohibited. If a materialized view already exists and the base table adds RLS, masking policies, or changes its owner to a private user, the materialized view can still execute queries but cannot be refreshed.


Materialized views are configured with the following parameters:

| Parameter | Type | Required | Default | Change Monitoring Support |
|----------------------------------------------------------------------------------|--------------------|----------|---------|---------------------------|
| [`on_configuration_change`](/reference/resource-configs/on_configuration_change) | `<string>` | no | `apply` | n/a |
| [`indexes`](#indexes) | `[{<dictionary>}]` | no | `none` | alter |

<Tabs
groupId="config-languages"
defaultValue="project-yaml"
values={[
{ label: 'Project file', value: 'project-yaml', },
{ label: 'Property file', value: 'property-yaml', },
{ label: 'Config block', value: 'config', },
]
}>


<TabItem value="project-yaml">

<File name='dbt_project.yml'>

```yaml
models:
  [<resource-path>](/reference/resource-configs/resource-path):
    [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): materialized_view
    [+](/reference/resource-configs/plus-prefix)[on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
    [+](/reference/resource-configs/plus-prefix)[indexes](#indexes):
      - columns: [<column-name>]
        unique: true | false
        type: hash | btree
```

</File>

</TabItem>


<TabItem value="property-yaml">

<File name='models/properties.yml'>

```yaml
version: 2

models:
  - name: [<model-name>]
    config:
      [materialized](/reference/resource-configs/materialized): materialized_view
      [on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
      [indexes](#indexes):
        - columns: [<column-name>]
          unique: true | false
          type: hash | btree
```

</File>

</TabItem>


<TabItem value="config">

<File name='models/<model_name>.sql'>

```jinja
{{ config(
    [materialized](/reference/resource-configs/materialized)="materialized_view",
    [on_configuration_change](/reference/resource-configs/on_configuration_change)="apply" | "continue" | "fail",
    [indexes](#indexes)=[
        {
            "columns": ["<column-name>"],
            "unique": true | false,
            "type": "hash" | "btree",
        }
    ]
) }}
```

</File>

</TabItem>

</Tabs>

The [`indexes`](#indexes) parameter corresponds to that of a table, as explained above.
It's worth noting that, unlike tables, dbt monitors this parameter for changes and applies the changes without dropping the materialized view.
This happens via a `DROP/CREATE` of the indexes, which can be thought of as an `ALTER` of the materialized view.
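
The change monitoring described above amounts to comparing the indexes that exist on the materialized view with the ones configured in the model. A minimal sketch of that comparison, with a hypothetical `IndexConfig` type and `plan_index_changes` function that are not taken from the adapter, could look like this:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class IndexConfig:
    """Hashable description of one index (hypothetical type for illustration)."""
    columns: Tuple[str, ...]
    unique: bool = False
    type: Optional[str] = None


def plan_index_changes(
    existing: Iterable[IndexConfig], desired: Iterable[IndexConfig]
) -> Tuple[Set[IndexConfig], Set[IndexConfig]]:
    """Return (indexes to drop, indexes to create) to reach the desired config."""
    existing_set, desired_set = set(existing), set(desired)
    return existing_set - desired_set, desired_set - existing_set


existing = [IndexConfig(("column_a",), type="hash")]
desired = [
    IndexConfig(("column_a",), type="hash"),
    IndexConfig(("column_a", "column_b"), unique=True),
]
to_drop, to_create = plan_index_changes(existing, desired)
```

Indexes present on both sides are left alone, so only the differences result in `DROP` or `CREATE` statements and the materialized view itself is not rebuilt.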

Learn more about these parameters in GaussDB(DWS)'s [CREATE MATERIALIZED VIEW](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0357.html) documentation.