Add the installation and configuration documentation for the Huawei GaussDB and GaussDB(DWS) adapter dbt-gaussdbdws #6619

@@ -22,3 +22,5 @@ website/i18n/*

# Local Vercel folder
.vercel

gitpush.sh

@@ -0,0 +1,150 @@
---
title: "GaussDB(DWS) setup"
description: "Read this guide to learn about the GaussDB(DWS) warehouse setup in dbt."
id: "gaussdbdws-setup"
meta:
  maintained_by: dbt Labs
  authors: 'core dbt maintainers'
> **Review comment:** not true anymore

  github_repo: 'n/a'
> **Review comment:** can you link to the github repo?

  pypi_package: 'dbt-gaussdbdws'
  min_core_version: 'v0.4.0'
> **Review comment:** I'm pretty sure this is incorrect. What is the earliest version you support?

  cloud_support: Not supported
  min_supported_version: 'n/a'
> **Review comment:** Are there versions of GaussDB that are not supported?

  slack_channel_name: 'n/a'
  slack_channel_link: 'n/a'
  platform_name: 'GaussDB(DWS)'
  config_page: '/reference/resource-configs/gaussdbdws-configs'
---

<Snippet path="warehouse-setups-cloud-callout" />

import SetUpPages from '/snippets/_setup-pages-intro.md';

<SetUpPages meta={frontMatter.meta} />

## Profile Configuration

GaussDB(DWS) targets should be set up using the following configuration in your `profiles.yml` file.

<File name='~/.dbt/profiles.yml'>

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: gaussdbdws
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name] # or database instead of dbname
      schema: [dbt schema]
      threads: [optional, 1 or more]
      [keepalives_idle](#keepalives_idle): 0 # default 0, indicating the system default. See below
      connect_timeout: 10 # default 10 seconds
      [retries](#retries): 1 # default 1 retry on error/timeout when opening connections
      [search_path](#search_path): [optional, override the default GaussDB(DWS) search_path]
      [role](#role): [optional, set the role dbt assumes when executing queries]
      [sslmode](#sslmode): [optional, set the sslmode used to connect to the database]
      [sslcert](#sslcert): [optional, set the sslcert to control the certificate file location]
      [sslkey](#sslkey): [optional, set the sslkey to control the location of the private key]
      [sslrootcert](#sslrootcert): [optional, set the sslrootcert config value to a new file path in order to customize the file location that contains root certificates]
```

</File>

> **Review comment:** Could you please take a look at your profiles.yml sample file? I think your fields are coming out a bit funny here. Thank you!

> **Review comment:** like this:
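As a rough illustration of how a target entry like the one above could be normalized before a connection is opened, the following Python sketch maps `database` onto `dbname` (the alias mentioned in the inline comment) and fills in the defaults stated in the comments. It assumes that every field not marked optional is required; `resolve_target` is a hypothetical helper, not part of dbt-gaussdbdws.

```python
from typing import Any, Dict

# Assumption: every field in the sample profile that is not marked "optional" is required.
REQUIRED_FIELDS = ("type", "host", "user", "password", "port", "dbname", "schema")


def resolve_target(target: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: normalize one `outputs` entry from profiles.yml."""
    resolved = dict(target)
    # `database` may be given instead of `dbname`; keep a single key.
    if "dbname" not in resolved and "database" in resolved:
        resolved["dbname"] = resolved.pop("database")
    missing = [name for name in REQUIRED_FIELDS if name not in resolved]
    if missing:
        raise ValueError(f"missing required profile fields: {', '.join(missing)}")
    # Defaults taken from the inline comments in the sample profile.
    resolved.setdefault("connect_timeout", 10)
    resolved.setdefault("keepalives_idle", 0)
    return resolved


dev = resolve_target({
    "type": "gaussdbdws", "host": "localhost", "user": "dbt", "password": "secret",
    "port": 8000, "database": "analytics", "schema": "dbt_dev",
})
print(dev["dbname"], dev["connect_timeout"])  # analytics 10
```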

### Configurations

#### search_path

The `search_path` config controls the GaussDB(DWS) "search path" that dbt configures when opening new connections to the database. By default, the GaussDB(DWS) search path is `"$user", public`, meaning that unqualified <Term id="table" /> names are searched for first in a schema with the same name as the logged-in user and then in the `public` schema. **Note:** Setting the `search_path` to a custom value is not necessary or recommended for typical usage of dbt.
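If you do override it, one way a psycopg2-based adapter can apply the setting is through the libpq `options` connection parameter, which passes `-c name=value` settings to the server when the session starts. The sketch below assumes dbt-gaussdbdws does this the way dbt-postgres does; `search_path_option` is a hypothetical helper. Because libpq splits `options` on unescaped spaces, each space in the value is escaped with a backslash.

```python
def search_path_option(search_path: str) -> str:
    """Hypothetical helper: build a libpq `options` string that sets search_path.

    libpq treats unescaped spaces in `options` as argument separators,
    so every space inside the value is escaped with a backslash.
    """
    escaped = search_path.replace(" ", "\\ ")
    return f"-c search_path={escaped}"


print(search_path_option("dbt_dev, public"))  # -c search_path=dbt_dev,\ public
```

The resulting string would be passed as the `options` keyword argument of `psycopg2.connect`.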

#### role

The `role` config controls the GaussDB(DWS) role that dbt assumes when opening new connections to the database.

#### sslmode

The `sslmode` config controls how dbt connects to GaussDB(DWS) databases using SSL. See [the GaussDB(DWS) docs](https://support.huaweicloud.com/tg-dws/dws_gsql_011.html) on `sslmode` for usage information. When unset, dbt connects to databases using the GaussDB(DWS) default `sslmode`, `prefer`.

#### sslcert

The `sslcert` config controls the location of the certificate file used to connect to GaussDB(DWS) when using client SSL connections. To use a certificate file that is not in the default location, set that file path using this value. Without this config set, dbt uses the GaussDB(DWS) default locations. See [Client Certificates](https://support.huaweicloud.com/tg-dws/dws_gsql_011.html) in the GaussDB(DWS) SSL docs for the default paths.

#### sslkey

The `sslkey` config controls the location of the private key for connecting to GaussDB(DWS) using client SSL connections. If this config is omitted, dbt uses the default key location for GaussDB(DWS). See [Client Certificates](https://support.huaweicloud.com/tg-dws/dws_gsql_011.html) in the GaussDB(DWS) SSL docs for the default locations.

#### sslrootcert

When connecting to a GaussDB(DWS) server using a client SSL connection, dbt verifies that the server provides an SSL certificate signed by a trusted root certificate. These root certificates are in the `/home/dbadmin/dws_ssl/sslcert/certca.pem` file by default. To customize the location of this file, set the `sslrootcert` config value to a new file path.
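The four SSL settings share their names with libpq connection keywords, so a minimal sketch, assuming the adapter forwards them unchanged to the driver, only has to drop the ones that are unset so the driver falls back to its own defaults (for `sslmode`, `prefer`). `ssl_connect_kwargs` is a hypothetical helper and the certificate path is illustrative.

```python
from typing import Dict, Optional

SSL_KEYS = ("sslmode", "sslcert", "sslkey", "sslrootcert")


def ssl_connect_kwargs(profile: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Hypothetical helper: pick the SSL settings to pass to the driver.

    Unset (missing or None) values are left out entirely, so the driver
    applies its own defaults instead of receiving an empty value.
    """
    return {key: profile[key] for key in SSL_KEYS if profile.get(key)}


kwargs = ssl_connect_kwargs({
    "sslmode": "verify-ca",
    "sslrootcert": "/path/to/certca.pem",  # illustrative path
    "sslcert": None,
})
print(kwargs)  # {'sslmode': 'verify-ca', 'sslrootcert': '/path/to/certca.pem'}
```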

#### keepalives_idle

If the database closes its connection while dbt is waiting for data, you may see the error `SSL SYSCALL error: EOF detected`. Setting a lower [`keepalives_idle` value](https://www.postgresql.org/docs/9.3/libpq-connect.html) may prevent this, because TCP keepalive messages are then sent to the server more frequently, keeping the connection active.

[dbt's default setting](https://github.com/dbt-labs/dbt-core/blob/main/plugins/gaussdbdws/dbt/adapters/gaussdbdws/connections.py#L28) is 0 (use the system default), but it can be configured lower (perhaps 120 or 60 seconds), at the cost of a chattier network connection.
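In libpq, a `keepalives_idle` of zero means "use the system default", so a minimal sketch of the translation, assuming the adapter passes the value straight to the driver, only sends the keyword when it is positive. `keepalive_kwargs` is a hypothetical helper.

```python
from typing import Dict


def keepalive_kwargs(keepalives_idle: int = 0) -> Dict[str, int]:
    """Hypothetical helper: translate the profile's keepalives_idle value.

    0 means "use the system default", so nothing is passed to the driver;
    a positive value becomes the libpq `keepalives_idle` keyword (seconds).
    """
    if keepalives_idle < 0:
        raise ValueError("keepalives_idle must be 0 or a positive number of seconds")
    return {"keepalives_idle": keepalives_idle} if keepalives_idle > 0 else {}


print(keepalive_kwargs(0))    # {}
print(keepalive_kwargs(120))  # {'keepalives_idle': 120}
```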

#### retries

If `dbt-gaussdbdws` encounters an operational error or timeout when opening a new connection, it retries up to the number of times configured by `retries`. The default value is 3 retries. If set to 2 or more retries, dbt waits 1 second before retrying. If set to 0, dbt does not retry at all.
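The retry behavior can be sketched as a small loop. This is a simplification under stated assumptions: it waits a fixed interval before every retry rather than following the "2 or more retries" rule exactly, and `connect_with_retries` is a hypothetical helper. In a real adapter the retryable error type would be the driver's (for psycopg2, `psycopg2.OperationalError`); the sketch defaults to `OSError` so it runs without a database.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    retries: int,
    retryable: Tuple[Type[BaseException], ...] = (OSError,),
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `connect`, retrying up to `retries` more times.

    With retries=0 the first error is raised immediately.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except retryable:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            sleep(wait_seconds)  # fixed pause before the next attempt


# Usage with a fake connection that fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("connection refused")
    return "connected"


waits = []
result = connect_with_retries(flaky_connect, retries=3, sleep=waits.append)
print(result, calls["count"], waits)  # connected 3 [1.0, 1.0]
```

Injecting `sleep` keeps the sketch testable without real delays.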

### `psycopg2-binary` vs. `psycopg2`

By default, `dbt-gaussdbdws` installs `psycopg2-binary`. This is great for development, and even testing, as it does not require any OS dependencies; it's a pre-built wheel. However, building `psycopg2` from source will grant performance improvements that are desired in a production environment. To install `psycopg2`, use the following steps:

```bash
if [[ $(pip show psycopg2-binary) ]]; then
    PSYCOPG2_VERSION=$(pip show psycopg2-binary | grep '^Version:' | cut -d " " -f 2)
    pip uninstall -y psycopg2-binary
    pip install "psycopg2==$PSYCOPG2_VERSION"
fi
```

This ensures the version of `psycopg2` will match that of `psycopg2-binary`.

**Note:** The native PostgreSQL driver cannot connect to GaussDB(DWS) directly. To use the native PostgreSQL driver, you must set the GaussDB(DWS) parameter `password_encryption_type` to `1` (compatibility mode, which supports both MD5 and SHA256).
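For a Python-side view of the same check the script performs with `pip show`, the sketch below reports which of the two distributions is installed. It reads pip's distribution metadata, so a `psycopg2` directory copied into site-packages by hand (as in the GaussDB steps that follow) will not show up. `installed_driver_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_driver_versions() -> Dict[str, Optional[str]]:
    """Report which psycopg2 distributions pip has installed (None if absent)."""
    versions: Dict[str, Optional[str]] = {}
    for dist in ("psycopg2", "psycopg2-binary"):
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


print(installed_driver_versions())
```

If both entries report a version, both distributions are installed side by side; the script above exists to leave only one of them.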
### GaussDB `psycopg2`

GaussDB uses SHA256 as the default encryption method for user passwords, while the native PostgreSQL driver defaults to MD5. The recommended approach is therefore to use the `psycopg2` driver provided in the GaussDB release bundle. Follow the steps below to prepare the required driver and dependencies and load the driver.

1. Obtain the required package from the release bundle. The package is named `GaussDB-Kernel_<database_version>_<OS_version>_64bit_Python.tar.gz` and contains:
   - `psycopg2`: the psycopg2 library files.
   - `lib`: the dependent library files.

2. Load the driver as follows:

```bash
# Extract the driver package, for example: GaussDB-Kernel_xxx.x.x_Hce_64bit_Python.tar.gz
tar -zxvf GaussDB-Kernel_xxx.x.x_Hce_64bit_Python.tar.gz

# Uninstall psycopg2-binary so it does not conflict with the GaussDB driver
pip uninstall -y psycopg2-binary

# Locate the site-packages directory of the Python installation
SITE_PACKAGES=$(python3 -c 'import site; print(site.getsitepackages()[0])')

# Install psycopg2 by copying it into site-packages (run as the root user)
cp -r psycopg2 "$SITE_PACKAGES"

# Grant permissions
chmod -R 755 "$SITE_PACKAGES/psycopg2"

# Verify that the psycopg2 directory exists
ls -ltr "$SITE_PACKAGES" | grep psycopg2

# Add the site-packages directory to $PYTHONPATH for the current shell
export PYTHONPATH="$SITE_PACKAGES:$PYTHONPATH"

# For non-database users, add the extracted lib directory to LD_LIBRARY_PATH
# (this example assumes the package was extracted in /root)
export LD_LIBRARY_PATH=/root/lib:$LD_LIBRARY_PATH

# Verify the configuration: the import should succeed without errors
python3 -c 'import psycopg2' && echo "psycopg2 loaded"
```
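The two environment settings from the last steps are easy to get wrong, so here is a hypothetical Python helper that checks them from inside the interpreter. It is a sketch, assuming the `lib` directory was extracted to `/root/lib` as in the example above. `importlib.util.find_spec` only locates the package and does not load its compiled extension, so the final `import psycopg2` step remains the real test that the shared libraries resolve.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def check_gaussdb_driver(lib_dir: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []
    # Is a psycopg2 package visible on sys.path (including PYTHONPATH entries)?
    if importlib.util.find_spec("psycopg2") is None:
        problems.append("psycopg2 is not importable from the current Python path")
    # LD_LIBRARY_PATH entries are colon-separated on Linux; ignore trailing slashes.
    ld_entries = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir.rstrip("/") not in ld_entries:
        problems.append(f"{lib_dir} is not listed in LD_LIBRARY_PATH")
    return problems


for problem in check_gaussdb_driver("/root/lib"):
    print("problem:", problem)
```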

@@ -0,0 +1,220 @@

---
title: "GaussDB(DWS) configurations"
description: "GaussDB(DWS) Configurations - Read this in-depth guide to learn about configurations in dbt."
id: "gaussdbdws-configs"
---

## Incremental materialization strategies

In dbt-gaussdbdws, the following incremental materialization strategies are supported:

- `append` (default when `unique_key` is not defined)
- `merge`
- `delete+insert` (default when `unique_key` is defined)
- [`microbatch`](/docs/build/incremental-microbatch)
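The defaults in the list above amount to a small decision rule. The sketch below restates it in Python; `resolve_incremental_strategy` is a hypothetical helper, and the adapter's real validation and error messages may differ.

```python
from typing import Optional, Sequence, Union

SUPPORTED_STRATEGIES = {"append", "merge", "delete+insert", "microbatch"}


def resolve_incremental_strategy(
    strategy: Optional[str], unique_key: Union[str, Sequence[str], None]
) -> str:
    """Hypothetical helper mirroring the documented defaults."""
    if strategy is None:
        # No explicit strategy: the default depends on whether unique_key is set.
        return "delete+insert" if unique_key else "append"
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError(f"unsupported incremental_strategy: {strategy!r}")
    return strategy


print(resolve_incremental_strategy(None, None))               # append
print(resolve_incremental_strategy(None, "id"))               # delete+insert
print(resolve_incremental_strategy("merge", ["id", "day"]))   # merge
```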

## Performance optimizations

### Unlogged

If `unlogged` is enabled, the table is created as an unlogged table. Data written to an unlogged table is not recorded in the write-ahead log (WAL), which makes writes significantly faster than for regular tables. However, unlogged tables are automatically truncated after conflicts, operating system reboots, database restarts, primary-secondary failovers, power interruptions, or unexpected shutdowns, so there is a risk of data loss. Additionally, the contents of unlogged tables are not replicated to standby servers, and indexes created on unlogged tables are not logged either.

#### Use Case

Unlogged tables cannot guarantee data safety. Use them only after ensuring that data backups are in place; for example, they can be used to back up data during system upgrades.

#### Failure Handling

If unexpected shutdowns or similar events cause data loss in indexes on unlogged tables, rebuild the affected indexes.

See the [GaussDB docs](https://support.huaweicloud.com/distributed-devg-v8-gaussdb/gaussdb-12-0567.html) and the [GaussDB(DWS) docs](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0177.html) for details.

<File name='my_table.sql'>

```sql
{{ config(materialized='table', unlogged=True) }}

select ...
```

</File>

<File name='dbt_project.yml'>

```yaml
models:
  +unlogged: true
```

</File>
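To show what the flag changes, here is a minimal Python sketch of the statement shape for a table materialization, assuming the adapter renders `unlogged` the way dbt-postgres does, by inserting the keyword into `create ... table ... as`. `create_table_as_sql` is hypothetical, and the DDL dbt actually generates involves more steps (for example, intermediate relations and renames).

```python
def create_table_as_sql(relation: str, select_sql: str, unlogged: bool = False) -> str:
    """Hypothetical helper: build CREATE TABLE AS DDL, optionally unlogged."""
    keyword = "create unlogged table" if unlogged else "create table"
    return f"{keyword} {relation} as (\n{select_sql}\n);"


print(create_table_as_sql('"analytics"."dbt_dev"."my_table"', "select 1 as id", unlogged=True))
```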

### Indexes

Indexes can improve database query performance, but improper use may degrade it. Create indexes only when one of the following applies:

- The columns are frequently queried.
- The columns are used in join conditions. For queries that join on multiple columns, create a composite index on those columns. For example, for the query `SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b`, you can create a composite index on columns `a` and `b` of table `t1`.
- The columns are used as filter conditions in the `WHERE` clause (especially range conditions).
- The columns often appear after `ORDER BY`, `GROUP BY`, or `DISTINCT`.
- For point query scenarios, a `B-tree` index is recommended.

The syntax for creating indexes on partitioned tables differs from that for regular tables. Note that partitioned tables do not support parallel index creation, partial indexes, or the `NULLS FIRST` feature.

Table models, incremental models, seeds, snapshots, and materialized views may have a list of `indexes` defined. Each GaussDB(DWS) index can have three components:
- `columns` (list, required): one or more columns on which the index is defined
- `unique` (boolean, optional): whether the index should be [declared unique](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0165.html)
- `type` (string, optional): a supported [index type](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0165.html) (B-tree, Hash, GIN, etc.)
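Putting the three components together, here is a Python sketch that renders one `indexes` entry as DDL, applied to the composite join index from the guidelines. The index name is passed in explicitly for readability (dbt generates its own names), and `create_index_sql` is a hypothetical helper; it does not quote column names, so it assumes simple lowercase identifiers.

```python
from typing import Any, Dict


def create_index_sql(name: str, relation: str, index: Dict[str, Any]) -> str:
    """Hypothetical helper: render one `indexes` entry as CREATE INDEX DDL."""
    columns = index["columns"]
    if not columns:
        raise ValueError("an index needs at least one column")
    unique = "unique " if index.get("unique", False) else ""
    using = f"using {index['type']} " if index.get("type") else ""
    return f'create {unique}index if not exists "{name}" on {relation} {using}({", ".join(columns)});'


# The composite join index from the guidelines: columns a and b of table t1.
print(create_index_sql("t1_a_b_idx", "t1", {"columns": ["a", "b"]}))
# create index if not exists "t1_a_b_idx" on t1 (a, b);
```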

<File name='my_table.sql'>

```sql
{{ config(
    materialized = 'table',
    indexes=[
      {'columns': ['column_a'], 'type': 'hash'},
      {'columns': ['column_a', 'column_b'], 'unique': True},
    ]
)}}

select ...
```

</File>

If one or more indexes are configured on a resource, dbt runs `create index` <Term id="ddl" /> statement(s) as part of that resource's <Term id="materialization" />, within the same transaction as its main `create` statement. For the index name, dbt uses a hash of the index's properties and the current timestamp, to guarantee uniqueness and avoid namespace conflicts with other indexes.

```sql
create index if not exists
"7f8e3c2b0a4e9176d82b5c913f4a621c"
on "my_target_database"."my_target_schema"."indexed_model"
using hash
(column_a);

create unique index if not exists
"bf1348a72e56dc9f08c43a15d0a1e759"
on "my_target_database"."my_target_schema"."indexed_model"
(column_a, column_b);
```
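The naming scheme can be sketched in Python as follows. It assumes an MD5 digest (the example names above are 32 hexadecimal characters, matching MD5's output length) over a JSON serialization of the properties plus a timestamp; the exact input dbt-gaussdbdws hashes may differ, and `index_name` is a hypothetical helper.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def index_name(index: Dict[str, Any], now: Optional[datetime] = None) -> str:
    """Hypothetical helper: hash the index properties plus a timestamp."""
    now = now or datetime.now(timezone.utc)
    payload = {
        "columns": list(index["columns"]),
        "unique": bool(index.get("unique", False)),
        "type": index.get("type"),
        "timestamp": now.isoformat(),
    }
    # sort_keys makes the digest independent of dictionary ordering.
    raw = json.dumps(payload, sort_keys=True)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(index_name({"columns": ["column_a"], "type": "hash"}, now=fixed))
```

Including the timestamp means re-running the same configuration later yields a new name, which is what avoids collisions with indexes left over from earlier runs.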
You can also configure indexes for a number of resources at once:

<File name='dbt_project.yml'>

```yaml
models:
  project_name:
    subdirectory:
      +indexes:
        - columns: ['column_a']
          type: hash
```

</File>

## Materialized views

The GaussDB(DWS) adapter supports materialized views.

**Notes**:

- The base tables of a materialized view can be row-store tables, column-store tables, hstore tables, partitioned tables (or specific partitions), external tables, or other materialized views. Temporary tables (including global temporary tables, volatile temporary tables, and regular temporary tables) are not supported. Cold-hot tables are supported (in version 910.200 and later), but automatic partition tables with specified partitions are not.
- Materialized views do not allow data to be modified with `INSERT`, `UPDATE`, `MERGE INTO`, or `DELETE`.
- A materialized view executes its query once and stores the results, so query results are consistent. After `BUILD IMMEDIATE` or `REFRESH`, a materialized view returns accurate results.
- Materialized views cannot specify a Node Group via syntax. Base tables can specify a Node Group when they are created, and a materialized view inherits the Node Group of its base table. All base tables of a materialized view must use the same Node Group.
- Creating a materialized view requires `CREATE` permission on the schema and `SELECT` permission on the base tables or columns.
- Querying a materialized view requires `SELECT` permission on the materialized view.
- Refreshing a materialized view requires `INSERT` permission on the materialized view and `SELECT` permission on the base tables or columns.
- Materialized views support fine-grained permissions such as `ANALYZE`, `VACUUM`, `ALTER`, and `DROP`.
- Materialized views support permission delegation with `WITH GRANT OPTION`.
- Materialized views do not support advanced security controls. If a base table has row-level security (RLS) or data masking policies, or its owner is a private user with restricted `SELECT` permissions, creating a materialized view is prohibited. If a materialized view already exists and its base table later adds RLS or masking policies, or its owner changes to a private user, the materialized view can still be queried but cannot be refreshed.

Materialized views support the following configuration parameters:

| Parameter | Type | Required | Default | Change Monitoring Support |
|-----------|------|----------|---------|---------------------------|
| [`on_configuration_change`](/reference/resource-configs/on_configuration_change) | `<string>` | no | `apply` | n/a |
| [`indexes`](#indexes) | `[{<dictionary>}]` | no | `none` | alter |

<Tabs
  groupId="config-languages"
  defaultValue="project-yaml"
  values={[
    { label: 'Project file', value: 'project-yaml', },
    { label: 'Property file', value: 'property-yaml', },
    { label: 'Config block', value: 'config', },
  ]
}>

<TabItem value="project-yaml">

<File name='dbt_project.yml'>

```yaml
models:
  [<resource-path>](/reference/resource-configs/resource-path):
    [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): materialized_view
    [+](/reference/resource-configs/plus-prefix)[on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
    [+](/reference/resource-configs/plus-prefix)[indexes](#indexes):
      - columns: [<column-name>]
        unique: true | false
        type: hash | btree
```

</File>

</TabItem>

<TabItem value="property-yaml">

<File name='models/properties.yml'>

```yaml
version: 2

models:
  - name: [<model-name>]
    config:
      [materialized](/reference/resource-configs/materialized): materialized_view
      [on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
      [indexes](#indexes):
        - columns: [<column-name>]
          unique: true | false
          type: hash | btree
```

</File>

</TabItem>

<TabItem value="config">

<File name='models/<model_name>.sql'>

```jinja
{{ config(
    [materialized](/reference/resource-configs/materialized)="materialized_view",
    [on_configuration_change](/reference/resource-configs/on_configuration_change)="apply" | "continue" | "fail",
    [indexes](#indexes)=[
        {
            "columns": ["<column-name>"],
            "unique": true | false,
            "type": "hash" | "btree",
        }
    ]
) }}
```

</File>

</TabItem>

</Tabs>

The [`indexes`](#indexes) parameter corresponds to that of a table, as explained above.
It's worth noting that, unlike tables, dbt monitors this parameter for changes and applies the changes without dropping the materialized view.
This happens via a `DROP/CREATE` of the indexes, which can be thought of as an `ALTER` of the materialized view.
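Here is a sketch of how the `indexes` change detection and the `on_configuration_change` setting could interact, written in Python under stated assumptions: indexes are compared as `(columns, unique, type)` tuples, `apply` drops removed indexes and creates new ones (the `DROP/CREATE` described above), `continue` leaves the view untouched, and `fail` raises. `plan_index_changes` is a hypothetical helper; in dbt itself, `continue` also logs a warning that the object was left unchanged.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of one index: (columns, unique, type).
Index = Tuple[Tuple[str, ...], bool, str]


def plan_index_changes(
    existing: Iterable[Index],
    desired: Iterable[Index],
    on_configuration_change: str = "apply",
) -> Tuple[List[Index], List[Index]]:
    """Return (indexes_to_drop, indexes_to_create) for a materialized view."""
    if on_configuration_change not in ("apply", "continue", "fail"):
        raise ValueError(f"unknown on_configuration_change: {on_configuration_change!r}")
    existing_set, desired_set = set(existing), set(desired)
    to_drop = sorted(existing_set - desired_set)
    to_create = sorted(desired_set - existing_set)
    if not (to_drop or to_create):
        return [], []  # nothing changed, so the setting does not matter
    if on_configuration_change == "fail":
        raise RuntimeError("index configuration changed and on_configuration_change='fail'")
    if on_configuration_change == "continue":
        return [], []  # keep the view as-is instead of altering it
    return to_drop, to_create  # "apply": drop old indexes, create new ones


hash_a = (("column_a",), False, "hash")
btree_ab = (("column_a", "column_b"), True, "btree")
print(plan_index_changes([hash_a], [btree_ab]))
```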

Learn more about these parameters in GaussDB(DWS)'s [CREATE MATERIALIZED VIEW](https://support.huaweicloud.com/sqlreference-910-dws/dws_06_0357.html) documentation.

> **Review comment:** please update this - this should be changed over to you