id: cloudquery-sync
namespace: company.team

tasks:
  - id: hn_to_duckdb
    type: io.kestra.plugin.cloudquery.Sync
    env:
      CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
    incremental: false
    configs:
      - kind: source
        spec:
          name: hackernews
          path: cloudquery/hackernews
          version: v3.0.13
          tables:
            - "*"
          destinations:
            - duckdb
          spec:
            item_concurrency: 100
            # Parentheses make dateAdd apply to whichever date is selected (trigger or execution start)
            start_time: "{{ (trigger.date ?? execution.startDate) | dateAdd(-1, 'DAYS') }}"
      - kind: destination
        spec:
          name: duckdb
          path: cloudquery/duckdb
          version: v4.2.10
          write_mode: overwrite-delete-stale
          spec:
            connection_string: hn.db

triggers:
  - id: schedule
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "@daily"
    timezone: US/Eastern

extend:
  title: Schedule a CloudQuery data ingestion sync with Kestra
  description: >-
    This flow starts a batch job that syncs data from HackerNews to DuckDB
    with CloudQuery. The sync relies on CloudQuery source and destination
    plugins: the source plugin extracts data from HackerNews, and the
    destination plugin loads it into DuckDB.

    The flow uses Kestra templating to set the sync's start time dynamically
    to one day before the scheduled date (or one day before the execution
    start date when the flow is not started by the schedule). This also lets
    you backfill past intervals in small batches.
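
    For example, with the daily schedule in US/Eastern, the execution
    scheduled for 2024-03-15 (an illustrative date) would sync HackerNews
    items from 2024-03-14 00:00 onward. The resolved source `spec` would look
    roughly like this. The exact timestamp string Kestra renders is an
    assumption and may differ:

    ```yaml
      spec:
        item_concurrency: 100
        # illustrative value: one day before the scheduled date 2024-03-15 00:00 US/Eastern (EDT, UTC-4)
        start_time: "2024-03-14T00:00:00-04:00"
    ```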

    Alternatively, set the `incremental` flag to `true` to enable incremental
    syncs. During an incremental sync, Kestra stores the CloudQuery cursor in
    the Kestra backend and uses it to resume the sync from the last cursor
    position.
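
    A minimal sketch of the same task with incremental sync enabled. Only the
    `incremental` flag differs, and the `configs` list is assumed to stay as
    defined in this flow. Depending on the source plugin, incremental syncs
    may need additional state settings, so check the plugin documentation:

    ```yaml
      - id: hn_to_duckdb
        type: io.kestra.plugin.cloudquery.Sync
        env:
          CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
        incremental: true  # Kestra keeps the CloudQuery cursor between runs
        # configs: same source and destination entries as in this flow
    ```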

    To start configuring CloudQuery sources and destinations, go to the
    [CloudQuery Integrations](https://www.cloudquery.io/integrations) page.
    There you can select a source and a destination and see the YAML
    configuration to copy into your own file or into a Kestra flow. The page
    also provides more detailed instructions and references for the tables
    available in each plugin.

    Note that you can [generate an API
    key](https://docs.cloudquery.io/docs/deployment/generate-api-key) to use
    premium plugins. You can pass the API key to the task as an environment
    variable, for example from a Kestra secret:

    ```yaml
      - id: hn_to_duckdb
        type: io.kestra.plugin.cloudquery.Sync
        env:
          CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
    ```
  tags:
    - Ingest
    - CLI
  ee: false
  demo: true
  meta_description: This flow will start a batch job to sync data from HackerNews
    to DuckDB with CloudQuery.