Commit

docs: warnings about risks of using incremental (MERGE) replication method

RuslanBergenov committed Jan 25, 2022
1 parent f3eb383 commit 67ac76c
Showing 2 changed files with 2 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -153,7 +153,7 @@ sample [target-config.json](/sample_config/target-config-exchange-rates-api.json
 * `truncate`: Deleting all previous rows and uploading the new ones to the table
 * `incremental`: **Upserting** new rows into the table, using the **primary key** given by the tap connector
   (if it finds an old row with same key, updates it. Otherwise it inserts the new row)
-    - WARNING: we do not recommend using `incremental` option as it might result in loss of production data. We recommend using `append` option instead which will preserve historical data.
+    - WARNING: We do not recommend using the `incremental` option (which uses a `MERGE` SQL statement). It might result in loss of production data, because historical records get updated. Instead, we recommend using the `append` replication method, which will preserve historical data.

Sample **target-config.json** file:

3 changes: 1 addition & 2 deletions target_bigquery/processhandler.py
@@ -266,8 +266,7 @@ def _do_temp_table_based_load(self, rows):
         incremental_success = False
         if self.incremental:
             self.logger.info(f"Copy {tmp_table_name} to {self.tables[stream]} by INCREMENTAL")
-            #TODO: reword the warning about this replication method
-            self.logger.warning(f"INCREMENTAL replication method might result in data loss because we are editing the production data during the sync operation. We recommend that you use APPEND target-bigquery replication instead.")
+            self.logger.warning(f"INCREMENTAL replication method (MERGE SQL statement) is not recommended. It might result in loss of production data, because historical records get updated during the sync operation. Instead, we recommend using the APPEND replication method, which will preserve historical data.")
             table_id = f"{self.project_id}.{self.dataset.dataset_id}.{self.tables[stream]}"
             try:
                 self.client.get_table(table_id)
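The data-loss risk this commit warns about can be illustrated with a toy Python model of the two replication methods. This is a simplified sketch, not the actual `MERGE` statement target-bigquery issues against BigQuery; the function names and the `"id"` key are illustrative assumptions:

```python
# Toy model: a table is a list of row dicts; "id" stands in for the
# primary key supplied by the tap connector.

def incremental_load(table, new_rows, key="id"):
    """MERGE-like upsert: a new row with an existing key overwrites the
    old row, so the historical value is lost."""
    by_key = {row[key]: row for row in table}
    for row in new_rows:
        by_key[row[key]] = row  # update if key exists, insert otherwise
    return list(by_key.values())

def append_load(table, new_rows):
    """Append: new rows are simply added; every historical row survives."""
    return table + list(new_rows)

history = [{"id": 1, "price": 100}]
update = [{"id": 1, "price": 120}]

print(incremental_load(history, update))  # [{'id': 1, 'price': 120}] -- old price gone
print(append_load(history, update))       # both versions of the row preserved
```

With `incremental_load` the original `price: 100` record is unrecoverable after the sync, which is exactly why the commit steers users toward `append`.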
