This repository has been archived by the owner on Sep 23, 2024. It is now read-only.
I am working on a source (DynamoDB) whose upstream table names can contain special characters. In our case they contain dashes, which this target parses in a special and significant way.
To keep the upstream table name separate from the downstream one, I researched the spec and found that (according to my interpretation of the spec here) table_name is intended to describe the upstream source and tap_stream_id is intended to drive downstream behavior.
When I now send events through that tap, this target appears to be inconsistent about when it uses table_name and when it uses tap_stream_id. (Again, according to my understanding of the spec here, table_name should be used by the tap and tap_stream_id should govern naming in the target.)
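To make the expected split concrete, here is a minimal sketch of what I would expect a tap to emit, assuming the standard Singer catalog and message formats. The catalog values come from this issue; the helper names schema_message and record_message and the key property id are hypothetical and only for illustration. The point is that table_name stays inside the tap, and every message the target sees carries the tap_stream_id in its stream field.

```python
import json

# Catalog entry for the table in this issue (Singer catalog field names).
# The empty schema and metadata are placeholders for brevity.
catalog_entry = {
    "tap_stream_id": "employeeAssessment",
    "table_name": "dev_mes-employeeAssessment-table",
    "stream": "employeeAssessment",
    "schema": {"type": "object", "properties": {}},
    "metadata": [],
}


def schema_message(entry, key_properties):
    """Hypothetical helper: build a SCHEMA message keyed by tap_stream_id.

    table_name is only used by the tap to read the source table;
    it never appears in what the target receives.
    """
    return {
        "type": "SCHEMA",
        "stream": entry["tap_stream_id"],
        "schema": entry["schema"],
        "key_properties": key_properties,
    }


def record_message(entry, record):
    """Hypothetical helper: build a RECORD message for the same stream."""
    return {
        "type": "RECORD",
        "stream": entry["tap_stream_id"],
        "record": record,
    }


if __name__ == "__main__":
    # "id" is a hypothetical key property used only for this example.
    print(json.dumps(schema_message(catalog_entry, ["id"])))
    print(json.dumps(record_message(catalog_entry, {"id": 1})))
```

With this shape, the target would only ever see employeeAssessment and never needs to parse the dashed DynamoDB name.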
The log below comes from a single table sync operation. Note that the target first uses the correct table name, and then uses the table name "TABLE", which is likely coming from parsing the table_name instead of the tap_stream_id.
I'm planning to submit a PR but first wanted to post this to create awareness and promote discussion.
Thanks!
Here's the full log...
Note that in this example the upstream table_name is dev_mes-employeeAssessment-table and the tap_stream_id is employeeAssessment. The target first checks whether employeeAssessment exists and then (mistakenly) whether TABLE exists.
2020-09-04 16:06:16,334 - INFO - Beginning running command: tap-dynamodb --config /mnt/c/Files/Source/slalom-data-platform-core/data/taps/.secrets/tmp/tap-me-slalom-config.json --catalog ./.output/taps/me-slalom-catalog/me-slalom-employeeAssessment-catalog.json --state /tmp/tmpa9t8bkaj/me-slalom-employeeAssessment-state.json | target-snowflake --config /mnt/c/Files/Source/slalom-data-platform-core/data/taps/.secrets/tmp/target-snowflake-config-employeeAssessment.json > /tmp/tmpa9t8bkaj/me-slalom-employeeAssessment-state-new.json...
INFO Found credentials in shared credentials file: /mnt/c/Files/Source/slalom-data-platform-core/infra/dev/.secrets/aws-credentials
INFO Attempting to assume_role on RoleArn: arn:aws:iam::489003720472:role/TEST-AJ-DynamoDB-SingerExtracts-Role
INFO Starting sync.
INFO employeeAssessment: Starting sync
INFO Syncing full table for stream: dev_mes-employeeAssessment-table
INFO Scanning table dev_mes-employeeAssessment-table with params:
INFO TableName = dev_mes-employeeAssessment-table
INFO Limit = 1000
INFO employeeAssessment: Completed sync (17 rows)
INFO
+Sync Summary--------+--------------------+---------------+---------------------+
| table name | replication method | total records | write speed |
+--------------------+--------------------+---------------+---------------------+
| employeeAssessment | FULL_TABLE | 17 records | 19.3 records/second |
+--------------------+--------------------+---------------+---------------------+
INFO Done syncing.
time=2020-09-04 16:06:18 name=target_snowflake level=INFO message=Getting catalog objects from table cache...
time=2020-09-04 16:06:20 name=target_snowflake level=INFO message=Table 'RAW_MES."EMPLOYEEASSESSMENT"' does not exist. Creating...
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Table 'RAW_MES."TABLE"' exists
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Uploading 17 rows to external snowflake stage on S3
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Target S3 bucket: dataplatformtest01-data-44635, local file: /tmp/records_2ph8vxjw.csv.gz, S3 key: data/raw/me-slalom/employeeAssessment/v1/pipelinewise_dev_mes-employeeAssessment-table_20200904-160623-191772.csv.gz
time=2020-09-04 16:06:24 name=target_snowflake level=INFO message=Loading 17 rows into 'RAW_MES."TABLE"'
time=2020-09-04 16:06:25 name=target_snowflake level=INFO message=Loading into RAW_MES."TABLE": {"inserts": 0, "updates": 17, "size_bytes": 119}
time=2020-09-04 16:06:25 name=target_snowflake level=INFO message=Emitting state {"bookmarks": {"employeeAssessment": {"last_replication_method": "FULL_TABLE"}, "dev_mes-employeeAssessment-table": {"version": 1599260777218, "initial_full_table_complete": true, "success_timestamp": "2020-09-04T23:06:18.099907Z"}}, "currently_syncing": "dev_mes-employeeAssessment-table"}
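One way to read the two table lookups above: if the target treats a dash-separated stream name as <catalog>-<schema>-<table>, then dev_mes-employeeAssessment-table collapses to the table "table", which uppercases to TABLE. The sketch below is a hypothetical reconstruction of that kind of splitting, inferred from the log and not taken from the target's source. split_stream_name is an illustrative name, and joining extra parts with "_" is an assumption. It only derives the table part; the RAW_MES schema in the log is not modeled here.

```python
def split_stream_name(stream_name, separator="-"):
    """Hypothetical splitting of a stream name into catalog/schema/table.

    Assumes the target reads '<catalog>-<schema>-<table>' from dashed names.
    Joining any parts beyond the third with '_' is an assumption.
    """
    parts = stream_name.split(separator)
    if len(parts) == 1:
        return {"catalog": None, "schema": None, "table": parts[0]}
    if len(parts) == 2:
        return {"catalog": None, "schema": parts[0], "table": parts[1]}
    return {"catalog": parts[0], "schema": parts[1], "table": "_".join(parts[2:])}


# The two stream names seen in this sync: the tap_stream_id and the table_name.
for name in ("employeeAssessment", "dev_mes-employeeAssessment-table"):
    table = split_stream_name(name)["table"].upper()
    print(f"{name!r} -> {table}")
# 'employeeAssessment' -> EMPLOYEEASSESSMENT
# 'dev_mes-employeeAssessment-table' -> TABLE
```

Under this assumption, the target is behaving consistently with its input: it simply received two different stream names.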
On further research, it is possible that the tap is still sending schema messages that identify the wrong tap_stream_id. I will close this (at least temporarily) while I confirm that the schema messages are in fact being sent as expected.
UPDATE: It is indeed the upstream tap sending incorrect schema messages. No action needed here.
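For anyone needing to run the same check, here is a minimal sketch that summarizes which stream names a tap emits per Singer message type. It assumes the tap's stdout has first been saved to a file (one JSON message per line); the script name check_streams.py and the file name tap-output.jsonl are hypothetical. If SCHEMA and RECORD messages carry different stream values, the problem is on the tap side rather than in the target.

```python
import json
import sys
from collections import defaultdict


def stream_names_by_type(lines):
    """Collect the `stream` values seen for each Singer message type."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, e.g. stray log output
        if not isinstance(message, dict):
            continue
        stream = message.get("stream")
        if stream is not None:
            msg_type = message.get("type") or "UNKNOWN"
            seen[msg_type].add(stream)
    return dict(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   tap-dynamodb --config ... --catalog ... > tap-output.jsonl
    #   python check_streams.py tap-output.jsonl
    with open(sys.argv[1]) as f:
        for msg_type, streams in sorted(stream_names_by_type(f).items()):
            print(msg_type, sorted(streams))
```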
Background: table_name is intended to describe the upstream source and tap_stream_id is intended to drive downstream behavior. I have created singer-io/tap-dynamodb#25 to resolve this on the tap side.