panic when using Iceberg Sink connector from materialized view #17296
Comments
Thanks for reporting, @anubhavgupta2404. Could you please try again with our latest version, v1.9.1?
Hey @chenzl25, we have now upgraded to PostgreSQL 13.14.0-RisingWave-1.9.1 (4fa6c8b). But on running the same script to create the Iceberg sink connector from the materialized view, I now get the following error:
The detailed error message is as follows:
@anubhavgupta2404 Could you please provide the RisingWave logs (frontend and compute nodes) from when this error occurred? The internal stack trace isn't reported to the client.
@chenzl25
Didn't find anything relevant in the compute node logs
@chenzl25 Sure, send me the image URL; I'll deploy it and test.
@chenzl25
@wenym1 Do you have any idea?
@chenzl25 if this is fixed then which release should I use to test? |
Sorry, the issue was closed automatically by the related PR. The fix will be included in the next version, i.e. 1.11.0. We release a nightly Docker image every day; you can use it to test if you need this fix urgently. The current latest nightly image is
Sure @chenzl25, I'll try
@chenzl25 I changed the image in my YAML and re-applied
@chenzl25 If I delete and do a fresh deployment, then the above error does not occur
Thanks for reporting, @swapkh91. This is expected, because the nightly image contains some in-development PRs with breaking changes. What about the Iceberg sink with the Hive catalog? Has that been resolved in your environment?
@chenzl25 Getting the same error
@chenzl25
This query returns the results on StarRocks
but this gives the above error on RisingWave. I think all fields are fine
@swapkh91 The related PR was merged yesterday, but that was so close to the image-building time that I am not 100% sure whether it is included in the image. We can use image
@chenzl25 Not yet solved when I execute on the MV
Now when I execute the create sink command, these are the logs
I googled this error and found that the RisingWave jar dependency lacks
Describe the bug
The sink connector to an Iceberg table in a Hive catalog throws an error. It is supposed to sink data from the materialized view into the Iceberg table, but RisingWave is not able to do so.
Error message/log
To Reproduce
--------------Create a table that fetches data from Google Pub/Sub as a source
create table public.segment_impressions (
"anonymousId" varchar,
"context" jsonb,
"event" varchar,
"integrations" jsonb,
"messageId" varchar,
"originalTimestamp" varchar,
"properties" jsonb,
"receivedAt" varchar,
"sentAt" varchar,
"timestamp" varchar,
"type" varchar,
"writeKey" varchar
)
WITH (
connector = 'google_pubsub',
pubsub.subscription = 'risingwave-test',
pubsub.credentials = ''
) FORMAT PLAIN ENCODE JSON;
----------Create a materialized view that transforms the data flowing from the source
create materialized view public.segment_impression_event_mv
(appointment_id, auction_id, tvcDealerId, timestamp_ist)
as
select
replace((json_data->'appointment_id')::varchar,'"','') as appointment_id,
replace((json_data->'auction_id')::varchar,'"','') as auction_id,
tvcDealerId, timestamp_ist
from(
select jsonb_array_elements((properties->'car_info' #>> '{}')::jsonb) as json_data, replace((properties->'tvcDealerId')::varchar,'"','') as tvcDealerId ,
to_timestamp(substring(replace(timestamp,'T',' '),1,19),'YYYY-MM-DD HH24:MI:SS')::timestamp without time zone + INTERVAL '330 MINUTES' as timestamp_ist
from public.segment_impressions
where to_timestamp(substring(replace(timestamp,'T',' '),1,19),'YYYY-MM-DD HH24:MI:SS')::timestamp without time zone + INTERVAL '330 MINUTES' >= current_timestamp - interval '2 DAYS'
) as tbl;
---------------Sink the transformed data in the materialized view to an Iceberg table in the Hive catalog
CREATE SINK public.segment_impression_data FROM public.segment_impression_event_mv
WITH (
connector = 'iceberg',
type = 'append-only',
force_append_only = 'true',
warehouse.path = 's3a://hive-iceberg/segment_db/',
s3.endpoint = 'http://X.X.X.X',
s3.access.key = '',
s3.secret.key = '************',
s3.region = 'asia-south1',
-- catalog.name = 'iceberg_hive_qa',
catalog.type = 'hive',
catalog.uri = 'thrift://X.X.X.X:9083',
database.name = 'segmentdb',
table.name = 'segment_impression_data_rw'
-- primary_key='seq_id'
);
---------------This is where the bug occurs: while loading the MV data into the Iceberg sink connector.
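To make the reproduction easier to reason about, here is a minimal Python sketch of what the materialized view above computes for a single source row. It is only an approximation of the SQL, not RisingWave's implementation: `strip_quotes` and `expand_event` are hypothetical helper names, the row is modeled as a plain dict, `now` stands in for `current_timestamp`, and time zone handling is ignored.

```python
import json
from datetime import datetime, timedelta

# Same shift as INTERVAL '330 MINUTES' in the view (UTC to IST).
IST_OFFSET = timedelta(minutes=330)


def strip_quotes(value):
    """Approximate replace((json -> key)::varchar, '"', '')."""
    if value is None:
        return None  # a missing key yields NULL in the SQL version
    return json.dumps(value).replace('"', '')


def expand_event(row, now):
    """Return the rows the view would emit for one source row (hypothetical helper)."""
    # substring(replace(timestamp, 'T', ' '), 1, 19) parsed as 'YYYY-MM-DD HH24:MI:SS'
    raw_ts = row["timestamp"].replace("T", " ")[:19]
    timestamp_ist = datetime.strptime(raw_ts, "%Y-%m-%d %H:%M:%S") + IST_OFFSET

    # WHERE ... >= current_timestamp - interval '2 DAYS'
    if timestamp_ist < now - timedelta(days=2):
        return []

    props = row["properties"]
    # (properties->'car_info' #>> '{}')::jsonb: car_info may be a JSON array
    # stored as a string, so decode it once more when it is a string.
    raw_cars = props["car_info"]
    cars = json.loads(raw_cars) if isinstance(raw_cars, str) else raw_cars
    dealer = strip_quotes(props.get("tvcDealerId"))

    # jsonb_array_elements: one output row per element of the array
    return [
        (
            strip_quotes(car.get("appointment_id")),
            strip_quotes(car.get("auction_id")),
            dealer,
            timestamp_ist,
        )
        for car in cars
    ]
```

Each tuple corresponds to one row of `(appointment_id, auction_id, tvcDealerId, timestamp_ist)`, which is the shape the sink then has to write to the Iceberg table.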
Expected behavior
I expected to see data flowing from the materialized view to an Iceberg table in the Hive catalog. Instead, RisingWave is not able to configure or read the metadata of the Iceberg table to set up the connection.
The I/O operation fails: even though the command is syntactically correct, the sink connector operation fails with no proper error details.
How did you deploy RisingWave?
Deployed on GKE through the operator. My risingwave-operator-system YAML file is:
apiVersion: v1
kind: Service
metadata:
name: risingwave-etcd
labels:
app: risingwave-etcd
spec:
ports:
name: client
name: peer
selector:
app: risingwave-etcd
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: risingwave-etcd
name: risingwave-etcd
spec:
replicas: 1
selector:
matchLabels:
app: risingwave-etcd
serviceName: risingwave-etcd
volumeClaimTemplates:
name: etcd-data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 10Gi
persistentVolumeClaimRetentionPolicy:
whenDeleted: Delete
whenScaled: Retain
template:
metadata:
labels:
app: risingwave-etcd
spec:
nodeSelector:
cloud.google.com/gke-nodepool: risingwave-pool
containers:
- name: etcd
image: quay.io/coreos/etcd:latest
imagePullPolicy: IfNotPresent
command:
- /usr/local/bin/etcd
args:
- "--listen-client-urls"
- "http://0.0.0.0:2388/"
- "--advertise-client-urls"
- "http://risingwave-etcd-0:2388/"
- "--listen-peer-urls"
- "http://0.0.0.0:2389/"
- "--initial-advertise-peer-urls"
- "http://risingwave-etcd-0:2389/"
- "--listen-metrics-urls"
- "http://0.0.0.0:2379/"
- "--name"
- "risingwave-etcd"
- "--max-txn-ops"
- "999999"
- "--max-request-bytes"
- "104857600"
- "--auto-compaction-mode"
- periodic
- "--auto-compaction-retention"
- 1m
- "--snapshot-count"
- "10000"
- --quota-backend-bytes
- "8589934592"
- --data-dir
- /var/lib/etcd
env:
- name: ALLOW_NONE_AUTHENTICATION
value: "1"
ports:
- containerPort: 2389
name: peer
protocol: TCP
- containerPort: 2388
name: client
protocol: TCP
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
apiVersion: v1
kind: Secret
metadata:
name: gcs-credentials
stringData:
ServiceAccountCredentials: ""
apiVersion: risingwave.risingwavelabs.com/v1alpha1
kind: RisingWave
metadata:
name: risingwave-etcd-gcs
spec:
metaStore:
etcd:
endpoint: risingwave-etcd:2388
stateStore:
gcs:
bucket: risingwave-test
root: risingwave
credentials:
secretName: gcs-credentials
serviceAccountCredentialsKeyRef: ServiceAccountCredentials
image: risingwavelabs/risingwave:v1.7.3
components:
meta:
nodeGroups:
- replicas: 1
name: ""
template:
spec:
nodeSelector:
cloud.google.com/gke-nodepool: risingwave-pool
volumes:
- name: heap
emptyDir:
sizeLimit: 1Gi
volumeMounts:
- mountPath: /heap
name: heap
env:
- name: MALLOC_CONF
value: prof:true,lg_prof_interval=-1,lg_prof_sample=20,prof_prefix:/heap/
- name: RW_HEAP_PROFILING_DIR
value: /heap
resources:
limits:
cpu: 1
memory: 2Gi
requests:
cpu: 1
memory: 2Gi
frontend:
nodeGroups:
- replicas: 1
name: ""
template:
spec:
nodeSelector:
cloud.google.com/gke-nodepool: risingwave-pool
resources:
limits:
cpu: 1
memory: 2Gi
requests:
cpu: 1
memory: 2Gi
compute:
nodeGroups:
- replicas: 1
name: ""
template:
spec:
nodeSelector:
cloud.google.com/gke-nodepool: risingwave-pool
volumes:
- name: heap
emptyDir:
sizeLimit: 1Gi
volumeMounts:
- mountPath: /heap
name: heap
env:
- name: MALLOC_CONF
value: prof:true,lg_prof_interval=-1,lg_prof_sample=20,prof_prefix:/heap/
- name: RW_HEAP_PROFILING_DIR
value: /heap
resources:
limits:
cpu: 4
memory: 16Gi # Memory limit will be set to RW_TOTAL_MEMORY_BYTES
requests:
cpu: 4
memory: 16Gi
compactor:
nodeGroups:
- replicas: 1
name: ""
template:
spec:
nodeSelector:
cloud.google.com/gke-nodepool: risingwave-pool
volumes:
- name: heap
emptyDir:
sizeLimit: 1Gi
volumeMounts:
- mountPath: /heap
name: heap
env:
- name: MALLOC_CONF
value: prof:true,lg_prof_interval=-1,lg_prof_sample=20,prof_prefix:/heap/
- name: RW_HEAP_PROFILING_DIR
value: /heap
resources:
limits:
cpu: 2
memory: 4Gi
requests:
cpu: 2
memory: 4Gi
The version of RisingWave
PostgreSQL 9.5.0-RisingWave-1.7.3 (cfefe78)
Additional context
No response