-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): support create iceberg source #14971
Conversation
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
9425213 | Triggered | Generic Password | 11fdfcd | ci/scripts/regress-test.sh | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Our GitHub checks need improvements? Share your feedbacks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM
src/connector/src/source/base.rs
Outdated
pub struct SourceEnumeratorInfo { | ||
pub source_id: u32, | ||
pub source: Option<Source>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Iceberg, we need this Source
to validate that the schema users provided is validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do that in the optimizer stage as other connectors do? During the create source
process, we fetch all related info and do checks there
src/sqlparser/src/ast/statement.rs
Outdated
@@ -249,6 +249,18 @@ impl Parser { | |||
} else { | |||
ConnectorSchema::native().into() | |||
}) | |||
} else if connector.contains("iceberg") { | |||
let expected = ConnectorSchema::native(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the format
and encode
should be here, but it should not be the native. native
means the format and encode is the same with Rsingwave's internal representation.
the FORMAT
means how to get the operation(Insert/Delete) from the file, the ENCODE
is means how to get the rows containment from the bytes, such as parquet/csv/json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think actually the issue is that "batch-only source" does not need to resolve the operation from the file... so the streaming connector's “FORMAT” is meaningless for it... And I am not sure about the encode, does it stored in iceberg's catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think we need encode or format for iceberg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for iceberg. We don't need any format or encoding, because they are self-explanatory and users don't need to specify any format and encoding. We could introduce another Format works like Native
to meet this requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many places expect source format and encoding, so I don't want to make them accept an Option<SourceSchema>
. Introduce a trivial format and encoding for iceberg LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think actually the issue is that "batch-only source" does not need to resolve the operation from the file... so the streaming connector's “FORMAT” is meaningless for it... And I am not sure about the encode, does it stored in iceberg's catalog?
here we just want to load from the iceberg, all chunks are written as insert
I think. format plain
should be ok, which means append only. encode
can be a random placeholder, eg. iceberg
, to mark it as a spec encode and not for common cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it can be considered as a kind of format plain
... especially considering there could be deletion files in Iceberge. I am thinking if we need ConnectorSchema
for the iceberg connector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/connector/src/source/base.rs
Outdated
pub struct SourceEnumeratorInfo { | ||
pub source_id: u32, | ||
pub source: Option<Source>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do that in the optimizer stage as other connectors do? During the create source
process, we fetch all related info and do checks there
fn id(&self) -> SplitId { | ||
unimplemented!() | ||
} | ||
|
||
fn restore_from_json(_value: JsonbVal) -> anyhow::Result<Self> { | ||
unimplemented!() | ||
} | ||
|
||
fn encode_to_json(&self) -> JsonbVal { | ||
unimplemented!() | ||
} | ||
|
||
fn update_with_offset(&mut self, _start_offset: String) -> anyhow::Result<()> { | ||
unimplemented!() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these going to be impl? or left them here on purpose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of them would be implemented for batch-read in a later PR. But most of them related to streaming are left unimplemented on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This comment was marked as off-topic.
This comment was marked as off-topic.
We have iceberg sink and I just think iceberg source should have similar field names to the sink. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Batch select from iceberg source is not implemented in this PR, right? If so, how about change the PR title to feat(frontend): support create iceberg source
? 😄
The PR itself LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the work.
Merge queue setting changed
Merge queue setting changed
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
iceberg
source.iceberg
source could not be used in streaming queries.iceberg
source enumerator only checks source schema against iceberg schema.iceberg
source has its own row format and doesn't need to be specified by users explicitly, we could omit it.Iceberg source Example
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.