-
Notifications
You must be signed in to change notification settings - Fork 595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(source): introduce gcs source and new s3 source via OpenDAL #13414
Conversation
…nto wcy/gcs_source
|
GitGuardian id | Secret | Commit | Filename | |
---|---|---|---|---|
7648795 | Generic CLI Secret | 02f63ee | integration_tests/iceberg-cdc/docker-compose.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Our GitHub checks need improvements? Share your feedbacks!
Testing gcs source in ci is not ready yet, but I think maybe we can add it in next pr because we already test s3 source in ci. |
|
||
let t = match om.last_modified() { | ||
Some(t) => t, | ||
None => DateTime::<Utc>::from_timestamp(0, 0).unwrap_or_default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we use January 1, 1970 0:00:00
for the mtime. Is this expected?
src/connector/src/source/filesystem/opendal_source/s3_source.rs
Outdated
Show resolved
Hide resolved
batch.push(msg); | ||
if batch.len() >= max_chunk_size { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized yesterday, RW can be vulnerable in this case.
Do you mean that the msg
returned in ReaderStream
can be arbitarily large?
#[derive(Clone, Debug)] | ||
pub struct FsPageItem { | ||
pub name: String, | ||
pub size: i64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use i64
instead of u64
for size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @tabVersion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because s3 objects from aws sdk is of type Option<i64>
since we are not reusing FsSplit
it is ok to change to u64 here
…singwave into wcy/gcs_source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your effort
batch.push(msg); | ||
if batch.len() >= max_chunk_size { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized yesterday, RW can be vulnerable in this case.
Do you mean that the msg returned in ReaderStream can be arbitarily large?
parser holds a buffer to handle truncated files. But if there are no delimiter found, the buffer can be really large. Let's fix this in another pr
// The new s3_v2 will use opendal. | ||
pub const OPENDAL_S3_CONNECTOR: &str = "s3_v2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little confusing to me (without the background). IIUC, s3 v2 refers to the new algorithm, while opendal refers to an orthogonal aspect.
Could you tell me the reasons to use opendal for s3 v2, or points me to related discussions?
BTW, I think using S3_V2
as the variable name might be clearer than OPENDAL_S3
.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Background:
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.