feat(connector): support local fs source #14312

Merged: 5 commits merged on Jan 5, 2024

Conversation

KeXiangWang

@KeXiangWang KeXiangWang commented Jan 3, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

As the title says. Let's keep this connector in stealth mode; it is only used for tests.
Posix fs source should only be used for testing.
For a single-CN cluster, the behavior is well-defined. It will read from the local file system.
For a multi-CN cluster, each CN will read from its own local file system under the given directory.

There are two parameters needed:

root: String, the root directory of the files to search. The files will be searched recursively.
match_pattern: Option<String>, The regex pattern to match files under root directory.

An example:

CREATE TABLE diamonds (
    carat FLOAT,
    cut TEXT,
    color TEXT,
    depth FLOAT,
) WITH (
  connector = 'posix_fs',
  match_pattern = 'data*.csv',
  posix_fs.root = '~/downloads',
) FORMAT PLAIN ENCODE CSV ( without_header = 'false', delimiter = ',' );
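To make the two parameters concrete, here is a minimal standalone sketch of what "search `root` recursively and keep files that match `match_pattern`" could look like. It is not the connector's code: `wildcard_match` and `list_matching` are hypothetical helpers, the pattern is assumed to be glob-like with only `*` supported (as in the `data*.csv` example), and matching against the path relative to `root` is also an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: matches `text` against a pattern in which `*`
/// stands for any (possibly empty) run of characters. No other wildcard
/// is supported in this sketch.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` in the pattern and the text index it covers up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain unmatched in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Hypothetical helper: walks `root` recursively and returns the files whose
/// path relative to `root` matches `pattern`, in sorted order.
fn list_matching(root: &Path, pattern: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if let Ok(rel) = path.strip_prefix(root) {
                if wildcard_match(pattern, &rel.to_string_lossy()) {
                    found.push(path);
                }
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Under these assumptions, `list_matching(Path::new("/some/dir"), "data*.csv")` would return `data1.csv` in the root but skip `other.csv`. In a multi-CN cluster each node would run such a walk against its own local disk, which is why the results can differ per node.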

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@KeXiangWang KeXiangWang marked this pull request as draft January 3, 2024 01:49
@wcy-fdu wcy-fdu self-requested a review January 3, 2024 06:34
@KeXiangWang KeXiangWang force-pushed the wkx/local-file-connector branch from 5e847e1 to e02d209 Compare January 3, 2024 22:22
@KeXiangWang
Contributor Author

The main-cron test of local_fs has passed. Check details here.

@KeXiangWang KeXiangWang requested a review from tabVersion January 3, 2024 23:43
@KeXiangWang KeXiangWang marked this pull request as ready for review January 3, 2024 23:43
@KeXiangWang KeXiangWang requested a review from a team as a code owner January 3, 2024 23:43
Comment on lines +39 to +44
// TODO(Kexiang): Currently, FsLister in opendal does not support a prefix. (Seems a bug in opendal)
// So we assign prefix to empty string.

So can we reject the prefix in props?

If this fs source is only used in testing, it's OK to ignore the missing prefix list support.
BTW, I'm not aware that OpenDAL Fs engine does not support prefix list. cc @Xuanwo

Contributor Author

The prefix is not in props now. In other implementations, it is extracted from match_pattern (e.g., the prefix of abc*.csv is abc) and used to filter out files with a different prefix under the directory/bucket. Since OpenDAL does not currently handle listing with a prefix correctly for fs, we simply set the prefix to empty. Once the problem in OpenDAL is solved, we can generate the prefix here and return it, which would benefit performance.
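As a rough illustration of the prefix extraction described here, the sketch below takes the literal part of a pattern before its first wildcard and uses it to pre-filter keys. `literal_prefix` and `prefilter` are hypothetical names, and the set of wildcard characters (`*`, `?`, `[`, `{`) is an assumption rather than what the other connectors actually check.

```rust
/// Hypothetical helper: returns the literal part of `pattern` that precedes
/// the first wildcard character (assumed here to be `*`, `?`, `[` or `{`).
fn literal_prefix(pattern: &str) -> &str {
    match pattern.find(|c: char| matches!(c, '*' | '?' | '[' | '{')) {
        Some(idx) => &pattern[..idx],
        None => pattern,
    }
}

/// Hypothetical helper: keeps only the keys that start with the literal
/// prefix of `pattern`. This narrows the candidate set cheaply; the full
/// pattern match still has to run on the survivors.
fn prefilter<'a>(keys: &[&'a str], pattern: &str) -> Vec<&'a str> {
    let prefix = literal_prefix(pattern);
    keys.iter()
        .copied()
        .filter(|key| key.starts_with(prefix))
        .collect()
}
```

With an empty prefix, which is what this PR returns for now, `prefilter` keeps every key, so correctness is unaffected and only the early pruning is lost.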


@Xuanwo Xuanwo Jan 4, 2024

> BTW, I'm not aware that OpenDAL Fs engine does not support prefix list. cc @Xuanwo

opendal 0.44 should fix it.

As I mentioned before, prefix support for fs is simulated: since fs has no native prefix listing, we list the parent dir and return the entries that match the prefix.
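The idea of simulating a prefix listing on a plain file system can be sketched with the standard library alone. This is not OpenDAL's implementation: `list_with_prefix` is a hypothetical function, it assumes `prefix` is a relative path using `/` as separator, and it only looks at direct children of the prefix's parent directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the entries under `root` whose relative path
/// starts with `prefix`, by listing the prefix's parent directory and
/// filtering the entry names.
fn list_with_prefix(root: &Path, prefix: &str) -> io::Result<Vec<PathBuf>> {
    // Split "dir/sub/abc" into the directory "dir/sub" and the name prefix "abc".
    let (dir, name_prefix) = match prefix.rfind('/') {
        Some(idx) => (root.join(&prefix[..idx]), &prefix[idx + 1..]),
        None => (root.to_path_buf(), prefix),
    };
    let mut matched = Vec::new();
    for entry in fs::read_dir(&dir)? {
        let entry = entry?;
        if entry.file_name().to_string_lossy().starts_with(name_prefix) {
            matched.push(entry.path());
        }
    }
    matched.sort();
    Ok(matched)
}
```

Unlike an object store, where a prefix listing is a native server-side operation, the file system still has to enumerate the whole parent directory, so the filter saves downstream work rather than listing cost.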

Join the conversation on adding glob support to opendal – your input would be valuable: apache/opendal#3500

@tabVersion tabVersion left a comment

The PR LGTM as long as it satisfies your testing requirements. Also, please add some docs for all accepted props.

@wcy-fdu wcy-fdu left a comment

LGTM as this change is only used for test.

@KeXiangWang KeXiangWang force-pushed the wkx/local-file-connector branch from 0bafe88 to 4b48e2a Compare January 5, 2024 02:53
@KeXiangWang KeXiangWang force-pushed the wkx/local-file-connector branch from 4b48e2a to e649fae Compare January 5, 2024 03:16
@KeXiangWang KeXiangWang enabled auto-merge January 5, 2024 04:53
@KeXiangWang KeXiangWang added this pull request to the merge queue Jan 5, 2024
Merged via the queue into main with commit 5a3d265 Jan 5, 2024
27 of 28 checks passed
@KeXiangWang KeXiangWang deleted the wkx/local-file-connector branch January 5, 2024 05:49
Li0k pushed a commit that referenced this pull request Jan 10, 2024