Tracking: enhancement of table function file_scan #18000

Open
3 tasks
wcy-fdu opened this issue Aug 12, 2024 · 0 comments
wcy-fdu commented Aug 12, 2024

To batch read files from a file system (S3, GCS, etc.), RisingWave supports both select ... from source and the table function file_scan. The difference is that select ... from source requires the user to define the schema in a create source statement, while file_scan is simpler for batch queries because it is called directly as a table function, with no schema definition needed.
Take a batch query over Parquet files as an example:

  • select ... from source
CREATE source s3_parquet(
     id int,
     age int
)
WITH (
    connector = 's3_v2',
    match_pattern = '*.parquet',
    s3.region_name = 'xxx',
    s3.bucket_name = 'xxx',
    s3.credentials.access = 'xxx',
    s3.credentials.secret = 'xxx',
    s3.endpoint_url = 'xxx'
) FORMAT PLAIN ENCODE PARQUET;
select * from s3_parquet;
  • file_scan
SELECT
  id,
  age
FROM file_scan(
  'parquet',
  's3',
  'region',
  'endpoint',
  'xxx'
);

Users can choose different query methods according to their needs.

Currently, file_scan supports only Parquet files on S3 and is not fully developed. The following areas can be improved:

  • Support more encode types (JSON, CSV) and automatic schema mapping.
  • Support different object store engines (S3, GCS, Azblob).
  • More user-friendly syntax.
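
To make "automatic schema mapping" for CSV a bit more concrete, here is a rough Python sketch of what such inference could look like: read the header, sample some rows, and choose the narrowest SQL type that fits each column. This is only an illustration and not how RisingWave implements or plans to implement it. The function names (`infer_sql_type`, `infer_csv_schema`), the `sample_rows` parameter, and the three-type mapping (bigint, double precision, varchar) are all assumptions made for the example.

```python
import csv
import io


def infer_sql_type(values):
    """Pick the narrowest SQL type that fits every non-empty sample value.

    Hypothetical mapping for illustration: bigint, then double precision,
    then varchar as the fallback.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "varchar"
    for sql_type, parse in (("bigint", int), ("double precision", float)):
        try:
            for v in non_empty:
                parse(v)
            return sql_type
        except ValueError:
            continue
    return "varchar"


def infer_csv_schema(text, sample_rows=100):
    """Return [(column_name, sql_type), ...] from a CSV header and sample rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = [[] for _ in header]
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, value in zip(columns, row):
            col.append(value)
    return [(name, infer_sql_type(vals)) for name, vals in zip(header, columns)]


sample = "id,age,name\n1,30,alice\n2,41,bob\n"
schema = infer_csv_schema(sample)
print(", ".join(f"{name} {sql_type}" for name, sql_type in schema))
# prints: id bigint, age bigint, name varchar
```

A real implementation would also need to handle nulls, timestamps, nested JSON, and files whose sampled rows disagree with later rows, which is part of why this item is tracked separately.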
github-actions bot added this to the release-2.0 milestone Aug 12, 2024
wcy-fdu self-assigned this Aug 12, 2024
wcy-fdu modified the milestones: release-2.0, release-2.1 Aug 19, 2024
wcy-fdu modified the milestones: release-2.1, release-2.2 Oct 17, 2024