Tracking: enhancement of table function `file_scan` #18000

wcy-fdu · 2024-08-12T07:57:12Z

To batch-read files from a file system (S3, GCS, etc.), RisingWave supports both select ... from source and the table function file_scan. The difference is that select ... from source requires the user to define the schema in a CREATE SOURCE statement, while file_scan is simpler for batch queries because it is called directly as a table function, with no schema definition required.
Take a batch query over Parquet files as an example:

select ... from source

CREATE SOURCE s3_parquet (
    id int,
    age int
)
WITH (
    connector = 's3_v2',
    match_pattern = '*.parquet',
    s3.region_name = 'xxx',
    s3.bucket_name = 'xxx',
    s3.credentials.access = 'xxx',
    s3.credentials.secret = 'xxx',
    s3.endpoint_url = 'xxx'
) FORMAT PLAIN ENCODE PARQUET;

SELECT * FROM s3_parquet;

file_scan

SELECT
  id,
  age
FROM file_scan(
  'parquet',
  's3',
  'region',
  'endpoint',
  'xxx'
);

Users can choose different query methods according to their needs.
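
To illustrate the ad hoc style that file_scan enables, the sketch below filters and aggregates Parquet data without any CREATE SOURCE step. It assumes a six-argument form of file_scan (file format, storage type, region, access key, secret key, file location). The argument order, the file path, and the credential placeholders are assumptions for illustration and should be checked against the RisingWave version in use.

```sql
-- Hypothetical ad hoc query: count rows per age for ids above a threshold.
-- Every string argument after 's3' is a placeholder, not a real value.
SELECT
  age,
  count(*) AS cnt
FROM file_scan(
  'parquet',                               -- file format
  's3',                                    -- storage type
  'xxx',                                   -- region (placeholder)
  'xxx',                                   -- access key (placeholder)
  'xxx',                                   -- secret key (placeholder)
  's3://bucket-name/path/to/file.parquet'  -- file location (hypothetical path)
)
WHERE id > 100
GROUP BY age;
```

The equivalent query against the s3_parquet source defined above would use FROM s3_parquet in place of the file_scan call; the difference lies only in whether the schema is declared up front.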

Currently, file_scan supports only Parquet files on S3 and is not fully developed. Areas that can be improved:

Support more encode types (JSON, CSV) and automatic schema mapping.
Support different object store engines (S3, GCS, azblob).
Provide a more user-friendly syntax.


wcy-fdu added the type/feature label Aug 12, 2024

github-actions bot added this to the release-2.0 milestone Aug 12, 2024

wcy-fdu self-assigned this Aug 12, 2024

wcy-fdu modified the milestones: release-2.0, release-2.1 Aug 19, 2024

wcy-fdu modified the milestones: release-2.1, release-2.2 Oct 17, 2024
