Limiting total rows copied in COPY TABLE FROM statement
#3714
Comments
I'd like to take it. Could you please assign it to me?
Sure. Are you going to add a new runtime option to limit the rows?
Sure, should we add a configuration option in the OptionMap in CopyTableArgument to limit it?
Maybe we can set a default limit, for example 1000 rows, unless the user explicitly specifies the limit in
What problem does the new feature solve?
As pointed out by this article, a 42 KiB parquet file may contain hundreds of trillions of values, much like a zip bomb. If GreptimeDB does not limit the total rows copied in a single COPY TABLE FROM statement, these parquet bombs may immediately overload the backend storage.
What does the feature do?
Limit the total rows imported by a single COPY TABLE FROM statement.
The execution of a copy statement builds a record batch stream over the file and inserts the yielded batches into the mito engine.
greptimedb/src/operator/src/statement/copy_table_from.rs
Lines 402 to 437 in 3acd5bf
We can add a config option to limit the total rows read from the record batch stream and terminate the execution once the row count exceeds the threshold.
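A minimal sketch of the idea, not the actual GreptimeDB code. It assumes the limit arrives as a string entry in an options map and that each batch exposes its row count. The `max_rows` key, the `Batch` stand-in, the `CopyError` type, and both function names are hypothetical. The default of 1000 is taken from the suggestion in the comments.

```rust
use std::collections::HashMap;

/// Hypothetical default cap, taken from the "1000 rows" suggestion in the comments.
const DEFAULT_MAX_ROWS: usize = 1000;

/// Hypothetical option key; the real name is not settled in this issue.
const MAX_ROWS_KEY: &str = "max_rows";

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum CopyError {
    /// The option value could not be parsed as a row count.
    InvalidLimit(String),
    /// Reading the next batch would push the total past the limit.
    RowLimitExceeded { limit: usize, seen: usize },
}

/// Stand-in for a record batch: only the row count matters here.
struct Batch {
    num_rows: usize,
}

/// Read the limit from the statement options, falling back to the default.
fn resolve_max_rows(options: &HashMap<String, String>) -> Result<usize, CopyError> {
    match options.get(MAX_ROWS_KEY) {
        Some(raw) => raw
            .parse::<usize>()
            .map_err(|_| CopyError::InvalidLimit(raw.clone())),
        None => Ok(DEFAULT_MAX_ROWS),
    }
}

/// Feed batches to `insert` until the stream ends or the limit would be exceeded.
/// Returns the number of rows inserted on success.
fn copy_with_limit<I, F>(batches: I, max_rows: usize, mut insert: F) -> Result<usize, CopyError>
where
    I: IntoIterator<Item = Batch>,
    F: FnMut(&Batch),
{
    let mut total = 0usize;
    for batch in batches {
        // saturating_add keeps the counter from wrapping on absurd row counts.
        let seen = total.saturating_add(batch.num_rows);
        if seen > max_rows {
            // Stop before inserting the batch that crosses the threshold.
            return Err(CopyError::RowLimitExceeded { limit: max_rows, seen });
        }
        insert(&batch);
        total = seen;
    }
    Ok(total)
}
```

In this sketch the batch that crosses the threshold is rejected rather than truncated, so rows from earlier batches may already have been inserted when the error is returned. Whether the real feature should fail the statement, truncate at the limit, or roll back is a design decision the issue leaves open.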