Uploads CSV to Clickhouse #230
Have you considered uploading CSVs with a specialized tool such as DBeaver, which provides much more flexibility? Then just re-sync the schema in Metabase. Additionally, it is preferable to use read-only user profiles for BI tools; uploading a CSV would require Metabase to connect via a non-read-only profile.
Thanks for your answer! Non-read-only access is also not a problem, because we can give Metabase access only to our sandbox database on the ClickHouse host.
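For illustration, the access scoping discussed above can be expressed with ClickHouse's SQL-driven access control. This is a minimal sketch with hypothetical user and database names (`metabase_ro`, `metabase_uploads`, `analytics`, `sandbox`), not a configuration taken from this thread:

```sql
-- Read-only profile for regular BI querying (names are illustrative).
CREATE USER metabase_ro IDENTIFIED WITH sha256_password BY 'change-me'
SETTINGS readonly = 1;
GRANT SELECT ON analytics.* TO metabase_ro;

-- Separate profile whose write access is confined to a sandbox database.
CREATE USER metabase_uploads IDENTIFIED WITH sha256_password BY 'change-me';
GRANT SELECT ON analytics.* TO metabase_uploads;
GRANT SELECT, INSERT, CREATE TABLE ON sandbox.* TO metabase_uploads;
```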
This is a bit non-trivial due to these caveats:
To be fair, I am very hesitant to implement this, given that other well-known tools already do it with specialized interfaces that yield better results.
Thanks for your feedback. My suggestions are:
Something like this:
Of course we as analysts can do it in DataGrip, PyCharm, and so on. But the key feature of Metabase is self-service: we want our business users to be able to upload their own data without calling developers and analysts, because they are always a bottleneck.
According to the docs, only 2 connectors support CSV uploads.
I'd like to add a bit more detail here. We have three different deployment types that we want to support:

- On-premise single node
- On-premise cluster
- ClickHouse Cloud
The DDLs for the created tables, and the specific settings required for data insertion, vary significantly based on the deployment type. Here's what the driver needs to do to properly create a table and insert something there, supporting all three scenarios:

**Prerequisites**
Check whether the server is part of a cluster (a non-zero `nodes_count` suggests a cluster deployment):

```sql
SELECT count(*) AS nodes_count
FROM system.clusters c
JOIN system.macros m ON c.cluster = m.substitution
```
Check whether the server is running in ClickHouse Cloud:

```sql
SELECT value AS is_cloud FROM system.settings WHERE name = 'cloud_mode'
```
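As an illustration only, the two prerequisite checks above could be combined into a single classification query. This is a sketch, not something the driver is confirmed to do, and the `is_cluster`/`is_cloud` aliases are hypothetical:

```sql
-- Illustrative only: classify the deployment in one round trip.
-- Assumes the cloud_mode setting exists on this server version.
SELECT
    (
        SELECT count(*)
        FROM system.clusters c
        JOIN system.macros m ON c.cluster = m.substitution
    ) > 0 AS is_cluster,
    (SELECT value FROM system.settings WHERE name = 'cloud_mode') = '1' AS is_cloud
```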
**Connection/DDL/insert variants**

**On-premise single node**

DDL:

```sql
CREATE TABLE csv_upload (...) ENGINE MergeTree ORDER BY (...)
```

Insert: standard JDBC methods from MB will probably work.
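To make the placeholder DDL above concrete, here is a hypothetical expansion with illustrative columns; the real columns would come from the uploaded CSV's inferred schema, not from this sketch:

```sql
-- Hypothetical single-node table for an uploaded CSV; column names and
-- types are illustrative, not what the driver actually generates.
CREATE TABLE csv_upload
(
    id          UInt64,
    name        String,
    uploaded_at DateTime
)
ENGINE = MergeTree
ORDER BY (id)
```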
**On-premise cluster**

This is the most complicated scenario. First, check that the required macros are defined on the server:

```sql
SELECT getMacro('cluster'), getMacro('replica'), getMacro('shard')
```
DDL:

```sql
CREATE TABLE csv_upload ON CLUSTER '{cluster}'
(...)
ENGINE ReplicatedMergeTree(
    '/clickhouse/{cluster}/tables/{database}/{table}/{shard}',
    '{replica}'
)
ORDER BY (...)
```
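The insert path on a cluster is where the "specific required settings" mentioned above come in. As one hedged example, and purely an assumption rather than a confirmed driver requirement, writes to a replicated table may want quorum settings:

```sql
-- Assumption: quorum settings may be desirable when inserting into a
-- ReplicatedMergeTree table; the driver's actual behavior is undecided here.
INSERT INTO csv_upload (...)
SETTINGS insert_quorum = 2, insert_quorum_timeout = 60000
VALUES (...)
```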
**CH Cloud**

This is mostly similar to the on-premise single node (CH Cloud takes care of the cluster macros, etc., in the DDL). DDL:
```sql
CREATE TABLE csv_upload (...) ENGINE MergeTree ORDER BY (...)
```
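One possible Cloud-specific wrinkle on the insert side, offered only as an assumption: ClickHouse generally recommends asynchronous inserts for many small writes, which a CSV upload could produce. Both settings below are standard ClickHouse settings, but their use here is speculative:

```sql
-- Assumption: async inserts might suit Cloud uploads; not confirmed
-- as part of the driver's design in this thread.
INSERT INTO csv_upload (...)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (...)
```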
I don't have time to implement all these scenarios, and I don't feel comfortable implementing support for only one deployment type. However, if someone from the community wishes to take on this task, I can help with the setup, statements, etc. You can always ping me here or directly in the community Slack.
1.5.0 supports CSV uploads with ClickHouse Cloud (props to @calherries).
(Edited by @slvrtrn to keep track of the progress of the feature implementation).
The CSV uploads feature needs to support the following ClickHouse deployment types:

- [x] ClickHouse Cloud (supported since 1.5.0)
- [ ] On-premise single node
- [ ] On-premise cluster