Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clickhouse Enum support #14658

Closed
HurricanKai opened this issue Jan 18, 2024 · 3 comments
Closed

Clickhouse Enum support #14658

HurricanKai opened this issue Jan 18, 2024 · 3 comments

Comments

@HurricanKai
Copy link
Contributor

Is your feature request related to a problem? Please describe.

I'm trying to efficiently insert a risingwave String into a Clickhouse table which uses Enum8.

Describe the solution you'd like

I'd like it to be possible to sink Risingwaves String into Clickhouse Enum8s.

Describe alternatives you've considered

If an equivalent risingwave type existed, converting to that and inserting that into clickhouse would be fine also, but I'm not aware of such a type.

Additional context

Clickhouse Enum Docs

Current Error:

Caused by these errors (recent errors listed first):
  1: gRPC request to meta service failed: Internal error
  2: Sink error: ClickHouse error: Column type can not match name is "region", risingwave is Varchar and clickhouse is "Enum8(...)"
@github-actions github-actions bot added this to the release-1.7 milestone Jan 18, 2024
@HurricanKai
Copy link
Contributor Author

After digging a bit into this, it seems like there's no easy way to support inserting Strings into Enums.
From my understanding the risingwave clickhouse connector uses the RowBinary format, which does not support this.

Instead allowing Int16s to be inserted into Enum8/16 is pretty easy, PR following. For my use cases this is fine, although a little annoying.
Alternatives to insert strings include

  • Using a different format where necessary, ie CSV, TSV, plain Values, where clickhouse does the conversion
  • Parsing the type (in code ck_column.r#type) string to understand which string corresponds to which value. This is fairly straightforward, but relies on parsing clickhouse DDL...

@HurricanKai
Copy link
Contributor Author

HurricanKai commented Jan 18, 2024

See PR for further info - I've only managed to get Enum16 working. My dataset isn't that huge so this is fine for me. Supporting Enum8 would be more work and I'm not sure how critical.
IMO this issue could be closed if the above PR is merged, if any of the other features are desired (ie Enum8 support or String insertion), that could be tracked in another issue.

Example Queries I've used, which demonstrate fairly well that string insertion is trivial to work around, although a maintenance burden:

Clickhouse:

CREATE TABLE matches_summary(
    matchid UInt64,
    region Enum16(
        'BR1' = 1,
        'EUN1' = 2,
        'EUW1' = 3,
        ...
    ),
    ts DateTime64
) ENGINE = ReplacingMergeTree
PRIMARY KEY (region, matchid)
ORDER BY (region, matchid, ts)
TTL toDateTime(ts + INTERVAL 90 DAY);
CREATE SINK ch_matches_summary
AS (
    WITH id_mapping AS (
        SELECT str, num FROM (
            SELECT 'BR1' AS str, 1 AS num
            UNION ALL 
            SELECT 'EUN1', 2 UNION ALL
            SELECT 'EUW1', 3 UNION ALL
            ...
        )
    )
    SELECT matchId, id_mapping.num::smallint AS region, ts
    FROM (...) AS temp
    JOIN id_mapping ON temp.region = id_mapping.str
)
WITH (
    connector = 'clickhouse',
    type = 'append-only',
    clickhouse.url = 'http://clickhouse:8123',
    clickhouse.user = 'default',
    clickhouse.password = '',
    clickhouse.database = 'default',
    clickhouse.table = 'matches_summary',
    primary_key = 'matchid'
);

@xxhZs
Copy link
Contributor

xxhZs commented Mar 5, 2024

Currently enum16 has been implemented, enum8 can't correspond to int8 because rw doesn't have a int8, so close this issue

@xxhZs xxhZs closed this as completed Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants