Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CH] topk of row_number #7905

Open
lgbo-ustc opened this issue Nov 12, 2024 · 1 comment · May be fixed by #7976
Open

[CH] topk of row_number #7905

lgbo-ustc opened this issue Nov 12, 2024 · 1 comment · May be fixed by #7976
Labels
enhancement New feature or request performance

Comments

@lgbo-ustc
Copy link
Contributor

lgbo-ustc commented Nov 12, 2024

Description

Finding the top k rows of row_number is a common case. We compare following two ways

use aggregation

-- CH
SELECT
    x,
    y[1]
FROM
(
    SELECT
        x,
        groupArraySorted(1)(y) AS y
    FROM
    (
        SELECT
            rand() % 100000 AS x,
            rand() AS y
        FROM numbers(10000000)
    )
    GROUP BY x
)
FORMAT `Null`

Query id: c6630996-d937-4d62-b3f7-386d15cd3b3f

Ok.

0 rows in set. Elapsed: 0.370 sec. Processed 10.00 million rows, 80.00 MB (27.04 million rows/s., 216.28 MB/s.)
Peak memory usage: 13.02 MiB.

use window function

SELECT *
FROM
(
    SELECT
        x,
        y,
        row_number() OVER (PARTITION BY x ORDER BY y ASC) AS n
    FROM
    (
        SELECT
            rand() % 100000 AS x,
            rand() AS y
        FROM numbers(10000000)
    )
)
WHERE n <= 1
FORMAT `Null`

Query id: fd2ff438-2ab9-4644-ada4-3de48dec0eb4

Ok.

0 rows in set. Elapsed: 3.872 sec. Processed 10.00 million rows, 80.00 MB (2.58 million rows/s., 20.66 MB/s.)
Peak memory usage: 79.58 MiB.

The aggregation implement is more efficient .

@lgbo-ustc lgbo-ustc added the enhancement New feature or request label Nov 12, 2024
@lgbo-ustc lgbo-ustc changed the title [CH] Optimization for topk of row_number [CH] topk of row_number Nov 12, 2024
@lgbo-ustc
Copy link
Contributor Author

There is a transformer WindowGroupLimitExecTransformer, we could take advantage of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant