Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect results LATERAL JOIN with nested DISTINCT ON #17382

Closed
effnish opened this issue Jun 20, 2024 · 1 comment · Fixed by #17386
Closed

Incorrect results LATERAL JOIN with nested DISTINCT ON #17382

effnish opened this issue Jun 20, 2024 · 1 comment · Fixed by #17386
Assignees
Labels
type/bug Something isn't working
Milestone

Comments

@effnish
Copy link

effnish commented Jun 20, 2024

Describe the bug

I have a table events that contains the timestamps and source for all changes that occur across several related streams that update independently. I'm trying to perform a query using the events table to get the current state from one of the related tables as of the event time.

I've attempted this by performing a lateral join of events with a nested SELECT DISTINCT ON(...) FROM related r. Unfortunately, the results from related are not distinct.

Error message/log

No errors, just incorrect results.

To Reproduce

CREATE TABLE r(ts TIMESTAMPTZ, src_id int, dev_id int);
INSERT INTO r VALUES
('2024-06-20T19:00:22Z'::TIMESTAMPTZ, 2, 2),
('2024-06-20T19:00:22Z'::TIMESTAMPTZ, 1, 3),
('2024-06-20T19:00:23Z'::TIMESTAMPTZ, 1, 2),
('2024-06-20T19:00:24Z'::TIMESTAMPTZ, 2, 1),
('2024-06-20T19:00:24Z'::TIMESTAMPTZ, 1, 2),
('2024-06-20T19:00:25Z'::TIMESTAMPTZ, 2, 1);
SELECT e.ts AS e_ts, d.*
FROM (
  SELECT '2024-06-20T19:01:00Z'::TIMESTAMPTZ ts, 1::INT AS src_id) e
JOIN LATERAL
(
  SELECT DISTINCT ON(src_id, dev_id) *
  FROM r
  WHERE r.src_id = e.src_id AND r.ts <= e.ts
  ORDER BY src_id, dev_id, ts DESC
)d on true;

Expected behavior

I expect to see results equal to:

SELECT DISTINCT ON(src_id, dev_id) *
FROM r
WHERE r.src_id = 1
  AND r.ts <= '2024-06-20T19:01:00Z'::TIMESTAMPTZ
ORDER BY src_id, dev_id, ts DESC;
           ts              | src_id | dev_id
---------------------------+--------+--------
 2024-06-20 19:00:24+00:00 |      1 |      2
 2024-06-20 19:00:22+00:00 |      1 |      3

Instead I see duplicate results for device_id = 2

           e_ts            |            ts             | src_id | dev_id
---------------------------+---------------------------+--------+--------
 2024-06-20 19:01:00+00:00 | 2024-06-20 19:00:24+00:00 |      1 |      2
 2024-06-20 19:01:00+00:00 | 2024-06-20 19:00:22+00:00 |      1 |      3
 2024-06-20 19:01:00+00:00 | 2024-06-20 19:00:23+00:00 |      1 |      2

How did you deploy RisingWave?

docker run -it --pull=always -p 4566:4566 -p 5691:5691 risingwavelabs/risingwave:latest single_node

The version of RisingWave

dev=> select version();
                                        version
---------------------------------------------------------------------------------------
 PostgreSQL 13.14.0-RisingWave-1.11.0-alpha (8735194f600febf58aa09b6b89d5fc474b39c448)

Additional context

No response

@effnish effnish added the type/bug Something isn't working label Jun 20, 2024
@github-actions github-actions bot added this to the release-1.10 milestone Jun 20, 2024
@fuyufjh
Copy link
Member

fuyufjh commented Jun 21, 2024

Thanks for reporting this! We will take a look soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants