Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix submissions_status RPC when UNION ALL doesn't keep the order #1201

Merged
merged 2 commits into from
Nov 6, 2024

Conversation

edomora97
Copy link
Contributor

UNION ALL is not guaranteed to keep the order of the results. In fact with Postgres 13 the order is not well defined and can change between executions (not sure which versions are affected).

This patch adds an extra column to the query, tagging each statistics with its key, so that we can reconstruct the correct order of the results.

This is what was happening:
image

Old query
SELECT anon_1.count_1 AS anon_1_count_1
FROM (SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NULL AND submission_results.compilation_tries < %(compilation_tries_1)s UNION ALL SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NULL AND submission_results.compilation_tries >= %(compilation_tries_2)s UNION ALL SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome = %(compilation_outcome_1)s UNION ALL SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.compilation_outcome = %(compilation_outcome_2)s AND submission_results.evaluation_outcome IS NULL AND submission_results.evaluation_tries < %(evaluation_tries_1)s UNION ALL SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.compilation_outcome = %(compilation_outcome_2)s AND submission_results.evaluation_outcome IS NULL AND submission_results.evaluation_tries >= %(evaluation_tries_2)s UNION ALL SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.evaluation_outcome IS NOT NULL AND NOT (submission_results.score IS NOT NULL AND submission_results.score_details IS NOT NULL AND submission_results.public_score IS NOT NULL AND submission_results.public_score_details IS NOT NULL AND submission_results.ranking_score_details IS NOT NULL) UNION ALL SELECT count(submission_results.submission_id) AS count_1
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.evaluation_outcome IS NOT NULL AND submission_results.score IS NOT NULL AND submission_results.score_details IS NOT NULL AND submission_results.public_score IS NOT NULL AND submission_results.public_score_details IS NOT NULL AND submission_results.ranking_score_details IS NOT NULL UNION ALL SELECT count(submissions.id) AS count_2
FROM submissions JOIN tasks ON submissions.task_id = tasks.id
WHERE tasks.contest_id = %(contest_id_2)s) AS anon_1
 anon_1_count_1
----------------
              0
         708629
            195
              0
         614906
              0
          93285
            243

Timing: 22559,237 ms (00:22,559)

New query
SELECT anon_1.count_1 AS anon_1_count_1, anon_1.key AS anon_1_key
FROM (SELECT count(submission_results.submission_id) AS count_1, 'compiling' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NULL AND submission_results.compilation_tries < %(compilation_tries_1)s UNION ALL SELECT count(submission_results.submission_id) AS count_1, 'max_compilations' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NULL AND submission_results.compilation_tries >= %(compilation_tries_2)s UNION ALL SELECT count(submission_results.submission_id) AS count_1, 'compilation_fail' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome = %(compilation_outcome_1)s UNION ALL SELECT count(submission_results.submission_id) AS count_1, 'evaluating' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.compilation_outcome = %(compilation_outcome_2)s AND submission_results.evaluation_outcome IS NULL AND submission_results.evaluation_tries < %(evaluation_tries_1)s UNION ALL SELECT count(submission_results.submission_id) AS count_1, 'max_evaluations' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.compilation_outcome = %(compilation_outcome_2)s AND submission_results.evaluation_outcome IS NULL AND submission_results.evaluation_tries >= %(evaluation_tries_2)s UNION ALL SELECT count(submission_results.submission_id) AS count_1, 'scoring' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.evaluation_outcome IS NOT NULL AND NOT (submission_results.score IS NOT NULL AND submission_results.score_details IS NOT NULL AND submission_results.public_score IS NOT NULL AND submission_results.public_score_details IS NOT NULL AND submission_results.ranking_score_details IS NOT NULL) UNION ALL SELECT count(submission_results.submission_id) AS count_1, 'scored' AS key
FROM submission_results JOIN datasets ON datasets.id = submission_results.dataset_id JOIN tasks ON datasets.task_id = tasks.id
WHERE tasks.active_dataset_id = submission_results.dataset_id AND tasks.contest_id = %(contest_id_1)s AND submission_results.compilation_outcome IS NOT NULL AND submission_results.evaluation_outcome IS NOT NULL AND submission_results.score IS NOT NULL AND submission_results.score_details IS NOT NULL AND submission_results.public_score IS NOT NULL AND submission_results.public_score_details IS NOT NULL AND submission_results.ranking_score_details IS NOT NULL UNION ALL SELECT count(submissions.id) AS count_2, 'total' AS key
FROM submissions JOIN tasks ON submissions.task_id = tasks.id
WHERE tasks.contest_id = %(contest_id_2)s) AS anon_1
 anon_1_count_1 |    anon_1_key
----------------+------------------
              0 | scoring
         708629 | total
            195 | max_compilations
              0 | evaluating
         614906 | scored
              0 | compiling
            243 | max_evaluations
          93285 | compilation_fail

Timing: 21561,638 ms (00:21,562)

@codecov
Copy link

codecov bot commented Jan 26, 2022

Codecov Report

Merging #1201 (655cc95) into master (0401c53) will decrease coverage by 0.24%.
The diff coverage is 20.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1201      +/-   ##
==========================================
- Coverage   63.86%   63.61%   -0.25%     
==========================================
  Files         233      233              
  Lines       17151    17152       +1     
==========================================
- Hits        10954    10912      -42     
- Misses       6197     6240      +43     
Flag Coverage Δ
functionaltests 47.38% <20.00%> (-0.38%) ⬇️
unittests 43.99% <0.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cms/server/admin/server.py 53.12% <20.00%> (-0.85%) ⬇️
cms/service/workerpool.py 66.66% <0.00%> (-5.56%) ⬇️
cms/io/rpc.py 91.33% <0.00%> (-3.67%) ⬇️
cms/server/admin/handlers/base.py 67.98% <0.00%> (-3.31%) ⬇️
cms/service/EvaluationService.py 74.12% <0.00%> (-2.99%) ⬇️
cms/db/base.py 89.10% <0.00%> (-1.99%) ⬇️
cms/service/esoperations.py 79.86% <0.00%> (-1.39%) ⬇️
cms/io/service.py 72.78% <0.00%> (-1.27%) ⬇️
cms/server/contest/phase_management.py 96.42% <0.00%> (-1.20%) ⬇️
cms/service/ResourceService.py 59.11% <0.00%> (-0.89%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0401c53...655cc95. Read the comment docs.

edomora97 and others added 2 commits November 6, 2024 12:11
UNION ALL is not guaranteed to keep the order of the results. In fact
with (at least) Postgres 13 the order is not well defined and can change
between executions.

This patch adds an extra column to the query, tagging each statistics
with its key, so that we can reconstruct the correct order of the
results.
@wil93 wil93 merged commit 98ad9a2 into cms-dev:master Nov 6, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants