[batch] make batches query go brrrrrrr (for realsies) #14649

ehigham · 2024-08-01T21:34:01Z

#14629 improved the speed of listing a user's batches but introduced a large
regression for listing all batches readable by that user. This change fixes that
regression by making use of index hints and STRAIGHT_JOINs.

The index hint tells MySQL never to consider the batches_deleted index, as it
has very low cardinality. In some forms of this query, the planner tries to use
it, to its peril.

A problem with query 0 (see below) under #14629 was that, with fewer filters on
batches, the optimiser considered the joins in a suboptimal order: it did a table
scan on job_groups first and then sorted the results by batches.id DESC instead
of doing a reverse index scan on batches.

Using STRAIGHT_JOINs instead of INNER JOINs makes the optimiser start from
batches and read its index in reverse before considering the other tables in
subsequent joins. From the documentation:

STRAIGHT_JOIN is similar to JOIN, except that the left table is always read
before the right table. This can be used for those (few) cases for which the
join optimizer processes the tables in a suboptimal order.

This is advantageous for a couple of reasons:

- We want to list newer batches first.
- For this query, the batches table has more applicable indexes.
- We want the column we order by to be in the primary key of the first
  table so that we can read the index in reverse.
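The query shape described above can be sketched roughly as follows. This is a hypothetical, heavily simplified skeleton, not the query in query_v2.py: only the `IGNORE INDEX (batches_deleted)` hint, the batches-first `STRAIGHT_JOIN` onto `job_groups`, and `ORDER BY batches.id DESC` come from this PR. The function name `build_batches_query`, the join condition, the `deleted` column and the `%s` parameter style are assumptions, and the real access-control filters are omitted.

```python
# Hypothetical sketch: only the IGNORE INDEX hint, the batches-first
# STRAIGHT_JOIN and the ORDER BY come from the PR description. The join
# condition, the `deleted` column and the %s placeholder style are assumptions,
# and the real query's access-control filters are omitted.
def build_batches_query(limit: int) -> tuple[str, list[int]]:
    sql = """
SELECT batches.id
     , job_groups.job_group_id
FROM batches IGNORE INDEX (batches_deleted)  -- low cardinality; never use it
STRAIGHT_JOIN job_groups                     -- read batches first, then job_groups
  ON job_groups.batch_id = batches.id        -- assumed join key
WHERE NOT batches.deleted                    -- assumed column
ORDER BY batches.id DESC                     -- walk the primary key in reverse
LIMIT %s
"""
    return sql, [limit]
```

With `batches` fixed as the driving table, MySQL can scan its primary key backwards and stop after `LIMIT` rows instead of scanning `job_groups` and sorting afterwards. Running `EXPLAIN` on the real query and checking that `batches` is listed first, with no `Using filesort` in the `Extra` column, would be one way to confirm the plan.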

Before and after timings in seconds, collected by running each query 5 times
and reading the profiles gathered by MySQL.

+-------+---------------------------------------------------+
| query |  description                                      |
+-------+---------------------------------------------------+
|     0 | All batches accessible to user `ci`               |
|     1 | All batches accessible to user `ci` owned by `ci` |
+-------+---------------------------------------------------+

+-------+--------+--------------------------------------------------------+------------+------------+
| query | branch | timings                                                |    mean    |    stdev   |
+-------+--------+--------------------------------------------------------+------------+------------+
|     0 |  main  | 0.05894400,0.05207850,0.07067875,0.06281800,0.06025000 | 0.06095385 | 0.00602255 |
|     1 |  main  | 14.1106150,12.2619323,13.8442850,12.0749633,14.0297822 | 13.2643156 | 0.90087263 |
+-------+--------+--------------------------------------------------------+------------+------------+
|     0 |   PR   | 0.04717375,0.04974350,0.04312150,0.04070850,0.04193650 | 0.04453675 | 0.00339069 |
|     1 |   PR   | 0.04423925,0.03967550,0.03935425,0.04056875,0.05286850 | 0.04334125 | 0.00507140 |
+-------+--------+--------------------------------------------------------+------------+------------+
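The mean and stdev columns can be re-derived from the raw timings with Python's standard `statistics` module. As far as we can tell, the stdev column is the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) gives larger values. The `TIMINGS` dictionary and the `summarise` helper are names introduced here for illustration only.

```python
import statistics

# Raw timings from the table above (seconds, 5 runs per query and branch).
TIMINGS = {
    ("main", 0): [0.05894400, 0.05207850, 0.07067875, 0.06281800, 0.06025000],
    ("main", 1): [14.1106150, 12.2619323, 13.8442850, 12.0749633, 14.0297822],
    ("PR", 0): [0.04717375, 0.04974350, 0.04312150, 0.04070850, 0.04193650],
    ("PR", 1): [0.04423925, 0.03967550, 0.03935425, 0.04056875, 0.05286850],
}


def summarise(runs: list[float]) -> tuple[float, float]:
    # pstdev (population standard deviation) reproduces the table's stdev column.
    return statistics.mean(runs), statistics.pstdev(runs)


for (branch, query), runs in TIMINGS.items():
    mean, stdev = summarise(runs)
    print(f"query {query} {branch:>4}: mean={mean:.8f} stdev={stdev:.8f}")

for query in (0, 1):
    speedup = statistics.mean(TIMINGS[("main", query)]) / statistics.mean(TIMINGS[("PR", query)])
    print(f"query {query}: {speedup:.1f}x faster on the PR branch")
```

From these numbers, query 1 drops from about 13.3 s to about 0.043 s, roughly a 300x speedup, while query 0 improves by about 1.4x.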

I'm hopeful that this won't introduce regressions for most use cases. While I
haven't benchmarked other queries, the MySQL client does feel more responsive
for a wider array of users. One notable exception is the user dking, who owns
3.7x more batches than they have access to, all of which have been deleted. I
don't think this is a common enough use case to justify making this query even
more complicated than it already is.

Resolves #14599

ehigham · 2024-08-02T15:39:56Z

cjllanwarne · 2024-08-05T19:33:30Z

NB: brrrrr to brrrrr comparison: https://github.com/hail-is/hail/compare/5dbf80e..08e7f6f#diff-0f931312c631fea66daf5de2961a2df18757e69d33e661390c1c2837c4a2efec

batch/batch/front_end/query/query_v2.py

cjllanwarne

This looks like a good reinterpretation of the original PR, with more specific control over the index being used and the ordering of the joins.

Since this is some specific customization based on knowledge of the underlying data, it may be worth documenting in the PR description (or code? or docs?) why this particular ordering (and choice of indexes) is important?

cjllanwarne · 2024-08-05T19:07:27Z

batch/batch/front_end/query/query_v2.py

+     , cancelled_t.cancelled IS NOT NULL AS cancelled
+     , job_groups_n_jobs_in_complete_states.n_completed
+     , job_groups_n_jobs_in_complete_states.n_succeeded
+     , job_groups_n_jobs_in_complete_states.n_failed
+     , job_groups_n_jobs_in_complete_states.n_cancelled
+     , cost_t.cost
+     , cost_t.cost_breakdown


Presumably this layout makes adding and removing lines easier? But did you intend to keep this in the final version?

I did. MySQL (rightly) does not support trailing commas. This is the One True Way (TM) to split comma-separated lists.
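A small illustration of the leading-comma convention, using a hypothetical `select_list` helper that is not part of the batch codebase: every column after the first starts with `, `, aligned under the first column as in the diff above, so appending or removing the last column touches only one line and can never leave the trailing comma that MySQL rejects.

```python
# Hypothetical helper, for illustration only: renders a SELECT column list in
# leading-comma style, one column per line.
def select_list(columns: list[str], indent: str = "     ") -> str:
    if not columns:
        raise ValueError("need at least one column")
    first, *rest = columns
    # "SELECT " is 7 characters; indent + ", " is also 7, so the columns line up.
    lines = [f"SELECT {first}"] + [f"{indent}, {col}" for col in rest]
    return "\n".join(lines)


print(select_list(["cost_t.cost", "cost_t.cost_breakdown"]))
```

Dropping `cost_t.cost_breakdown` from the list removes exactly one line of output, and no remaining line ends in a comma.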

cjllanwarne · 2024-08-06T21:30:11Z

batch/batch/front_end/query/query_v2.py

+     , job_groups_n_jobs_in_complete_states.n_cancelled
+     , cost_t.cost
+     , cost_t.cost_breakdown
+FROM batches IGNORE INDEX (batches_deleted)


I have a slight forward-looking concern over introducing so much mysql-specific syntax to the query. I have a suspicion that postgres support might be a useful option to have in certain potential terra futures. Not enough to block this, but enough to make me mildly uneasy...

If I remember correctly, rawls and sam use MySQL (or Cloud SQL). Anyway, all that is kind of irrelevant as

- different databases have different optimisers and may plan queries differently
- many of the queries in batch are very tuned to MySQL (e.g. STRAIGHT_JOIN, LATERAL, (a, b, ...) IN expressions, etc.)
- we'd therefore have to consider using different DBMSs very carefully.

Point well taken that this is not "the thin end of the wedge" and more like "a pattern already throughout batch". Point also well taken that adding support is not necessarily a trivial thing to do. So 👍 for following the pattern here.

For more context on my thoughts here: there are certainly 'ways to terra' that do not go through psql, but our terra-on-azure app, for example, cannot have a managed database because of this reliance, though maybe that's more because it was never added than some fundamental blocker.

Also I have a natural prior that 'all this optimization for mysql' is just not as necessary in psql because their optimizer is less likely to do the weird things that mysql does. But there's no data behind that, just biases.

our terra-on-azure app, for example, cannot have a managed database because of this reliance, though maybe that's more because it was never added than some fundamental blocker

That's good to know for future work, thanks for sharing.

ehigham · 2024-08-07T15:32:55Z

This looks like a good reinterpretation of the original PR, with more specific control over the index being used and the ordering of the joins.

Since this is some specific customization based on knowledge of the underlying data, it may be worth documenting in the PR description (or code? or docs?) why this particular ordering (and choice of indexes) is important?

Good idea. I'll do that.

[batch] make batches query go brrrrrrr (take 2)

08e7f6f

ehigham requested review from cjllanwarne and chrisvittal August 2, 2024 18:42

ehigham marked this pull request as ready for review August 2, 2024 18:42

ehigham requested review from iris-garden and removed request for chrisvittal August 2, 2024 19:21

ehigham assigned cjllanwarne and iris-garden Aug 2, 2024

iris-garden suggested changes Aug 6, 2024


batch/batch/front_end/query/query_v2.py (outdated, resolved)

dont aggregate across job groups

be26ba3

ehigham requested a review from iris-garden August 6, 2024 19:11

cjllanwarne approved these changes Aug 6, 2024


ehigham added 2 commits August 7, 2024 14:18

STRAIGHT_JOIN only required on job group tables

86d2136

STRAIGHT_JOIN not required on derived tables

6f25931

iris-garden approved these changes Aug 7, 2024


hail-ci-robot merged commit 7d25779 into hail-is:main Aug 8, 2024
4 checks passed

ehigham deleted the ehigham/batches-go-brrrrr-take-2 branch August 8, 2024 23:07
