Skip to content

Commit

Permalink
[batch] make batches query go brrrrrrr (for realsies) (#14649)
Browse files Browse the repository at this point in the history
#14629 improved the speed of listing a user's batches but introduced a
large
regression for listing all batches readble by that user. This change
fixes that
regression by making use index hints and `STRAIGHT_JOIN`s.

The index hint tells MySQL to never consider the index `batches_deleted`
as it
has very low cardinality. In some forms of this query, the planner tries
to use
it to its peril.

A problem in query 0 with #14629 (see below) was that fewer filters on
batches
made the optimiser consider joins in a suboptimal order - it did a table
scan
on `job_groups` first then sorted the results by to `batches.id DESC`
instead
of doing an index scan on `batches` in reverse.

Using `STRAIGHT_JOIN`s instead of `INNER JOIN` mades the optimiser start
from
`batches` and read its index in reverse before considering other tables
in
subsequent joins. From the
[documentation](https://dev.mysql.com/doc/refman/8.4/en/join.html):

> STRAIGHT_JOIN is similar to JOIN, except that the left table is always
read
before the right table. This can be used for those (few) cases for which
the
  join optimizer processes the tables in a suboptimal order.

This is advantageous for a couple of reasons:
- We want to list newer batches first
- For this query, the `batches` table has more applicables indexes
- We want the variable to order by to be in the primary key of the first
  table so we can read the index in reverse

Before and after timings, collected by running the query 5 times, then
using
profiles gathered by MySQL.
```
+-------+---------------------------------------------------*
| query |  description                                      |                                                                                                                                                                                                                                                         
+-------+---------------------------------------------------+
|     0 | All batches accessible to user `ci`               |
|     1 | All batches accessible to user `ci` owned by `ci` |
+-------+---------------------------------------------------*

+-------+--------+--------------------------------------------------------+------------+------------+
| query | branch | timings                                                |    mean    |    stdev   |                                                                                                                                                                                                                                             
+-------+--------+--------------------------------------------------------+------------+------------+
|     0 |  main  | 0.05894400,0.05207850,0.07067875,0.06281800,0.060250   | 0.06095385 | 0.00602255 |
|     1 |  main  | 14.1106150,12.2619323,13.8442850,12.0749633,14.0297822 | 13.2643156 | 0.90087263 |
+-------+--------+--------------------------------------------------------+------------+------------+
|     0 |   PR   | 0.04717375,0.04974350,0.04312150,0.04070850,0.04193650 | 0.04453675 | 0.00339069 |
|     1 |   PR   | 0.04423925,0.03967550,0.03935425,0.04056875,0.05286850 | 0.04334125 | 0.00507140 |
+-------+--------+--------------------------------------------------------+------------+------------+
```

I'm hopeful that this won't introduce regressions for most use cases.
While I
haven't benchmarked other queries, the MySQL client does feel more
responsive
for a wider array of users. One notable exception is for the user
`dking` who
owns 3.7x more batches than has access to, of which all have been
deleted. I
don't think this is a common enough use case to make this query even
more
complicated than it already is.

Resolves #14599
  • Loading branch information
ehigham authored Aug 8, 2024
1 parent ba1ac58 commit 7d25779
Showing 1 changed file with 32 additions and 29 deletions.
61 changes: 32 additions & 29 deletions batch/batch/front_end/query/query_v2.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,40 +125,43 @@ def parse_list_batches_query_v2(user: str, q: str, last_batch_id: Optional[int])
where_conditions.append(f'({cond})')
where_args += args

sql = f"""
SELECT batches.*,
cancelled_t.cancelled IS NOT NULL AS cancelled,
job_groups_n_jobs_in_complete_states.n_completed,
job_groups_n_jobs_in_complete_states.n_succeeded,
job_groups_n_jobs_in_complete_states.n_failed,
job_groups_n_jobs_in_complete_states.n_cancelled,
cost_t.cost, cost_t.cost_breakdown
FROM job_groups
LEFT JOIN batches ON batches.id = job_groups.batch_id
LEFT JOIN billing_projects ON batches.billing_project = billing_projects.name
LEFT JOIN job_groups_n_jobs_in_complete_states ON job_groups.batch_id = job_groups_n_jobs_in_complete_states.id AND job_groups.job_group_id = job_groups_n_jobs_in_complete_states.job_group_id
LEFT JOIN LATERAL (
SELECT 1 AS cancelled
FROM job_group_self_and_ancestors
INNER JOIN job_groups_cancelled
ON job_group_self_and_ancestors.batch_id = job_groups_cancelled.id AND
job_group_self_and_ancestors.ancestor_id = job_groups_cancelled.job_group_id
WHERE job_groups.batch_id = job_group_self_and_ancestors.batch_id AND
job_groups.job_group_id = job_group_self_and_ancestors.job_group_id
) AS cancelled_t ON TRUE
STRAIGHT_JOIN billing_project_users ON batches.billing_project = billing_project_users.billing_project
LEFT JOIN LATERAL (
SELECT COALESCE(SUM(`usage` * rate), 0) AS cost, JSON_OBJECTAGG(resources.resource, COALESCE(`usage` * rate, 0)) AS cost_breakdown
FROM (
sql = f"""\
SELECT batches.*
, cancelled_t.cancelled IS NOT NULL AS cancelled
, job_groups_n_jobs_in_complete_states.n_completed
, job_groups_n_jobs_in_complete_states.n_succeeded
, job_groups_n_jobs_in_complete_states.n_failed
, job_groups_n_jobs_in_complete_states.n_cancelled
, cost_t.cost
, cost_t.cost_breakdown
FROM batches IGNORE INDEX (batches_deleted)
INNER JOIN billing_projects
ON batches.billing_project = billing_projects.name
INNER JOIN billing_project_users
ON batches.billing_project = billing_project_users.billing_project
STRAIGHT_JOIN job_groups
ON job_groups.batch_id = batches.id
STRAIGHT_JOIN job_groups_n_jobs_in_complete_states
ON job_groups.batch_id = job_groups_n_jobs_in_complete_states.id
AND job_groups.job_group_id = job_groups_n_jobs_in_complete_states.job_group_id
LEFT JOIN (SELECT *, 1 AS cancelled FROM job_groups_cancelled) AS cancelled_t
ON job_groups.batch_id = cancelled_t.id
AND job_groups.job_group_id = cancelled_t.job_group_id
INNER JOIN LATERAL (
WITH resource_costs AS (
SELECT resource_id, CAST(COALESCE(SUM(`usage`), 0) AS SIGNED) AS `usage`
FROM aggregated_job_group_resources_v3
WHERE job_groups.batch_id = aggregated_job_group_resources_v3.batch_id AND job_groups.job_group_id = aggregated_job_group_resources_v3.job_group_id
WHERE batch_id = batches.id
AND job_group_id = job_groups.job_group_id
GROUP BY resource_id
) AS usage_t
LEFT JOIN resources ON usage_t.resource_id = resources.resource_id
)
SELECT COALESCE(SUM(`usage` * rate), 0) AS cost
, JSON_OBJECTAGG(resource, COALESCE(`usage` * rate, 0)) AS cost_breakdown
FROM resource_costs
INNER JOIN resources USING (resource_id)
) AS cost_t ON TRUE
WHERE {' AND '.join(where_conditions)}
ORDER BY job_groups.batch_id DESC
ORDER BY batches.id DESC
LIMIT 51;
"""

Expand Down

0 comments on commit 7d25779

Please sign in to comment.