-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[batch] make batches query go brrrrrrr (for realsies) #14649
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -125,40 +125,43 @@ def parse_list_batches_query_v2(user: str, q: str, last_batch_id: Optional[int]) | |
where_conditions.append(f'({cond})') | ||
where_args += args | ||
|
||
sql = f""" | ||
SELECT batches.*, | ||
cancelled_t.cancelled IS NOT NULL AS cancelled, | ||
job_groups_n_jobs_in_complete_states.n_completed, | ||
job_groups_n_jobs_in_complete_states.n_succeeded, | ||
job_groups_n_jobs_in_complete_states.n_failed, | ||
job_groups_n_jobs_in_complete_states.n_cancelled, | ||
cost_t.cost, cost_t.cost_breakdown | ||
FROM job_groups | ||
LEFT JOIN batches ON batches.id = job_groups.batch_id | ||
LEFT JOIN billing_projects ON batches.billing_project = billing_projects.name | ||
LEFT JOIN job_groups_n_jobs_in_complete_states ON job_groups.batch_id = job_groups_n_jobs_in_complete_states.id AND job_groups.job_group_id = job_groups_n_jobs_in_complete_states.job_group_id | ||
LEFT JOIN LATERAL ( | ||
SELECT 1 AS cancelled | ||
FROM job_group_self_and_ancestors | ||
INNER JOIN job_groups_cancelled | ||
ON job_group_self_and_ancestors.batch_id = job_groups_cancelled.id AND | ||
job_group_self_and_ancestors.ancestor_id = job_groups_cancelled.job_group_id | ||
WHERE job_groups.batch_id = job_group_self_and_ancestors.batch_id AND | ||
job_groups.job_group_id = job_group_self_and_ancestors.job_group_id | ||
) AS cancelled_t ON TRUE | ||
STRAIGHT_JOIN billing_project_users ON batches.billing_project = billing_project_users.billing_project | ||
LEFT JOIN LATERAL ( | ||
SELECT COALESCE(SUM(`usage` * rate), 0) AS cost, JSON_OBJECTAGG(resources.resource, COALESCE(`usage` * rate, 0)) AS cost_breakdown | ||
FROM ( | ||
sql = f"""\ | ||
SELECT batches.* | ||
, cancelled_t.cancelled IS NOT NULL AS cancelled | ||
, job_groups_n_jobs_in_complete_states.n_completed | ||
, job_groups_n_jobs_in_complete_states.n_succeeded | ||
, job_groups_n_jobs_in_complete_states.n_failed | ||
, job_groups_n_jobs_in_complete_states.n_cancelled | ||
, cost_t.cost | ||
, cost_t.cost_breakdown | ||
FROM batches IGNORE INDEX (batches_deleted) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a slight forward-looking concern over introducing so much mysql-specific syntax to the query. I have a suspicion that postgres support might be a useful option to have in certain potential terra futures. Not enough to block this, but enough to make me mildly uneasy... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I remember correctly, rawls and sam use MySQL (or cloud sql). Anyway, all that is kind of irrelevant as
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Point well taken that this is not "the thin end of the wedge" and more like "a pattern already throughout batch". Point also well taken that adding support is not necessarily a trivial thing to do. So 👍 for following the pattern here. For more context on my thoughts here - there are certainly 'ways to terra' that do not go through psql, but our terra-on-azure app, for example, cannot have a managed database because of this reliance, though maybe that's more because it was never added than some fundamental blocker There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also I have a natural prior that 'all this optimization for mysql' is just not as necessary in psql because their optimizer is less likely to do the weird things that mysql does. But there's not data behind that, just biases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's good to know for future work, thanks for sharing. |
||
INNER JOIN billing_projects | ||
ON batches.billing_project = billing_projects.name | ||
INNER JOIN billing_project_users | ||
ON batches.billing_project = billing_project_users.billing_project | ||
STRAIGHT_JOIN job_groups | ||
ON job_groups.batch_id = batches.id | ||
STRAIGHT_JOIN job_groups_n_jobs_in_complete_states | ||
ON job_groups.batch_id = job_groups_n_jobs_in_complete_states.id | ||
AND job_groups.job_group_id = job_groups_n_jobs_in_complete_states.job_group_id | ||
LEFT JOIN (SELECT *, 1 AS cancelled FROM job_groups_cancelled) AS cancelled_t | ||
ON job_groups.batch_id = cancelled_t.id | ||
AND job_groups.job_group_id = cancelled_t.job_group_id | ||
INNER JOIN LATERAL ( | ||
WITH resource_costs AS ( | ||
SELECT resource_id, CAST(COALESCE(SUM(`usage`), 0) AS SIGNED) AS `usage` | ||
FROM aggregated_job_group_resources_v3 | ||
WHERE job_groups.batch_id = aggregated_job_group_resources_v3.batch_id AND job_groups.job_group_id = aggregated_job_group_resources_v3.job_group_id | ||
WHERE batch_id = batches.id | ||
AND job_group_id = job_groups.job_group_id | ||
GROUP BY resource_id | ||
) AS usage_t | ||
LEFT JOIN resources ON usage_t.resource_id = resources.resource_id | ||
) | ||
SELECT COALESCE(SUM(`usage` * rate), 0) AS cost | ||
, JSON_OBJECTAGG(resource, COALESCE(`usage` * rate, 0)) AS cost_breakdown | ||
FROM resource_costs | ||
INNER JOIN resources USING (resource_id) | ||
) AS cost_t ON TRUE | ||
WHERE {' AND '.join(where_conditions)} | ||
ORDER BY job_groups.batch_id DESC | ||
ORDER BY batches.id DESC | ||
LIMIT 51; | ||
""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this layout makes adding and removing lines easier? But did you intend to keep this in the final version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did. MySQL (rightly) does not support trailing commas. This is the One True Way (TM) to do split comma-separated lists.