
[batch] Create new job groups without ability to add jobs #14018

Closed
wants to merge 35 commits into from

Conversation


@jigold jigold commented Nov 16, 2023

Stacked on #14016. This PR needs to have the client/server protocol for creating job groups for the four types of creation/update events hashed out and implemented. Basic tests are there. We still need tests for billing and cancellation to make sure the aggregation and cancellation operations work properly.

@jigold jigold force-pushed the create-job-groups branch from 315d291 to 68168ca Compare December 1, 2023 00:15
@jigold jigold changed the title [batch] WIP -- Create job groups [batch] Create new job groups without ability to add jobs Dec 1, 2023
@jigold jigold marked this pull request as ready for review December 1, 2023 00:16
@daniel-goldstein daniel-goldstein left a comment

This might not cover everything because I think some new commits came in while I was reviewing. The majority of lines changed here look great; I just have a couple of core concerns with the API changes, which I think are too strongly driven by the *-fast endpoints. In particular, I think update-fast is doing a lot here. update-fast is a convenience for bundling up the following steps:

  1. Create an update of jobs
  2. Submit jobs
  3. Commit the update of those jobs

I might be wrong, but it seems to me like that is now the de facto way in aioclient to also submit just a job group with no jobs, and that seems wrong. I think we would be better served by starting simpler and adding the ability to bundle operations once the right "primitives" are established. I think we would not be losing much if we just left all the update-related operations alone and added a route for creating job groups, and we would reduce the complexity greatly. We can then revisit by adding a way to "create a job group and a jobs update in one request".
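A minimal sketch of the simpler client surface described above: a dedicated call for creating job groups, kept separate from the jobs-update flow. All names here (`create_job_groups`, `JobGroupSpec`, the route path in the docstring) are illustrative assumptions, not Hail's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobGroupSpec:
    # relative ID within this request; the server assigns absolute IDs
    job_group_id: int
    absolute_parent_id: Optional[int] = None
    attributes: Optional[dict] = None


def create_job_groups(specs: List[JobGroupSpec]) -> List[int]:
    """Stub for a dedicated route, e.g. POST .../job-groups/create.

    Returns the absolute IDs the server would assign; here we pretend
    the server hands out sequential IDs starting at 1.
    """
    start_job_group_id = 1
    return [start_job_group_id + spec.job_group_id - 1 for spec in specs]
```

Under this stub, `create_job_groups([JobGroupSpec(1), JobGroupSpec(2)])` yields `[1, 2]`; the point is only that job-group creation stands alone as a primitive, with bundling layered on later.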

'callback': nullable(str_type),
'cancel_after_n_failures': nullable(numeric(**{"x > 0": lambda x: isinstance(x, int) and x > 0})),
'absolute_parent_id': nullable(int_type),
'in_update_parent_id': nullable(int_type),

I think the semantics of this PR are sound but this name trips me up, especially above when the job spec contains the in_update_job_group_id. I think there are two distinct concepts here that are both being given the name "update" and I want to make sure they're not conflated. Am I correct that this field can only refer to job groups that are submitted together within the same HTTP request to the front-end? Because we are not reserving ranges of job group IDs in the database there is no way to resolve a relative ID across different requests. So this is more like an in_request_parent_id.

I see why you initially had this as part of the batch_updates table, so that you could use relative IDs in this way, but I don't see this being a heavily used feature right away and I hesitate to introduce changes that are not heavily used and therefore heavily tested. For QOB, we are basically never going to submit nested job groups in the same request. The absolute parent job group ID is always known because it is the job group the query driver is running in.

Additionally, wasn't there a conversation of limiting the depth of a job group tree? If so, there would be a small upper bound on the number of requests necessary to submit job group specs one layer of the tree at a time, which I think disincentivizes this feature even more. What do you think about removing in_update_parent_id for now?
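To make the distinction concrete, here is a hedged sketch of how an in-request (relative) parent reference differs from an absolute one: the relative ID can only be resolved once the server returns `start_job_group_id` for that same request. The function and field names are hypothetical.

```python
from typing import Optional


def resolve_parent_id(start_job_group_id: int,
                      absolute_parent_id: Optional[int] = None,
                      in_request_parent_id: Optional[int] = None) -> int:
    """Resolve a job group's parent reference to an absolute ID.

    Exactly one of the two parent fields must be set. The relative form
    only makes sense for groups submitted in the same HTTP request,
    because ranges of job group IDs are not reserved ahead of time.
    """
    if (absolute_parent_id is None) == (in_request_parent_id is None):
        raise ValueError('set exactly one of absolute or in-request parent id')
    if absolute_parent_id is not None:
        return absolute_parent_id
    return start_job_group_id + in_request_parent_id - 1
```

For example, with `start_job_group_id=5`, an in-request parent of 2 resolves to 6, while an absolute parent of 3 is passed through unchanged.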


def _submit(self, in_update_start_job_group_id: int):
    self._raise_if_submitted()
    self._job_group_id = AbsoluteJobId(in_update_start_job_group_id + self._job_group_id - 1)

Suggested change
self._job_group_id = AbsoluteJobId(in_update_start_job_group_id + self._job_group_id - 1)
self._job_group_id = AbsoluteJobGroupId(in_update_start_job_group_id + self._job_group_id - 1)

right?

cancel_after_n_failures=cancel_after_n_failures)

self._job_groups.append(jg)
self._job_group_specs.append(jg_spec)

Instead of keeping these in sync, can you just have the spec be a field of the job group? You could also then not deduplicate these fields like attributes that are set on both objects.
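A sketch of the restructuring suggested here, using hypothetical class names: the spec lives on the job group object itself, so there is no second list to keep in sync and no fields duplicated across the two objects.

```python
class JobGroup:
    def __init__(self, job_group_id: int, spec: dict):
        self.job_group_id = job_group_id
        self.spec = spec  # single source of truth for attributes etc.


class Batch:
    def __init__(self):
        self._job_groups = []  # no parallel _job_group_specs list

    def create_job_group(self, job_group_id: int, **spec) -> JobGroup:
        jg = JobGroup(job_group_id, spec)
        self._job_groups.append(jg)  # one append, nothing to drift apart
        return jg

    def _job_group_specs(self):
        # derive the specs on demand instead of maintaining them separately
        return [jg.spec for jg in self._job_groups]
```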

return int(update_json['start_job_id'])

start_job_group_id = update_json['start_job_group_id']
if start_job_group_id is not None:

This is None if we don't submit any job groups?


json_resp = await resp.json()
start_job_group_id = json_resp['start_job_group_id']
if start_job_group_id is not None:

How would this be None here?

return start_job_id
if n_job_bunches == 0 and n_job_group_bunches == 0:
log.warning('Tried to submit an update with 0 jobs and 0 job groups. Doing nothing.')
return (None, None)

I think the old way of structuring this method is not scaling very well to also include job groups. It's muddling the logic of what to do with just jobs/updates, which I think is introducing subtle bugs. If someone tries to create a job group and no jobs, I think it will still call self._create_update, which is illegal with no jobs, right?
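One way to structure the guard this comment describes, sketched with stubbed internals (all method names and return values below are placeholders for the real client/server interactions): job-group creation is handled independently, and the jobs update is only ever created when there are jobs.

```python
class BatchClientSketch:
    # Stubs standing in for real HTTP calls to the batch front-end.
    def _create_job_groups(self):
        return 1   # pretend server-assigned start_job_group_id

    def _create_update(self):
        return 42  # pretend update_id

    def _submit_jobs(self, update_id):
        pass

    def _commit_update(self, update_id):
        return 7   # pretend start_job_id

    def submit(self, n_jobs: int, n_job_groups: int):
        if n_jobs == 0 and n_job_groups == 0:
            return (None, None)
        start_job_group_id = self._create_job_groups() if n_job_groups > 0 else None
        start_job_id = None
        if n_jobs > 0:  # never create an update with zero jobs
            update_id = self._create_update()
            self._submit_jobs(update_id)
            start_job_id = self._commit_update(update_id)
        return (start_job_id, start_job_group_id)
```

With this shape, submitting only job groups skips the update path entirely instead of threading through job-bunch logic.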

if len(self._job_groups) > 0:
await self._submit_job_group_bunches(byte_job_group_specs_bunches, job_group_bunch_sizes, job_group_progress_task)
# we need to recompute the specs now that the job group ID is absolute
job_specs = [spec.to_dict() for spec in self._job_specs]

Is there some hidden mutation here that means running this code again produces a different result? That's pretty spooky and tells me that this should be restructured.

start_job_id = await self._commit_update(update_id)
self._submission_info = BatchSubmissionInfo(used_fast_path=False)
log.info(f'updated batch {self.id}')
return (start_job_id, start_job_group_id)

I think I need to stew a bit on this _submit method. It seems like a lot of complexity, most of which we will possibly never use in QoB, which makes it hard for me to reason about. For context, the Scala side of things will just need to add a single piece of information that is "put all these jobs in a new job group that is a child of my current job group".

It really feels to me like the *-fast methods are driving the whole design when they're really just an optimization to batch server operations into fewer HTTP requests. Maybe we should start with just resigning ourselves to two HTTP requests and have _submit be two stages:

  1. Create job group(s)
  2. Do everything like it did before

I recognize that it will be nice to have one HTTP request for small submissions but to me that feels like a much easier optimization to make once the existing operations are in place.
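The two-stage flow could look roughly like the following sketch, with the HTTP layer injected as a plain callable so the protocol shape is visible. The route paths and field names are assumptions for illustration, not Hail's actual endpoints.

```python
def submit_two_stage(post, job_group_specs, job_specs):
    """post(path, payload) -> dict is a stand-in for an HTTP POST helper.

    Stage 1 creates the job group(s); stage 2 submits the jobs exactly
    as before, now that every in-request job group reference can be
    rewritten to an absolute ID.
    """
    # Stage 1: create job groups; the server returns the absolute start ID.
    start = post('/job-groups/create', job_group_specs)['start_job_group_id']

    # Stage 2: make relative references absolute, then submit jobs.
    for spec in job_specs:
        rel = spec.pop('in_request_job_group_id', None)
        if rel is not None:
            spec['absolute_job_group_id'] = start + rel - 1
    return post('/updates/submit', job_specs)['start_job_id']
```

Bundling both stages into one request (the *-fast path) can then be layered on as an optimization once these primitives exist.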

@danking danking self-assigned this Jan 11, 2024

danking commented Jan 11, 2024

In the spirit of spreading out review burden a bit, I'm gonna pick up review for this PR. I'll be sure to read up on all the stacked conversations so I'm cached in before I review this one. I'll stick a review on here and you can dismiss when this is ready for a look :)

@danking danking left a comment

dismiss when ready for review


jigold commented Feb 1, 2024

Closing in favor of #14170

@jigold jigold closed this Feb 1, 2024