-
Notifications
You must be signed in to change notification settings - Fork 4
refactor: add JobSpec common abstraction #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3eb2132 to
d881ab0
Compare
d881ab0 to
1604d40
Compare
rswarbrick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general point, I'm very excited about this sort of refactor: it makes things much cleaner (and less circular!)
BUT: I've spent 15 minutes reading and I'm half-way through the first commit, in which I've found at least three orthogonal changes.
Please could you split up the commit? And the "basic cleanup" stuff is perfectly sensible, but it can be in parallel PRs.
|
|
||
| try: | ||
| os.makedirs(arg_scratch_root, exist_ok=True) | ||
| Path(arg_scratch_root).mkdir(exist_ok=True, parents=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big deal, but things like the changes to this particular file could definitely have been broken out into their own commit, I think?
Indeed (reading further...) the are several other parts of this commit that are all about converting to using Path. It's completely reasonable to do so, but please can you avoid polluting an otherwise-nontrivial commit with extra gubbins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's not crucial, but is just a bit easier on reviewers when non-functional changes are split up into a separate commit marked as such. I personally find it easier to scrutinize the important changes with less noise, partially contributed to by the GitHub UI being challenging at times.
Also, this change is like the first thing that appears in the list, so it's easier to nitpick about on that basis too :)
| job_time_s = tr.job_runtime | ||
| sim_time_us = tr.simulated_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change part of switching to the JobSpec abstraction? I suspect not, so this should be a separate commit. Also, aren't we switching to using JobTime objects? Probably drop the units from the variable names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is part of the JobSpec abstraction, due to the coupled nature of the original code. The JobSpec objects needs to be immutable and serialisable as pydantic model, JobTime objects are mutable and are not serialisable.
So the JobTime can't be stored within a JobSpec.
The only place I can see in the codebase where the JobTime is used downstream is this place, and they convert to fixed units. So rather than using JobTime, I've reverted to just using simple integers.
Long term we could fix up JobTime so it's immutable and serialisable, although I'm not sure if there is an actual use-case for this? Python has built-in datetime data types, so why don't we just use those?
Either way, I think this is a separate PR. This one is already too big as you correctly point out.
I've tried to make minimal changes (although did mix some linting fixes in).
| scratch_path: Path | ||
|
|
||
|
|
||
| class JobSpec(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add docstrings to the various members that this adds? In many cases, this will just be a matter of moving the comment to come just after the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added docstrings for the class attributes as best as I can.
src/dvsim/job/time.py
Outdated
|
|
||
| # TODO: Migrate to Time instead of a custom implementation | ||
| class JobTime: # noqa: PLW1641 # Muitable object should not implement __hash__ | ||
| class JobTime: # noqa: PLW1641 Muitable object should not implement __hash__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup, but....
- It should be a separate commit
- "Muitable" still isn't a word :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted my change... it can be fixed in another commit as you ask (at some other time).
Other than mixing linting with refactoring, I can’t see a way of splitting this up. It might not be obvious why the changes are needed, and often it seems unrelated. But this is because of the coupled nature of the code. I can try and pull out the linting fixes now there is a working end state, but I don't think I can split up the main changes more than they currently can (without creating commits with broken states). |
|
Really?! Are you really saying that e.g. the change to to using |
ec408d8 to
4f03c0d
Compare
No, as per the description of the PR, the point of this PR is to implement the There are also a hand full that are not necessary for the purpose of this PR, which I made on the way to fix linting warnings over the several days of effort it took to make these changes. My comment about "Other than mixing linting with refactoring" includes the |
4f03c0d to
7b8f61c
Compare
hcallahan-lowrisc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, thanks @machshev! This feels like a nice abstraction layer to expand into.
|
|
||
| try: | ||
| os.makedirs(arg_scratch_root, exist_ok=True) | ||
| Path(arg_scratch_root).mkdir(exist_ok=True, parents=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's not crucial, but is just a bit easier on reviewers when non-functional changes are split up into a separate commit marked as such. I personally find it easier to scrutinize the important changes with less noise, partially contributed to by the GitHub UI being challenging at times.
Also, this change is like the first thing that appears in the list, so it's easier to nitpick about on that basis too :)
src/dvsim/job/data.py
Outdated
| # Licensed under the Apache License, Version 2.0, see LICENSE for details. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| """Job data models.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this might not be the final refactoring, but could you expand the description here with some more context for how these objects are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with some context.
src/dvsim/launcher/base.py
Outdated
| self.prepare_workspace_for_cfg(workspace_cfg) | ||
| Launcher.workspace_prepared_for_cfg.add(project) | ||
|
|
||
| # Store the deploy object handle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. stale comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot
| ] | ||
|
|
||
| deploy.cov_results_dict = {k: f"{random() * 100:.2f} %" for k in keys} | ||
| # TODO: this hack doesn't work any more and needs implementing by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does removing this leave us broken in some way? If so, I would delete the code and create an issue to track. If not, maybe just delete the code entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't leave us in a broken state, just means the fake data from the fake launcher doesn't contain coverage data... so report template generation is not quite as nice. It still creates random pass/fail results for the report.
I'm sure we can add the functionality back in, just not quite sure how at the moment.
The `Deploy` classes are coupled to the scheduler and also the flow objects in a way that is not clear. It's also not clear which of the public attributes are depended on externally and which are internal working out. The intention is to make clear what the interface is that the scheduler depends on and what the flow object depends on as a result of the job runs. Initially I tried to refactor the `Deploy` objects themselves to make the existing objects clear, however this failed due to too many moving parts and the level of coupling. This commit introduces a `JobSpec` pydantic data model that gives a fully typed and runtime validated model containing everything the scheduler needs to run the job. The scheduler no longer receives `Deploy` objects, but instead `JobSpec` objects created from `Deploy` objects. Which means `Deploy` objects become `JobSpec` factories. In addition, the obvious dependencies on `Deploy` objects within the flow objects have been replaced with an expanded `CompletedJobStatus` pydantic model. There may still be dependencies that exist that have been missed, but the main ones have been removed in the results processing path. This opens the way for writing tests for the scheduler, and potentially the launchers, against clear interfaces using data classes as inputs and outputs. There is room for improvement in the data classes, as some attributes can potentially be removed with some refactoring. Signed-off-by: James McCorrie <[email protected]>
Signed-off-by: James McCorrie <[email protected]>
Now we have a Pydantic model that represents the full data requirement of the scheduler, we can use the builtin `JobSpec.model_dump()` instead of the custom `Deploy.dump()`. At the moment this model has to contain a couple of callback functions which cannot be serialised, so these attributes are excluded. This commit does change the format of the dumped "deployment" objects file. So if this is being used to check for breaking changes, then hashes need to be compared against one generated from this commit from now on. Signed-off-by: James McCorrie <[email protected]>
7b8f61c to
7bd5fba
Compare
The
Deployclasses are coupled to the scheduler and also the flow objects in a way that is not clear. It's also not clear which of the public attributes are depended on externally and which are internal working out.The intention is to make clear what the interface is that the scheduler depends on and what the flow object depends on as a result of the job runs. Initially I tried to refactor the
Deployobjects themselves to make the existing objects clear, however this failed due to too many moving parts and the level of coupling.This commit introduces a
JobSpecpydantic data model that gives a fully typed and runtime validated model containing everything the scheduler needs to run the job. The scheduler no longer receivesDeployobjects, but insteadJobSpecobjects created fromDeployobjects. Which meansDeployobjects becomeJobSpecfactories.In addition, the obvious dependencies on
Deployobjects within the flow objects have been replaced with an expandedCompletedJobStatuspydantic model. There may still be dependencies that exist that have been missed, but the main ones have been removed in the results processing path.flowchart LR subgraph flowS [flows] F1[Flow 1] F2[Flow 2] end subgraph Deploy D1(Deploy 1) D2(Deploy 2) D3(Deploy 3) J1(JobSpec 1) J2(JobSpec 2) J3(JobSpec 3) end subgraph Results C1 C2 C3 end subgraph flowE [flows] F1e[Flow 1] F2e[Flow 2] end F1 --> D1 --> J1 --> S[Scheduler] --> C1(CompletedJobStatus) --> F1e F1 --> D2 --> J2 --> S --> C2(CompletedJobStatus) --> F1e F2 --> D3 --> J3 --> S --> C3(CompletedJobStatus) --> F2eThis opens the way for writing tests for the scheduler, and potentially the launchers, against clear interfaces using data classes as inputs and outputs.
There is room for improvement in the data classes, as some attributes can potentially be removed with some refactoring.
Implements #46