-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
project.py: Add new 'joblib' dependency to parallelize update #713
base: main
Are you sure you want to change the base?
project.py: Add new 'joblib' dependency to parallelize update #713
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
3fa370f
to
6c1fb2e
Compare
Another tip for faster fetch would be to use ssh instead of http. |
There are a few test failures in https://github.com/zephyrproject-rtos/west/actions/runs/9462265663/job/26086666688?pr=713, please take a look at "Running the tests" in the README.rst Ignore the macOS failures, unrelated. Fixed in #712. But more importantly, unless I'm confused (please correct me) I believe there was a previous attempt to do this and it had to be reverted because of terminal issues on Windows, see 2nd and abandoned attempt in #551. Were you aware of it? This new #713 attempt looks surprisingly much shorter: sounds too good to be true. |
6c1fb2e
to
da14c09
Compare
I did have a minor copy-paste bug, The tests ware very helpful to catch this. I looked into the previous attempt, and my approach seems simpler because I used a library that has been designed to be a drop-in replacement to for loops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into the previous attempt, and my approach seems simpler because I used a library that has been designed to be a drop-in replacement to for loops.
Ha! That explains. Re-using a higher-level library: great idea. This makes the PR very short indeed, nice job.
So this is actually the most important change in this PR: the addition of a new dependency! This must always be performed very carefully because it carries maintenance and security risks https://cacm.acm.org/practice/surviving-software-dependencies/
This new dependency will also break all existing users, so its name must be at the very least visible in the commit title (and PR title). Something like: "Add new 'joblib' dependency to parallelize update". (while at it don't forget to fix the spelling of "paralelize")
I had a quick look at joblib
and it seems like a serious and actively maintained library; nice pick!
Nevertheless, joblib cannot be immune to weird terminal issues. Does it have workarounds for all Windows quirks? There is only one way to tell: testing. So please state clearly in the commit message all the operating systems and terminals you've tested this with. This must include Powershell and CMD.EXE which is apparently still very important:
Don't forget to make sure colors are still working:
Last but not least, you must provide some -j 1
option. Not just for optimizing and fine-tuning the number of threads but more importantly for the ability to turn that feature completely OFF for two important reasons:
- Debugging git configuration and network issues.
- As a workaround in case someone has weird terminal or other issues.
I have debugged countless build issues and whether it's make, ninja, west or anything concurrent, the very first step is always -j 1
to stop relevant errors from being drowned by concurrent threads. Even when debugging race conditions or other concurrency issues, -j 1
is still a critical datapoint and reference for comparison purposes.
For those reasons, 1. requires unbuffered git output, see --help string in #551. Can joblib do that? If not then you need to add a "joblib-free" code path. Automatically falling back on this joblib-free code path when the dependency is missing would be the icing on the cake (but not required IMHO)
Yes, tests save lives. Speaking of which: make sure all the existing tests stick to (
If
BTW:
performance
|
da14c09
to
7f70536
Compare
rebased on main,
|
I did not test those changes on Windows, as I do not work on that platform. I think that if there is an issue there, we could remove this feature for that platform, or maybe someone more experienced on Windows could help us. But as I mentioned previously, I used off the shelf library to do the heavy lifting, and it is possible that we could delegate any new issues to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I didn't review in detail yet but this is looking better and better.
All tests are failing, hopefully only codestyle issues. Can you please fix? Do you have any issue running tox
on your system? I added some tox
tips in the README.md a while ago.
I did not test those changes on Windows, as I do not work on that platform.
I'm sorry but you must find a Windows system or "Windows friend" to smoke test this PR with at least -j1
and not, in both CMD.EXE and Powershell. The previous attempt went through a couple rounds of reverts and we really don't want to do that again. Maybe ask on Discord? There is a #west
channel there; also try all the CI and infrastructure channels. People there may not use Windows but they may desire this enough to go and find a "Windows friend" of theirs :-)
I think that if there is an issue there, we could remove this feature for that platform,
Yes: for instance we could default to -j1
on Windows, at least initially! This would make it so much easier to find "Windows friends" willing to test it: now they don't have to fetch a PR to test. This is exactly what I strongly advocated in the previous attempt but I was unfortunately overruled.
7f70536
to
9d8c8af
Compare
I tested it on Windows in Power shell. When using On the other hand, when more threads are used, the output gets a little messy. I can see how this could be fixed, but it will require a lot more refactoring in the So just to summarize, the default behavior is the same as before this PR, so you have to explicitly state that you want to use multiple threads for update. On Windows the output with multiple threads is messy, but when single thread is used there is no change compared to version from main. On Linux the output looks a little better, as the lines are not interrupted, but you see lines from multiple processes one under another. IMO this is not a big deal, to make it right we would have to collect all the output, and print it only after the job is completed, this would cause some idle time for the user (there is a package to provide progress bar for joblib so this could solve our issue of idle output for user https://pypi.org/project/joblib-progress/). Furthermore, there is always option to not use this, and I imagine that this feature will be used in automation scripts, where we do not care about the output, just on the time. |
Wait, so you're saying joblib/loky have no ability to capture terminal/standard outputs? I skimmed their APIs and I found joblib/loky#306 which all seem to confirm they indeed don't care about standard outputs. Their focus is parallel computation, which I bet uses standard outputs rarely - only when something goes wrong. This is a MAJOR disappointment :-( All this time I was confused and assumed joblib/loky offered this ability to capture standard outputs. I thought this was why the I wouldn't mind a first shot that does not capture outputs (others may disagree) but there would need to be very clear path towards implementing this in the future and I see none with
As you can see the title of the aborted fix #533 was
Failures happen in automation, then output is important too. In case of failure, outputs can be even more important than in interactive use because there is often no simple access to the automation environment and no or little "trial and error" possibility to test again. Changing verbosity levels and other parameters is also more difficult at best and impossible at worst. PS: I also discovered that the main reason |
Moreover: compilers, linkers etc. are used to have their outputs interleaved with concurrent processes, it's been like this for them for decades. So they expect concurrency and generally make a good effort to push standalone "chunks" of information. At the receiving end, make and ninja try to somehow respect that. On the other hand, I really doubt Trying to summarize:
pytest.mark.parametrize("options", [""]) #, "-j 1", "-j 2", "-j -1"])`
def test_update_some_with_imports(options, repos_tmpdir): Again, this will help anyone experimenting with any sort of |
@@ -416,14 +416,14 @@ def test_grep(west_init_tmpdir): | |||
|
|||
assert re.search('west-commands', cmd('grep -- -- -commands')) | |||
|
|||
|
|||
def test_update_projects(west_init_tmpdir): | |||
@pytest.mark.parametrize("options", ["", "-j 1", "-j 2", "-j"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is -j
indented? should not be -j -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree seems so from here: parser.add_argument('-j', '--jobs', nargs='?', const=-1,
that -1
should be passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above - joblib looks nice but it does not seem to add any value compared to built-in Python modules.
Agreed, something like? from multiprocessing.pool import Pool
# ...
processes = self.args.jobs if self.args.jobs > 0 else None
if processes == 1:
for project in self.manifest.projects:
project_update(project)
else:
pool = Pool(processes=processes)
pool.map_async(project_update, self.manifest.projects)
pool.join() |
Yes something like this. |
@RobertGalatNordic any update? |
@pdgendt if you have time to spend on this, I think you could make pretty short work of a re-implementation of this that uses some built-in, |
I took a quick stab at it, but it's not as easy apparently, because of the shared memory stuff. |
src/west/app/project.py
Outdated
help='''Use multiple jobs to paralelize update process. | ||
Pass -1 to use all avaliable jobs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help='''Use multiple jobs to paralelize update process. | |
Pass -1 to use all avaliable jobs. | |
help='''Use multiple jobs to parallelize the update process. | |
Pass -1 to use all available jobs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed
9d8c8af
to
34dfe38
Compare
rebased to address switch to pyproject.toml + @pdgendt comments re: typos |
The projects from manifests were updated one by one. The parallel aproach has potential to fully utilize CPU and network connection to speed up the update. Signed-off-by: Robert Gałat <[email protected]>
34dfe38
to
49fc40c
Compare
The "shared memory stuff" is also the... messenger. It's telling us that this PR runs concurrently code that was never designed to. What could possibly go wrong? On top of all other issues already discussed above. I took a deep dive in the code last weekend and I think an |
The projects from manifests were updated one by one. The parallel approach has potential to fully utilize CPU and network connection to speed up the update.
Needs more testing, I checked only few common cases and it seems fine.
Performance tests:
west init -m https://github.com/nrfconnect/sdk-nrf --mr main
time west update
system and network load during update:
Clean up and replace west for version from this PR
rm -rf <.west and all downloaded repositories>
pip3 uninstall west
cd <west_source_dir>
pip3 install -e .
cd ..
repeat the test
west init -m https://github.com/nrfconnect/sdk-nrf --mr main
time west update
system and network load during parallel update:
As we can see, the result is substantially faster. The same operation finishes in less than half of the original time (from 14.5 minutest down to 6.5 minutes).