jMonitor mismeasures the CPU for multiprocess jobs #4

Open
graeme-a-stewart opened this issue Jan 28, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@graeme-a-stewart
Contributor

I tested jMonitor with a multi-process job (the simple burner executable from https://github.com/HSF/prmon).

When I run with 4 processes, each using 100% of the CPU, I would expect to get a CPURate graph at 400%, but instead it's only 100%. I suspect that jMonitor's CPU measurements are only for the parent process and ignore children. This is pretty broken, as often the parent is just a wrapper that forks sub-processes to do the real work.

In fact, if I wrap the job in a shell script, I do indeed get a zero percent measurement.

$ cat swrap.sh 
#! /bin/sh
/home/graemes/build/prmon/package/tests/burner -t 4 -r 30
$ python3 ./jMonitor --enable-monitor --monitor-backend matplotlib ./swrap.sh

[Plot: TEST_CPURate]

This plot should be at 400%.
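
To illustrate the suspected cause, here is a minimal sketch (not jMonitor's actual code) using only the Python standard library on a Unix system. It compares the CPU time charged to the monitoring process itself with the CPU time charged to its finished children. The `BURN` command is a hypothetical stand-in for the burner executable, and `cpu_seconds` and `measure` are illustrative names.

```python
import resource
import subprocess
import sys


def cpu_seconds(who):
    """User + system CPU seconds that getrusage reports for `who`."""
    usage = resource.getrusage(who)
    return usage.ru_utime + usage.ru_stime


def measure(cmd):
    """Run cmd to completion; return (parent_cpu, children_cpu) deltas in seconds."""
    self_before = cpu_seconds(resource.RUSAGE_SELF)
    kids_before = cpu_seconds(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)  # waits for the child, so its usage is accounted
    return (cpu_seconds(resource.RUSAGE_SELF) - self_before,
            cpu_seconds(resource.RUSAGE_CHILDREN) - kids_before)


# Hypothetical stand-in for the burner: a child that spins for about 0.3 s of CPU time
BURN = [sys.executable, "-c",
        "import time\n"
        "end = time.process_time() + 0.3\n"
        "while time.process_time() < end: pass"]

parent, children = measure(BURN)
print(f"parent CPU: {parent:.2f}s, children CPU: {children:.2f}s")
```

The parent's own counters stay close to zero while the child does all the work, which matches the zero percent reading for the shell wrapper. Note that `RUSAGE_CHILDREN` only covers children that have terminated and been waited for, so a live monitor would need another way to sample running descendants.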

@graeme-a-stewart added the bug label Jan 28, 2022
@feipengsy
Collaborator

Hi Graeme,

This is indeed broken for processes that open sub-processes. I'll look into how jMonitor handles sub-processes.

Also, as you commented in the last key4hep meeting, prmon already provides comprehensive process-monitoring functionality. Since there's no need to reinvent the wheel, I'll see which features or ideas can be reused while keeping jMonitor lightweight.

Thanks,
Teng

@graeme-a-stewart
Contributor Author

Hi @feipengsy

Indeed, this is exactly what motivated me to look into this. I believe it's entirely possible for jMonitor to delegate measuring the process resource consumption to prmon directly and just pull the values in.

I will work on a proof of concept to show you.
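
Once cumulative CPU counters come from an external monitor, the plotted rate is just a difference quotient. A minimal sketch, assuming the samples arrive as (wall seconds, cumulative CPU seconds) pairs; this sample format and the function name `cpu_rate_percent` are hypothetical and do not describe prmon's actual output:

```python
def cpu_rate_percent(samples):
    """Turn (wall_seconds, cumulative_cpu_seconds) samples into per-interval CPU rates in %."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip repeated or out-of-order timestamps
        rates.append(100.0 * (c1 - c0) / (t1 - t0))
    return rates


# Hypothetical samples: four fully busy processes accumulate 4 CPU-seconds per wall-second
samples = [(0.0, 0.0), (1.0, 4.0), (2.0, 8.0), (3.0, 12.0)]
print(cpu_rate_percent(samples))
```

With the CPU time summed over the whole process tree, four busy processes give 400% per interval, which is the value the CPURate plot should show.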

We should certainly discuss this in an upcoming key4hep meeting. In the meantime, as we noted, prmon is here: https://github.com/HSF/prmon, so please do take a look. N.B. it's written in C++, not Python, so you can't borrow the code directly.

Cheers

Graeme

@feipengsy
Collaborator

@graeme-a-stewart Thank you. I'll take a look at your pull request and continue working on this.

BTW, due to the Chinese New Year holiday I missed the key4hep meeting yesterday. I hope we can continue discussing this in the next meeting.

@feipengsy
Collaborator

@graeme-a-stewart
Hi Graeme,

I have made some changes to jMonitor recently. It should now be able to take care of all the sub-processes.

Also, I've added another option that delegates the monitoring task to prmon. The changes are included in #8. You could take a look if you wish.

Thanks,
Teng
