CPU not fully utilized #36
Comments
Lots of bottlenecks, unfortunately. Detection can run at almost any batch size, and especially on GPUs, the higher the better. But there are two issues with that: you need quite a bit of memory, and frames need to arrive quickly enough. So far, that's not really the case. I've tried offloading frame reading and writing to separate processes, but surprisingly that didn't help much - the additional overhead, at least on my system, ate the gains from parallelisation. So far, the best approach I could find was to massively speed up reading/writing frames. #32 contains a very much WIP draft, but the performance improvements are already quite large. I still have to fix a lot of things there, though, and check how this works when using a GPU.
TL;DR: Increasing the batch size and improving frame extraction should help.
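A minimal sketch of what batched detection could look like, assuming a hypothetical `detect(frames)` that takes a list of frames and returns one detection result per frame, and a hypothetical `blur(frame, detections)` step (the project's actual APIs may differ); OpenCV is used for decoding/encoding:

```python
import cv2

BATCH_SIZE = 16  # larger batches tend to help on GPUs, at the cost of memory


def blur_video(in_path, out_path, detect, blur):
    """Run detection on whole batches of frames instead of one at a time.

    `detect` and `blur` are hypothetical stand-ins for the project's own
    detection and anonymisation steps.
    """
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    batch = []
    while True:
        ok, frame = cap.read()
        if ok:
            batch.append(frame)
        # Flush a full batch, or whatever is left at the end of the stream.
        if len(batch) == BATCH_SIZE or (not ok and batch):
            for f, detections in zip(batch, detect(batch)):
                out.write(blur(f, detections))
            batch = []
        if not ok:
            break

    cap.release()
    out.release()
```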
Almost makes me think having two batches run detections in parallel would be beneficial, however weird that may sound.
I own an AMD Ryzen 7 3800XT, an 8-core CPU with 16 threads.
When blurring a video, the CPU is not fully used.
My guess is that this has something to do with the serial loop: reading a frame, then running detection on it, then blurring it, then writing the blurred frame out, then repeating.
My profiling skills in Python are non-existent, so I can only guess that this is why CPU utilization is not at 100%.
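For reference, Python's built-in cProfile makes it fairly easy to check a guess like this; `main()` below is a hypothetical stand-in for whatever entry point kicks off the blurring run:

```python
import cProfile
import pstats

def main():
    ...  # hypothetical entry point: whatever starts the blurring run

# Profile a full run, save the stats, and print the 20 most expensive call sites.
cProfile.run("main()", "blur.prof")
pstats.Stats("blur.prof").sort_stats("cumulative").print_stats(20)
```

Alternatively, `python -m cProfile -s cumulative your_script.py ...` does the same from the shell without touching the code.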
Interleaving I/O-bound tasks with CPU-bound ones so they run concurrently would be ideal, so that reading and writing happen in the background while detection takes place (see the sketch below).
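A rough sketch of that producer/consumer setup, reusing the hypothetical `detect`/`blur` helpers from above: a reader thread and a writer thread keep bounded queues fed while the main thread does the detection work. Plain threads can be enough here, since OpenCV's decode/encode calls generally release the GIL while they run.

```python
import queue
import threading


def reader(cap, q):
    """Producer: decode frames in the background."""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        q.put(frame)
    q.put(None)  # sentinel: end of stream


def writer(out, q):
    """Consumer: encode blurred frames in the background."""
    while True:
        frame = q.get()
        if frame is None:
            break
        out.write(frame)


def blur_video_threaded(cap, out, detect, blur):
    """Overlap reading/writing with detection; `cap` and `out` are an open
    cv2.VideoCapture and cv2.VideoWriter."""
    in_q = queue.Queue(maxsize=64)   # bounded queues cap memory use
    out_q = queue.Queue(maxsize=64)
    threading.Thread(target=reader, args=(cap, in_q), daemon=True).start()
    wt = threading.Thread(target=writer, args=(out, out_q), daemon=True)
    wt.start()

    while True:
        frame = in_q.get()
        if frame is None:
            break
        # CPU/GPU-bound work happens here, overlapped with the I/O above.
        out_q.put(blur(frame, detect([frame])[0]))

    out_q.put(None)
    wt.join()
```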