new implementation of hand tracking #171
-
It's a great effort! Thanks!
-
which version of hand tracking would you recommend?
-
it depends - what is the video, what is the intended use, and how much processing power do you have? you just need to play and tune settings to whatever fits you best.
-
The video would be a single person in front of their webcam (typically in a Zoom call). The processing power will vary from user to user, but we can assume a reasonably modern computer, although nothing very high-end. The intended use is to detect what the user's body and face are doing - e.g. touching their face, looking straight ahead, resting their head on their hand, etc.
-
so you need high precision, but don't care about visuals, so smooth output is irrelevant. i'd probably test with the new hand implementation enabled. on a side-note, i'd run with caching fully disabled and detection triggered on a fixed interval - but easy enough to make that variable, calculate the value based on detection performance.
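a minimal sketch of what such a config could look like - `cacheSensitivity` and the per-module toggles are documented config options, but the exact values here are illustrative assumptions to tune, not recommendations:

```js
// sketch only: values are assumptions, tune for your own use case
const humanConfig = {
  cacheSensitivity: 0,        // 0 fully disables input-change caching, so every frame is analyzed from scratch
  filter: { enabled: false }, // analysis-only use case, no need to post-process visuals
  face: { enabled: true },    // needed for "looking straight ahead", etc.
  body: { enabled: true },    // needed for "resting head on hand", etc.
  hand: { enabled: true },    // needed for "touching face", etc.
};
```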
-
Thanks, but why do I need caching to be fully disabled? also, about this point: "but easy enough to make that variable, calculate the value based on detection performance" - can you explain this one more, please?
-
if you're running with caching enabled but detection only every 1sec, it means the cached boxes are a full second old, so they no longer match where the face or hand actually is.
how box caching works for face and hand is that (a) a detection pass finds the bounding boxes and (b) analysis runs on the content inside those boxes - and next time it runs, it skips (a) and tries to analyze content from the cached virtual boxes instead.
-
Thanks!! How are you using human in the demo? is there a specific time at which you re-run it? I mean, is there another way to get the results of human updated every second without re-running it every second?
-
I really appreciate your help :)
-
demo is flexible, how it runs can be changed in the menus, but default (and recommended) is:
running in a double loop (one for detection and one for interpolation and draw) makes canvas updates much smoother - you can see at the bottom of the screen the FPS for process (which is actual detection) and the FPS for refresh (which is interpolate+draw).
there is an even faster way: run detection in a web worker; the second part is then the same - run interpolation and draw to screen. take a look at the demo source for details.
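roughly, the double loop looks like this - a sketch assuming an existing `human` instance plus `video` and `canvas` elements; `human.next()` is the library's interpolation call:

```js
// sketch of the double loop; assumes `human`, `video` and `canvas` already exist
async function detectionLoop() {
  await human.detect(video); // run detection as fast as it can; results land in human.result
  requestAnimationFrame(detectionLoop); // queue the next detection immediately
}
async function drawLoop() {
  const interpolated = human.next(human.result); // interpolate from last known results
  human.draw.canvas(video, canvas); // draw the current video frame to the output canvas
  await human.draw.all(canvas, interpolated); // overlay the smoothed results
  requestAnimationFrame(drawLoop); // refresh runs at display rate, independent of detection rate
}
detectionLoop();
drawLoop();
```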
-
Thank you for your detailed reply!
-
Also, I sent you an email - I really hope you can check it when you have time. Thanks!
-
this is a very simple example of how to use human in a separate worker thread:

```js
// main.js
// note there is no human instance here at all
const humanConfig = {}; // whatever you want goes here
let humanResult = {}; // this will hold last known results, updated with each message from the worker thread
const screenshot = new OffscreenCanvas(input.width, input.height); // this will get updated with a screenshot
const worker = new Worker('worker.js'); // create worker thread from code in worker.js
function detectLoop() {
  const ctx = screenshot.getContext('2d');
  // in the main thread get a screenshot of input (whatever it is)
  ctx.drawImage(input, 0, 0, screenshot.width, screenshot.height);
  // and read pixel data that we'll transfer ownership of to the worker thread
  // yes, this is not the fastest, but it's still much better than processing everything in the main thread
  const imageData = ctx.getImageData(0, 0, screenshot.width, screenshot.height);
  // send message with pixel data and current config to worker
  // note: the transfer list must contain the underlying ArrayBuffer, not the typed array itself
  worker.postMessage({ image: imageData.data, width: imageData.width, height: imageData.height, config: humanConfig }, [imageData.data.buffer]);
}
worker.addEventListener('message', (msg) => { // listen to messages from worker
  humanResult = msg.data.result; // and update latest results
  requestAnimationFrame(() => detectLoop()); // start new detection immediately once we get results from the previous one
});
detectLoop(); // start loop once and it will continue running on its own
```

```js
// worker.js
self.importScripts('human.js'); // cannot use esm in workers due to limited browser support, so we load the iife version
let human; // this will hold the instance of human
onmessage = async (msg) => { // listen for messages from main thread
  if (!human) human = new Human.default(msg.data.config); // create human instance on first message, using the config we got from the main thread
  const image = new ImageData(new Uint8ClampedArray(msg.data.image), msg.data.width, msg.data.height); // reassemble image from the data we got
  const result = await human.detect(image, msg.data.config); // run actual detection
  postMessage({ result }); // send result back to main thread
};
```
-
Thanks! what is the difference between passing a screenshot and passing video, in the case of starting the loop once and letting it continue running on its own?
-
passing a screenshot is slower than passing video, but there is no way to pass video to a worker thread - only simple data structures can be passed, as worker threads don't know anything about DOM elements. but what you lose due to the slower process of creating a screenshot, you gain back much more, because processing is done in the worker thread, so the main thread is completely free and there is no UI impact for the user.
otherwise, passing a screenshot or passing video is the same - the same rules for caching apply, and the default config is whatever you put in `humanConfig`.
this is completely different than the readme example, as in the readme example there is a separate loop for drawing results - and you said you don't need that. the demo actually does both - web workers and separate loops.
-
Thanks! so in the example you mentioned, it processes all the incoming frames - won't this be expensive?
-
it's expensive, but it doesn't matter since it's happening in a separate thread, so the main thread is never slowed down and user experience is never impacted. but if you want to slow it down, replace the requestAnimationFrame call with a setTimeout, and if you want a fixed number of frames per second, calculate the timeout value from the desired frame rate and how long each detection took.
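for illustration, the message listener from the worker example above could be throttled along these lines - the 10 fps target is a made-up number:

```js
// instead of: requestAnimationFrame(() => detectLoop());
const targetFps = 10; // hypothetical target, pick whatever fits the use case
const frameInterval = 1000 / targetFps; // ms between detection starts at the target rate
let lastStart = performance.now();
worker.addEventListener('message', (msg) => {
  humanResult = msg.data.result; // same as before: keep the latest results
  const elapsed = performance.now() - lastStart; // how long this detection round-trip took
  const delay = Math.max(0, frameInterval - elapsed); // wait only for the remainder of the frame budget
  setTimeout(() => { lastStart = performance.now(); detectLoop(); }, delay);
});
```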
-
Thank you so much!
-
new implementation of hand tracking
new implementation caches the hand box from the previous frame and uses that as likely input for the next frame, thus skipping a lot of processing of hand re-detection
why? original mediapipe handdetect model returns a lot of false-positives, causing major issues with further processing
mediapipe implementation deals with that in the custom wasm code in their released module, but that was always performance-prohibitive for a holistic `human` processing pipeline, so hand tracking was always sub-par.
new model seems faster and more precise in all cases except when the hand is at a high angle or inverted, as rotation correction before skeleton detection is not yet implemented
new implementation will likely be enabled as default in the next major version, for now leaving it as optional
simply enable it in the `human` config object - this triggers not only usage of the new model, but also completely different processing
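a minimal sketch of that config change, going by the HandConfig typedoc linked below - treat the model filename as an assumption:

```js
// assumption based on the HandConfig typedoc; the original snippet was an inline config example
const humanConfig = {
  hand: {
    enabled: true,
    detector: { modelPath: 'handtrack.json' }, // point the detector at the new model instead of the default handdetect
  },
};
```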
new functionality
ive modified both implementations to include additional values in the results object:
- `hand[n].boxScore` and `hand[n].fingerScore` values
- `hand[n].landmarks`
- `gestures[n].hand`, which does simple finger gesture analysis and is fully extensible (see the sketch after the link below); currently it only implements two simple gestures: 'thumb up' and 'victory'
https://github.com/vladmandic/human/blob/main/src/fingerpose/gestures.ts
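a small sketch of consuming those gesture results, assuming an existing `human` instance - property names follow the published results typedoc, so treat the details as assumptions:

```js
// sketch: log hand gestures from a detection result
async function logHandGestures(input) {
  const result = await human.detect(input); // run detection as usual
  for (const g of result.gesture) { // gesture results are a flat array covering face, body, iris and hand entries
    if ('hand' in g) console.log(`hand #${g.hand}: ${g.gesture}`); // e.g. 'thumb up' or 'victory'
  }
}
```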
all possible configuration values:
https://vladmandic.github.io/human/typedoc/interfaces/HandConfig.html
cc-ing people that have raised issues around hand tracking
cc: @YaraAmin @ButzYung @delebash
feedback is welcome