new implementation of hand tracking #171
-
It's a great effort! Thanks!
-
which version of hand tracking would you recommend?
-
it depends - what is the video, what is the intended use, and how much processing power do you have? you just need to play and tune settings to whatever fits you best.
-
The video would be a single person in front of their webcam (typically in a Zoom call). The processing power will vary from user to user, but we can assume a reasonably modern computer, although nothing very high-end. The intended use is to detect what the user's body and face are doing - e.g. touching their face, looking straight ahead, resting their head on their hand, etc.
-
so you need high precision, but don't care about visuals, so smooth output is irrelevant. i'd probably test with the new hand implementation enabled. on a side-note, i'd run with caching fully disabled and detection triggered on a fixed interval - but easy enough to make that variable, calculate the value based on detection performance.
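a minimal sketch of what such a config could look like - `cacheSensitivity` and the per-module toggles are documented config options, but the exact values here are illustrative assumptions to tune, not recommendations:

```js
// sketch only: values are assumptions, tune for your own use case
const humanConfig = {
  cacheSensitivity: 0,        // 0 fully disables input-change caching, so every frame is analyzed from scratch
  filter: { enabled: false }, // analysis-only use case, no need to post-process visuals
  face: { enabled: true },    // needed for "looking straight ahead", etc.
  body: { enabled: true },    // needed for "resting head on hand", etc.
  hand: { enabled: true },    // needed for "touching face", etc.
};
```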
-
Thanks, but why do I need caching to be fully disabled? also, about this point: "but easy enough to make that variable, calculate the value based on detection performance" - can you explain this one more, please?
-
if you're running with caching enabled but detection only every 1sec, it means the cached boxes are a full second old, so they no longer match where the face or hand actually is.
how box caching works for face and hand is that (a) a detection pass finds the bounding boxes and (b) analysis runs on the content inside those boxes - and next time it runs, it skips (a) and tries to analyze content from the cached virtual boxes instead.
-
Thanks!! How are you using human in the demo? is there a specific time at which you re-run it? I mean, is there another way to get the results of human updated every second without re-running it every second?
-
I really appreciate your help :)
-
demo is flexible, how it runs can be changed in the menus, but default (and recommended) is:
running in a double loop (one for detection and one for interpolation and draw) makes canvas updates much smoother - you can see at the bottom of the screen the FPS for process (which is actual detection) and the FPS for refresh (which is interpolate+draw).
there is an even faster way: run detection in a web worker; the second part is then the same - run interpolation and draw to screen. take a look at the demo source for details.
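roughly, the double loop looks like this - a sketch assuming an existing `human` instance plus `video` and `canvas` elements; `human.next()` is the library's interpolation call:

```js
// sketch of the double loop; assumes `human`, `video` and `canvas` already exist
async function detectionLoop() {
  await human.detect(video); // run detection as fast as it can; results land in human.result
  requestAnimationFrame(detectionLoop); // queue the next detection immediately
}
async function drawLoop() {
  const interpolated = human.next(human.result); // interpolate from last known results
  human.draw.canvas(video, canvas); // draw the current video frame to the output canvas
  await human.draw.all(canvas, interpolated); // overlay the smoothed results
  requestAnimationFrame(drawLoop); // refresh runs at display rate, independent of detection rate
}
detectionLoop();
drawLoop();
```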
-
Thank you for your detailed reply!
-
Also, I sent you an email - I really hope you can check it when you have time. Thanks!
-
this is a very simple example of how to use human in a separate worker thread:

```js
// main.js
// note there is no human instance here at all
const humanConfig = {}; // whatever you want goes here
let humanResult = {}; // this will hold last known results, updated with each message from the worker thread
const screenshot = new OffscreenCanvas(input.width, input.height); // this will get updated with a screenshot
const worker = new Worker('worker.js'); // create worker thread from code in worker.js
function detectLoop() {
  const ctx = screenshot.getContext('2d');
  // in the main thread get a screenshot of input (whatever it is)
  ctx.drawImage(input, 0, 0, screenshot.width, screenshot.height);
  // and read pixel data that we'll transfer ownership of to the worker thread
  // yes, this is not the fastest, but it's still much better than processing everything in the main thread
  const imageData = ctx.getImageData(0, 0, screenshot.width, screenshot.height);
  // send message with pixel data and current config to worker
  // note: the transfer list must contain the underlying ArrayBuffer, not the typed array itself
  worker.postMessage({ image: imageData.data, width: imageData.width, height: imageData.height, config: humanConfig }, [imageData.data.buffer]);
}
worker.addEventListener('message', (msg) => { // listen to messages from worker
  humanResult = msg.data.result; // and update latest results
  requestAnimationFrame(() => detectLoop()); // start new detection immediately once we get results from the previous one
});
detectLoop(); // start loop once and it will continue running on its own
```

```js
// worker.js
self.importScripts('human.js'); // cannot use esm in workers due to limited browser support, so we load the iife version
let human; // this will hold the instance of human
onmessage = async (msg) => { // listen for messages from main thread
  if (!human) human = new Human.default(msg.data.config); // create human instance on first message, using the config we got from the main thread
  const image = new ImageData(new Uint8ClampedArray(msg.data.image), msg.data.width, msg.data.height); // reassemble image from the data we got
  const result = await human.detect(image, msg.data.config); // run actual detection
  postMessage({ result }); // send result back to main thread
};
```
-
Thanks! what is the difference between passing a screenshot and passing video, in the case of starting the loop once and letting it continue running on its own?
-
passing a screenshot is slower than passing video, but there is no way to pass video to a worker thread - only simple data structures can be passed, as worker threads don't know anything about DOM elements. but what you lose due to the slower process of creating a screenshot, you gain back much more, because processing is done in the worker thread, so the main thread is completely free and there is no UI impact for the user.
otherwise, passing a screenshot or passing video is the same - the same rules for caching apply, and the default config is whatever you put in `humanConfig`.
this is completely different than the readme example, as in the readme example there is a separate loop for drawing results - and you said you don't need that. the demo actually does both - web workers and separate loops.
-
Thanks! so in the example you mentioned, it processes all the incoming frames - won't this be expensive?
-
it's expensive, but it doesn't matter since it's happening in a separate thread, so the main thread is never slowed down and user experience is never impacted. but if you want to slow it down, replace the requestAnimationFrame call with a setTimeout, and if you want a fixed number of frames per second, calculate the timeout value from the desired frame rate and how long each detection took.
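for illustration, the message listener from the worker example above could be throttled along these lines - the 10 fps target is a made-up number:

```js
// instead of: requestAnimationFrame(() => detectLoop());
const targetFps = 10; // hypothetical target, pick whatever fits the use case
const frameInterval = 1000 / targetFps; // ms between detection starts at the target rate
let lastStart = performance.now();
worker.addEventListener('message', (msg) => {
  humanResult = msg.data.result; // same as before: keep the latest results
  const elapsed = performance.now() - lastStart; // how long this detection round-trip took
  const delay = Math.max(0, frameInterval - elapsed); // wait only for the remainder of the frame budget
  setTimeout(() => { lastStart = performance.now(); detectLoop(); }, delay);
});
```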
-
Thank you so much!
-
new implementation of hand tracking
new implementation caches the hand box from the previous frame and uses that as likely input for the next frame, thus skipping a lot of processing of hand re-detection
why? original mediapipe handdetect model returns a lot of false-positives, causing major issues with further processing
mediapipe implementation deals with that in the custom wasm code in their released module, but that was always performance-prohibitive for a holistic `human` processing pipeline, so hand tracking was always sub-par.
new model seems faster and more precise in all cases except when the hand is at a high angle or inverted, as rotation correction before skeleton detection is not yet implemented
new implementation will likely be enabled as default in the next major version, for now leaving it as optional
simply enable it in the `human` config object - this triggers not only usage of the new model, but also completely different processing
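a minimal sketch of that config change, going by the HandConfig typedoc linked below - treat the model filename as an assumption:

```js
// assumption based on the HandConfig typedoc; the original snippet was an inline config example
const humanConfig = {
  hand: {
    enabled: true,
    detector: { modelPath: 'handtrack.json' }, // point the detector at the new model instead of the default handdetect
  },
};
```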
new functionality
ive modified both implementations to include additional values in the results object:
- `hand[n].boxScore` and `hand[n].fingerScore` values
- `hand[n].landmarks`
- `gestures[n].hand`, which does simple finger gesture analysis and is fully extensible (see the sketch after the link below); currently it only implements two simple gestures: 'thumb up' and 'victory'
https://github.com/vladmandic/human/blob/main/src/fingerpose/gestures.ts
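a small sketch of consuming those gesture results, assuming an existing `human` instance - property names follow the published results typedoc, so treat the details as assumptions:

```js
// sketch: log hand gestures from a detection result
async function logHandGestures(input) {
  const result = await human.detect(input); // run detection as usual
  for (const g of result.gesture) { // gesture results are a flat array covering face, body, iris and hand entries
    if ('hand' in g) console.log(`hand #${g.hand}: ${g.gesture}`); // e.g. 'thumb up' or 'victory'
  }
}
```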
all possible configuration values:
https://vladmandic.github.io/human/typedoc/interfaces/HandConfig.html
cc-ing people that have raised issues around hand tracking
cc: @YaraAmin @ButzYung @delebash
feedback is welcome