Add face detection constraints and VideoFrame attributes #48

Closed · wants to merge 1 commit

Conversation

@eehakkin (Contributor) commented Jan 11, 2022

This spec update is a follow-up to w3c/mediacapture-image#292 and allows face detection as described in #44.

The changes add new face detection constrainable properties which are used to control face detection.

Face detection results are exposed on VideoFrames through a new readonly detectedFaces sequence attribute.

This allows the following kind of code to be used for face detection:

// main.js:
// Check if face detection is supported by the browser
const supports = navigator.mediaDevices.getSupportedConstraints();
if (supports.faceDetectionMode) {
  // The browser supports the face detection constraint.
} else {
  throw new Error('Face detection is not supported');
}

// Open camera with face detection enabled
const stream = await navigator.mediaDevices.getUserMedia({
  video: {faceDetectionMode: ['bounding-box', 'contour']}
});
const [videoTrack] = stream.getVideoTracks();

// Use a video worker and show the result to the user.
const videoElement = document.querySelector("video");
const videoGenerator = new MediaStreamTrackGenerator({kind: 'video'});
const videoProcessor = new MediaStreamTrackProcessor({track: videoTrack});
const videoSettings = videoTrack.getSettings();
const videoWorker = new Worker('video-worker.js');
videoWorker.postMessage({
  videoReadable: videoProcessor.readable,
  videoWritable: videoGenerator.writable
}, [videoProcessor.readable, videoGenerator.writable]);
videoElement.srcObject = new MediaStream([videoGenerator]);

// video-worker.js:
self.onmessage = async function(e) {
  const videoTransformer = new TransformStream({
    async transform(videoFrame, controller) {
      for (const face of videoFrame.detectedFaces) {
        console.log(
          `Face @ (${face.contour[0].x}, ${face.contour[0].y}), ` +
                 `(${face.contour[1].x}, ${face.contour[1].y}), ` +
                 `(${face.contour[2].x}, ${face.contour[2].y}), ` +
                 `(${face.contour[3].x}, ${face.contour[3].y})`);
      }
      controller.enqueue(videoFrame);
    }
  });
  e.data.videoReadable
  .pipeThrough(videoTransformer)
  .pipeTo(e.data.videoWritable);
}

@eehakkin (Contributor, Author):

Sorry for closing and reopening. This one should be open and w3c/mediacapture-image#292 should be closed.

@riju commented Jan 17, 2022

@alvestrand, @youennf: We tried to incorporate the review comments from our last discussions. Could you please take a look?

@riju commented Jan 27, 2022

Friendly ping @alvestrand, @youennf, @jan-ivar

@dontcallmedom (Member):

@riju I think it would help if you could document exactly which comments from the last discussion you incorporated, and how - for instance, I still see a FaceExpression enum, with fewer values, but still some.

@eehakkin (Contributor, Author):

@dontcallmedom I removed face expressions completely.

@eehakkin (Contributor, Author) commented Feb 5, 2022

The following example from #57 shows how to use face detection, background concealment (see #45) and eye gaze correction (see #56) with MediaStreamTrack Insertable Media Processing using Streams:

// main.js:
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();

// Use a video worker and show the result to the user.
const videoElement = document.querySelector('video');
const videoWorker = new Worker('video-worker.js');
videoWorker.postMessage({track: videoTrack}, [videoTrack]);
const {data} = await new Promise(r => videoWorker.onmessage = r);
videoElement.srcObject = new MediaStream([data.videoTrack]);

// video-worker.js:
self.onmessage = async ({data: {track}}) => {
  // Apply constraints.
  let customBackgroundBlur = true;
  let customEyeGazeCorrection = true;
  let customFaceDetection = false;
  let faceDetectionMode;
  const capabilities = track.getCapabilities();
  if (capabilities.backgroundBlur && capabilities.backgroundBlur.max > 0) {
    // The platform supports background blurring.
    // Let's use platform background blurring and skip the custom one.
    await track.applyConstraints({
      advanced: [{backgroundBlur: capabilities.backgroundBlur.max}]
    });
    customBackgroundBlur = false;
  } else if ((capabilities.faceDetectionMode || []).includes('contour')) {
    // The platform supports face contour detection but not background
    // blurring. Let's use platform face contour detection to aid custom
    // background blurring.
    faceDetectionMode ||= 'contour';
    await track.applyConstraints({
      advanced: [{faceDetectionMode}]
    });
  } else {
    // The platform does not support background blurring nor face contour
    // detection. Let's use custom face contour detection to aid custom
    // background blurring.
    customFaceDetection = true;
  }
  if ((capabilities.eyeGazeCorrection || []).includes(true)) {
    // The platform supports eye gaze correction.
    // Let's use platform eye gaze correction and skip the custom one.
    await track.applyConstraints({
      advanced: [{eyeGazeCorrection: true}]
    });
    customEyeGazeCorrection = false;
  } else if ((capabilities.faceDetectionLandmarks || []).includes(true)) {
    // The platform supports face landmark detection but not eye gaze
    // correction. Let's use platform face landmark detection to aid custom eye
    // gaze correction.
    faceDetectionMode ||= 'presence';
    await track.applyConstraints({
      advanced: [{
        faceDetectionLandmarks: true,
        faceDetectionMode
      }]
    });
  } else {
    // The platform does not support eye gaze correction nor face landmark
    // detection. Let's use custom face landmark detection to aid custom eye
    // gaze correction.
    customFaceDetection = true;
  }

  // Load custom libraries which may utilize TensorFlow and/or WASM.
  const requiredScripts = [].concat(
    customBackgroundBlur    ? 'background.js' : [],
    customEyeGazeCorrection ? 'eye-gaze.js'   : [],
    customFaceDetection     ? 'face.js'       : []
  );
  importScripts(...requiredScripts);

  const generator = new VideoTrackGenerator();
  self.postMessage({videoTrack: generator.track}, [generator.track]);
  const {readable} = new MediaStreamTrackProcessor({track});
  const transformer = new TransformStream({
    async transform(frame, controller) {
      // Detect faces or retrieve detected faces.
      const detectedFaces =
        customFaceDetection
          ? await detectFaces(frame)
          : frame.detectedFaces;
      // Blur the background if needed.
      if (customBackgroundBlur) {
        const newFrame = await blurBackground(frame, detectedFaces);
        frame.close();
        frame = newFrame;
      }
      // Correct the eye gaze if needed.
      if (customEyeGazeCorrection && (detectedFaces || []).length > 0) {
        const newFrame = await correctEyeGaze(frame, detectedFaces);
        frame.close();
        frame = newFrame;
      }
      controller.enqueue(frame);
    }
  });
  await readable.pipeThrough(transformer).pipeTo(generator.writable);
};

@alvestrand (Contributor):

Waiting for an explainer, or a possible move to WebCodecs (since it does frame modifications).

@youennf (Contributor) commented May 19, 2022

We should probably work on the abstract attach-metadata-to-video-frame mechanism first; we could then reuse that mechanism here.
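
As a hedged sketch: if such a generic mechanism existed, custom processing code could attach its own face detection results to frames instead of this PR defining a dedicated attribute. The metadata() accessor and the metadata member of VideoFrameInit below follow the direction being discussed in WebCodecs and are assumptions, not part of this PR; detectFaces() is the illustrative custom detector from the earlier example.

// video-worker.js (hypothetical): attach custom face detection results to
// each frame via a generic VideoFrame metadata mechanism.
// frame.metadata() and the metadata member of VideoFrameInit are assumed
// per the WebCodecs discussion; 'detectedFaces' is an illustrative entry name.
const annotator = new TransformStream({
  async transform(frame, controller) {
    const detectedFaces = await detectFaces(frame);
    const annotated = new VideoFrame(frame, {
      metadata: {...frame.metadata(), detectedFaces}
    });
    frame.close();
    controller.enqueue(annotated);
  }
});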

@riju commented May 24, 2022

@alvestrand @youennf: Here's an explainer we have been working on.

@youennf (Contributor) commented Jun 14, 2022

The explainer is pretty clear to me.
I am not sure what we do with explainers, but I guess it should be reviewed by the WG, and we can discuss at that point whether to merge it.
Some comments on the explainer:

  1. I would be tempted to make the API surface as minimal as possible (what is the MVP?) and leave the rest to a dedicated 'future steps' section. For instance, maybe the MVP only needs the faceDetectionMode constraint (not the landmarks/numfaces/contourpoints constraints) with a reduced set of values ("none" and "presence"); a rough sketch of such an MVP follows this list. I am not sure about the difference between presence and contour, for instance, which is somewhat distracting. Are FaceLandmarks part of the MVP as well?
  2. The proposal is based on the VideoFrameMetadata construct, which is fine. We should try to finalise this discussion in WebCodecs.
  3. DetectedFace has a required id and a required probability. I can see 'id' being useful; maybe probability should be optional.
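
As a rough illustration of the reduced MVP mentioned in item 1 (the constraint name and the 'none'/'presence' values come from the explainer; everything else here is illustrative):

// Hypothetical MVP usage: only a faceDetectionMode constraint with
// 'none' and 'presence' values, no landmark/contour/numfaces controls.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
const capabilities = track.getCapabilities();
if ((capabilities.faceDetectionMode || []).includes('presence')) {
  // Enable face presence detection; frames then carry face detection results.
  await track.applyConstraints({advanced: [{faceDetectionMode: 'presence'}]});
}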

<h3>{{VideoFrame}}</h3>
<pre class="idl">
partial interface VideoFrame {
  readonly attribute FrozenArray&lt;DetectedFace&gt;? detectedFaces;
Review comment (Contributor):

Based on discussions in w3c/webcodecs#559, the direction might be to move to
partial dictionary VideoFrameMetadata
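
If that direction were taken, only the read side of the earlier examples would change; roughly (the metadata() accessor and the detectedFaces entry name follow the w3c/webcodecs#559 direction and are not defined by this PR):

// With the readonly attribute proposed in this PR:
for (const face of videoFrame.detectedFaces ?? []) { /* ... */ }

// With a partial dictionary VideoFrameMetadata (w3c/webcodecs#559 direction):
for (const face of videoFrame.metadata().detectedFaces ?? []) { /* ... */ }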

@alvestrand (Contributor):

Assumed to be superseded by #78

@alvestrand closed this on Nov 29, 2022