
Add face detection constraints and VideoFrameMetadata members #78

Merged
merged 1 commit into w3c:main from ttoivone-20221111-fdmd-flat on May 4, 2023

Conversation


@ttoivone ttoivone commented Nov 11, 2022

This PR supersedes previous PRs related to face detection (#57, #48). It adds the constraints (and related settings and capabilities) and extends the recently introduced VideoFrameMetadata to carry descriptions of faces in the frames.

The feedback has been taken into consideration: the API is simplified by removing most of the previously proposed constraints, and the mesh-based facial description has been removed. Only the features judged essential for good performance are left. An exception is face landmarks, which are already supported by some platforms and could therefore be immediately useful. Furthermore, the term HumanFace is used instead of the more generic DetectedFace to anticipate future extensions of VideoFrameMetadata.

The PR consists of two commits. The first updates the explainer and the second updates the spec.


Preview | Diff

@ttoivone
Contributor Author

Raised an issue on Mozilla's standards positions

@ttoivone
Contributor Author

+@eehakkin
+@riju

@chrisn
Member

chrisn commented Nov 15, 2022

Thanks! Responding having seen your post w3c/webcodecs#607, and speaking as a BBC contributor, with my Media WG chair hat removed.

We need to be sure that there are no ethical issues in exposing this to the web; these are concerns I mentioned at the TPAC 2021 breakout meeting: https://www.w3.org/2021/10/20-webrtc-ic-minutes.html

It's good that detecting facial expressions is stated as a non-goal, but I'd recommend going further to say it "must not" rather than "does not need to". Misdetection is a concern, as mentioned in the explainer, but also, there are privacy implications of exposing inferred emotions, at least without strong user consent.

As such I'd want to see this proposal go through wide review, including Privacy and TAG.

@ttoivone
Contributor Author

I'd recommend going further to say it "must not" rather than "does not need to". Misdetection is a concern, as mentioned in the explainer, but also, there are privacy implications of exposing inferred emotions, at least without strong user consent.

I changed the wording in the explainer as you suggested and it will be updated in the next PR. However, while I have no problem updating the wording, I don't personally see this as an issue with the proposed API. Misdetection is an issue, but by not offering the detection in the Web API we just push people to run their own custom detection algorithms, which hardly improves the situation. I don't see any privacy issues here -- the metadata is inferred from the same frame to which it is attached, so it does not give whoever gets the frame any new information that the original frame alone wouldn't already have. Privacy issues would exist only if the metadata were delivered to a user without the related video frame, but that is not done by the proposed or other Web APIs.

@ttoivone
Contributor Author

Changes in 5f8b11b:

  • Removed contour, replaced with bounding-box and center-point
    for faces and landmarks, respectively
  • Replaced sequence of landmarks with members in HumanFace dict
  • Removed faceDetectionMaxContourPoints -- contour is now gone
  • Added a separate constraint to control landmark detection
  • Removed "nose" landmark to simplify -- not supported by Android/ChromeOS HAL3
  • Emphasized that facial expressions must be non-goals for the spec
  • Removed nullability from all members
  • Specified id more accurately
  • Once again proofread everything
  • More complete acknowledgements in the explainer (send me a note if you're missing from it)
  • Updated examples

@ttoivone
Contributor Author

@youennf @jan-ivar Requesting review. I couldn't add reviewers myself for some reason.


The field `landmarks` provides a list of facial features belonging to the detected face, such as eyes or mouth. User agent may return accurate contour for the features, but early implementations are expected to deliver only a single point corresponding to the center of a feature.
dictionary HumanFaceLandmark {
Point2D centerPoint;
Contributor

Can we mark it required?

Contributor

I guess they would be initially required but in the future it could contain a bounding box as well. Is that correct?

Contributor Author

Correct. I did not use required here because in the future we might also have a bounding box/contour or other fields that would make centerPoint unnecessary. I'm happy to mark it required for now unless it complicates amending the spec later with other types of location information.

Member

I think we should use required here. See rationale in other comment.

Contributor Author

@ttoivone ttoivone Mar 8, 2023

After the changes needed to unblock issue 79, this no longer looks feasible.

dictionary HumanFace {
long id;
float probability;
DOMRectReadOnly boundingBox;
Contributor

Can we make boundingBox required, or is there a case where we would know that there is a face in the image but we do not know where?

Contributor

I now see that we have a kind of union of boundingBox or face landmark, so we cannot make it required.

Contributor Author

we have a kind of union of boundingBox or face landmark

No, if by union you mean that only one or the other would be valid. The intent is to allow both the face boundingBox and face landmarks simultaneously. The boundingBox within the HumanFace dict would indicate the bounding box for the whole face. Landmarks are parts of a human face and can be set or not set independently of the whole-face bounding box.

The reason why I did not mark boundingBox as required is the same as for centerPoint: to anticipate extensions to HumanFace where we could have something like a contour or a bitmask for the face and no longer need a bounding box. If you think it is not a problem to mark boundingBox as required now and later remove the keyword when the dictionary is extended with new types of location, I am happy to add the keyword now. However, I am worried that if the member is now required, Web apps could break if the keyword is removed later and bounding box made optional. Probably this wouldn't happen, though, as humanFaceDetectionMode would need to be set to something other than bounding-box too.

is there a case where we would know that there is a face in the image but we do not know where

With this draft version, there wouldn't be such a case. However, it could be a future extension: add a new ObjectDetectionMode called presence which just indicates that a face exists in the frame without its location. A use case could be, for example, preventing the screensaver while the user is in front of their computer. We did have the presence mode in an earlier draft, but I removed it to simplify.
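For illustration (a sketch, not part of the PR), reading the proposed metadata could look like this, assuming a VideoFrame `frame` from a track with both constraints applied:

```js
// Hypothetical usage sketch; humanFaces, id, boundingBox, leftEye, and
// centerPoint are the members proposed in this PR, `frame` is assumed.
for (const face of frame.metadata().humanFaces ?? []) {
  if (face.boundingBox) {
    // Present when the track's humanFaceDetectionMode is "bounding-box".
    console.log(`face ${face.id} bounding box:`, face.boundingBox);
  }
  if (face.leftEye) {
    // Set independently, when humanFaceLandmarkDetectionMode is "center-point".
    console.log(`face ${face.id} left eye:`, face.leftEye.centerPoint);
  }
}
```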

Member

I am worried that if the member is now required, Web apps could break if the keyword is removed later and bounding box made optional.

The required WebIDL keyword is a no-op on outputs, yet it's become a convention many specs lean on to indicate that user agents will never return a dictionary without filling in the member. It's still technically up to algorithms to specify what is actually filled in. But some specs leave algorithms to be inferred.

Sometimes though, like here, the invariants are more complicated, and a single required keyword won't suffice. The invariant seems to be:

if (track.getSettings().humanFaceLandmarkDetectionMode == "center-point") {
  for (const face of frame.metadata().humanFaces) {
    const x = face.leftEye?.centerPoint.x; // i.e. no need to write centerPoint?.x
  }
}

Is that right?

Similarly, only tracks whose settings include humanFaceDetectionMode: "bounding-box" are guaranteed to have a face.boundingBox. These invariants seem worth documenting in prose or algorithms.

boundingBox cannot be marked as required because it would be absent if you set the former but not the latter constraint.

But I would mark centerPoint as required, because it's always a sub-member, e.g. leftEye wouldn't be there if not for humanFaceLandmarkDetectionMode == "center-point". We should be able to remove that particular required keyword later without issue when another humanFaceLandmarkDetectionMode value is added, because the new access mode would need to be requested explicitly by JS (assuming browsers cannot default to any of these modes, which I don't actually see this spec speaking to).

Contributor Author

@ttoivone ttoivone Mar 8, 2023

Understood, and right, in the example there is no need to write centerPoint?.x because the metadata originates from a video track with humanFaceLandmarkDetectionMode == "center-point".

I agree with the suggestion to document the invariants. I changed the PR as follows:

bounding-box: This source generates metadata into {{VideoFrameMetadata}} related to human face detection including the bounding box information for each detected face. As an input, this is interpreted as a command to turn the setting of human face detection to bounding-box.

But I would mark centerPoint as required

After the changes needed to unblock issue 79, this no longer looks feasible.

partially (or in special cases even fully) outside of the visible image.
A coordinate is interpreted to represent a location in a normalized square space. The origin of
coordinates (x,y) = (0.0, 0.0) represents the upper leftmost corner whereas the (x,y) =
(1.0, 1.0) represents the lower rightmost corner relative to the rendered frame: the x-coordinate (columns) increases rightwards and the y-coordinate (rows) increases downwards.
Contributor

Does it make sense to normalise?
DOMRectReadOnly has a width/height and an origin which would not be normalised but Point2D would be?
What does https://wicg.github.io/shape-detection-api do? What about OS APIs?

Contributor Author

@ttoivone ttoivone Nov 24, 2022

We had a discussion with the WebCodecs people (e.g. Dan Sanders) and the issue of the coordinate system turned out to be a bit complicated. A normalized coordinate space seemed more straightforward and avoids some issues compared to natural size, which doesn't always match copyTo(). It is too bad that it doesn't seem to be consistent with the existing Web APIs. The Shape Detection API uses natural size (unlike us), but after the discussions it seems that the whole definition of coordinate space in that spec is not adequate, at least not for our purposes with VideoFrame, which internally has a rotation, visibleRect, and codedRect.

Therefore, I would rather stick with the definition from the Image Capture spec's Points of interest with normalized coordinates. I think that spec is more mainstream and avoids the above issues. (edit: actually, the PoI spec doesn't consider rotation or visibleRect/codedRect either. Therefore we say in the spec/explainer "relative to the rendered frame", which should avoid the possible confusion.)

The platform APIs don't help much here: Windows uses normalized coordinates (up to INT_MAX or something) while Android/ChromeOS uses native pixel coordinates.

I think that both the Point2D and DOMRectReadOnly types should use the same coordinate system for consistency. If we use normalized coordinates, width and height should also be normalized so that right = x + width holds. I will clarify that in the draft API once we agree on the coordinate system.
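For illustration (an assumed helper, not part of the PR), mapping a normalized Point2D to pixel coordinates of the rendered frame would be straightforward:

```js
// Hypothetical helper: (0.0, 0.0) is the upper left corner and (1.0, 1.0)
// the lower right corner of the rendered frame, per the quoted definition.
function normalizedToPixels(point, renderedWidth, renderedHeight) {
  return { x: point.x * renderedWidth, y: point.y * renderedHeight };
}
// A normalized DOMRectReadOnly would scale the same way, with
// right = x + width still holding in normalized space.
```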

@ttoivone ttoivone closed this Nov 24, 2022
@ttoivone ttoivone reopened this Nov 24, 2022
@ttoivone ttoivone requested review from youennf and removed request for jan-ivar November 29, 2022 13:19
@aboba
Contributor

aboba commented Nov 29, 2022

Is the PR ready for CfC?

@chrisn
Member

chrisn commented Nov 29, 2022

Thanks @ttoivone for updating the explainer, looks OK from my point of view.

@ttoivone
Contributor Author

Is the PR ready for CfC?

We are still waiting for review comments from WebCodecs team (Dan Sanders/Dale Curtis).

@ttoivone ttoivone force-pushed the ttoivone-20221111-fdmd-flat branch 2 times, most recently from 37840a4 to e2ec3d6 on December 1, 2022 13:56
@ttoivone
Contributor Author

ttoivone commented Dec 1, 2022

Changes in e2ec3d6:

  • Split detection mode enum into two: HumanFaceDetectionMode and HumanFaceLandmarkDetectionMode according to feedback

Feedback from WebCodecs (Dale Curtis) was positive: "Structure looks good to me for VideoFrameMetadata. I defer to @youennf around correctness issues for what metadata should be there." @jan-ivar was inadvertently removed from the reviewer list and I still can't add reviewers myself, sorry.

@youennf @jan-ivar: Please let us know if further updates are needed into the PR before CfC, thanks.

Member

@jan-ivar jan-ivar left a comment

Hi all, thanks so much for all this work! Overall I think it looks good, just lots of nits and comments since it's taken me a while to dive in and review this. My apologies for that.

To summarize some of my feedback:

  • document invariants better around reading metadata, so web devs don't have to ?. every data point
  • clarify, maybe even simplify probability?
  • orient examples around detecting support using getSettings() not getSupportedConstraints()
  • define what left/right mean (eye of human or beholder?)
  • clarify if the two constraints are dependent or not (and why there are two)
  • remove gaze correction

Thanks! Happy to take changes after CfC has concluded.

index.html Outdated
@@ -654,5 +654,400 @@ <h2>Exposing change of MediaStreamTrack configuration</h2>
</p>
</div>
</section>
<section>
<h2>Human faces</h2>
<p>Human face metadata describes the human faces in video frames. It can
Member

For a reader coming in cold, I think it would be good to clarify early that this is detection, not recognition of faces. "describes the human faces" sounds like the latter to me, which seems unfortunate. How about:

Suggested change
<p>Human face metadata describes the human faces in video frames. It can
<p>Human face metadata describes the presence of human faces in video frames. It can

Contributor Author

@ttoivone ttoivone Mar 7, 2023

Changed to "Human face metadata describes the geometrical information of human faces". I think that is more precise. "presence" sounds like boolean to me, I would understand it describing whether there are faces on the frame or not.

Comment on lines 74 to 100
```js
partial dictionary MediaTrackSupportedConstraints {
boolean humanFaceDetectionMode = true;
boolean humanFaceLandmarkDetectionMode = true;
};

partial dictionary MediaTrackCapabilities {
sequence<DOMString> humanFaceDetectionMode;
sequence<DOMString> humanFaceLandmarkDetectionMode;
};

partial dictionary MediaTrackConstraintSet {
ConstrainDOMString humanFaceDetectionMode;
ConstrainDOMString humanFaceLandmarkDetectionMode;
};

partial dictionary MediaTrackSettings {
DOMString humanFaceDetectionMode;
DOMString humanFaceLandmarkDetectionMode;
};

enum HumanFaceDetectionMode {
"none", // Face or landmark detection is not needed
"bounding-box", // Bounding box of the detected object is returned
Member

WebIDL is for specs, not explainers, IMHO. Can this be replaced with JS snippets showing expected syntax?

If not, then please replace js with idl on the first line to get the correct syntax highlighting.

Contributor

I like having WebIDL in explainers as long as we're not yet ready to land it in specs. It's a pain to have it both places.

Contributor Author

@ttoivone ttoivone Mar 7, 2023

I think JS snippets are better for the examples at the end of the explainer. For explaining things, I found JS to be inconvenient. How would an enum be written in JavaScript, for instance? Therefore, I replaced js with idl.


The constraint `faceDetectionMode` is used by applications to describe the level of facial data that they need. At the lowest enabled level, `presence` will return the sequence of `DetectedFace`, but the `contour` and `landmarks` sequences will be empty. When `faceDetectionMode` is `contour`, arbitrary number of points around the faces will be returned but no landmarks. An user agent might return only four contour points corresponding to face bounding box. If a Web application needs only maximum of four contour points (bounding box), it can set `faceDetectionMode` to `bounding-box` which limits number of contour points to four, residing at the corners of a rectangle around the detected faces.
New members are added to capabilities, constraints, and settings for Web applications to enable and control face and face landmark detection with `getUserMedia()` and `applyConstraints()` and to query capabilities of face detection with `getCapabilities()` methods. Web applications should not ask more facial metadata than what they need to limit computation. For example, if an applications is content with just a face bounding box, it should set the constraint `humanFaceLandmarkDetectionMode` to `"none"`.
Member

The sentence "Web applications should not ask more facial metadata than what they need to limit computation" has a typo, but also seems to be admonishing web developers to behave a certain way, which reads a bit odd to me.

Could it be rephrased to emphasize the benefits of asking for less information?

Contributor Author

@ttoivone ttoivone Mar 7, 2023

Rephrased to "Web applications can reduce the computational load on the user agent by only requesting the necessary amount of facial data, rather than asking for more than they need."
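As an illustration of that guidance (assumed code, not text from the PR; the constraint names are those proposed here), an application that only needs bounding boxes might request:

```js
// Ask for face bounding boxes but decline landmark detection to reduce
// the computational load on the user agent.
const stream = await navigator.mediaDevices.getUserMedia({
  video: {
    humanFaceDetectionMode: { ideal: "bounding-box" },
    humanFaceLandmarkDetectionMode: { ideal: "none" }
  }
});
```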


At the highest level, when `faceDetectionMode` is `landmarks`, the full precision contour which is available is returned along with landmark features.
The enumeration constraints `humanFaceDetectionMode` and `humanFaceLandmarkDetectionMode` set the level of detection needed for human faces and their landmarks, respectively. These settings can be one of the enumeration values in `HumanFaceDetectionMode` and `HumanFaceLandmarkDetectionMode`. When `humanFaceDetectionMode` is `"bounding-box"`, user agent must attempt face detection and set the metadata in video frames correspondingly. When the setting is `"none"`, face description metadata (including landmarks) is not set. Similarly, when `humanFaceLandmarkDetectionMode` is `"none"`, the landmarks (ie. members `leftEye`, `rightEye`, and `mouth` in dictionary `HumanFace`) are not set. When the setting is `"center-point"` and face detection is enabled, the user agent must attempt to detect face landmarks and set the location information in the members of type `HumanFaceLandmark` accordingly.
Member

"enumeration constraints" is an odd term. They are DOMString constraints. Maybe just say "constraints"?

Also to nitpick on language, these constraints are "applied" not "set". And even when successfully set, it is probably wise for the application to read back videoTrack.getSettings() to verify that the constraints actually stuck, as I believe that is the only point at which user agents MUST perform the feature (e.g. a user COULD have configured their user agent to not expose this functionality, even though the constraint name is recognized by getSupportedConstraints).

Contributor Author

@ttoivone ttoivone Mar 7, 2023

Removed "enumeration", rephrased the paragraph. It now begins with "The constraints humanFaceDetectionMode and humanFaceLandmarkDetectionMode request the metadata detail level that the application wants for human faces and their landmarks, respectively."

index.html Outdated
frames originating from the same {{MediaStreamTrack}} source, {{id}} is set to the same integer value
for the face in all frames.</p>
<p>User agent MUST NOT select the assigned value of {{id}} in such a way that the detected faces could
be correlated to match in any way between different {{MediaStreamTrack}} objects.</p>
Member

This means that a const trackB = trackA.clone() will have different ids. Was that the intent?

Contributor Author

Yes, it is the main rule and I don't see a reason to make an exception in this case. Do you think there would be a reason to do otherwise?

Contributor Author

@ttoivone ttoivone Mar 8, 2023

I noticed that elsewhere in the PR, references to different {{MediaStreamTrack}} sources are made, contradicting this sentence. Therefore, I'm changing this sentence to: "be correlated to match between different {{MediaStreamTrack}} sources."
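For illustration (hypothetical code), the per-source stability of id is what lets an application follow a face across frames of one track:

```js
// Hypothetical tracking sketch: within one source, a face keeps the same
// id across frames, so id can key per-face state.
const lastSeen = new Map();
function onFrameMetadata(metadata) {
  for (const face of metadata.humanFaces ?? []) {
    const previous = lastSeen.get(face.id);
    // ...compare previous?.boundingBox with face.boundingBox here...
    lastSeen.set(face.id, face);
  }
}
```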

The field `contour` provides an arbitrary number of points which enclose the face. Sophisticated implementations could provide a large number of points but initial implementations are expected to provide four contour points located in the corners of a rectangular region which describes the face bounding box. User agent is allowed to provide a minimum of one point, in which case the point should be located at the center of the face.
dictionary HumanFace {
long id;
float probability;
Member

This is probably better expressed as a "confidence".

It's not really a probability: the face is either there or not. This value is intended to represent how confident the API is that the face is there.

Contributor

I assume that here "probability" is the estimated probability provided by the model, based on the data it was trained on.

Confidence is typically not expressed as a single number, but rather a range.

Member

@jan-ivar jan-ivar Jan 13, 2023

I assume that here "probability" is the estimated probability provided by the model,

Naming aside, leaving the range of possible values implementation-defined may make it hard for apps to reason about the values. E.g. if one implementation's model rarely ever outputs above 0.7 for some technical purity reason, even in a well-lit face front situation, and another model reliably produces 1.0 except in corner cases, it could lead to weird web compat issues.

Are there precedents we could look at? Would it make sense to say something like 1.0 SHOULD represent the highest value the model can consistently output?

Member

I think confidence is the right term.

Each confidence estimate might be seen as a probability of correctness, reflecting the model’s uncertainty about a detection or the pixel mask. During inference, we expect the estimated confidence to match the observed precision for a prediction. For example, given 100 predictions with 80% confidence each, we expect 80 predictions to be correctly predicted.
https://arxiv.org/pdf/2202.12785

That very paper describes how to ensure proper calibration of this confidence for object detection; I'm not sure whether we want to go to that level of specificity though.

Contributor Author

@ttoivone ttoivone Mar 8, 2023

leaving the range of possible values implementation-defined may make it hard for apps to reason about the values.

I agree with this 100% and think that unless the value is properly defined it shouldn't be included in the spec.

The paper that @dontcallmedom quotes says "confidence estimate might be seen as a probability of correctness". To me this doesn't look like a very rigorous or widely used definition. I'd rather go with commonly used definitions, such as Wikipedia's. I believe what we would like to have here is a conditional probability, but the condition needs to be more precisely defined than it used to be. In the latest explainer/PR I changed the definition as follows:

"the estimate of the conditional probability that the segmented object actually is of the type indicated by the {{Segment/type}} member on the condition that the detection has been made." (where {{Segment/type}} describes whether it is a face or some other type of object; see issue 79.

It is an estimate, not an exact probability. As implementor advice: the implementor could run the face detection algorithm on some test videos and, of the detections that the algorithm made, check how many are correct (missed faces would not affect the conditional probability here, although algorithm quality affects that too). Other factors could also be taken into account, such as the confidence/score from the algorithm, camera image quality, etc.
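As a usage illustration (assumed code; the 0.8 threshold is arbitrary), an application could act only on detections whose estimate is high enough:

```js
// Keep only detections whose estimated conditional probability of being
// a correctly detected face is at least 0.8 (application-specific choice).
const confidentFaces = (frame.metadata().humanFaces ?? [])
  .filter(face => face.probability >= 0.8);
```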

@ttoivone ttoivone force-pushed the ttoivone-20221111-fdmd-flat branch from e2ec3d6 to 67c9dba on March 9, 2023 10:45
@ttoivone
Contributor Author

ttoivone commented Mar 9, 2023

@dontcallmedom

Updated the PR. All previous comments should now have been addressed, either by changing the PR or otherwise. I'm asking reviewers to check whether this version can be merged or whether more changes are needed.

@jan-ivar
@alvestrand
@chrisn
@martinthomson
@youennf

In particular, after the CfC, three objections were made:

Segmentation metadata #79
Scope of Applicability #84
Variance of Results #85

As per the Feb 21 meeting, the proposal was to mark issues 84 and 85 as non-blocking. This updated PR should now address issue 79, which was a blocker.

Asking @aboba to mark issues 84 and 85 as non-blockers, and checking whether this PR now unblocks issue 79.

@ttoivone ttoivone removed the request for review from youennf March 9, 2023 11:07
@ttoivone ttoivone requested review from martinthomson, youennf, jan-ivar and aboba and removed request for chrisn, martinthomson, youennf, jan-ivar and aboba March 9, 2023 11:08
@youennf
Contributor

youennf commented Mar 17, 2023

Given @ttoivone's comment, I think we should review the PR at the next editor's meeting.

@ttoivone
Contributor Author

ttoivone commented Apr 6, 2023

Changes in the latest update of the PR:

  • Removed a constraint; only one remains
  • partOf is no longer required
  • Explicitly specified that cloned MediaStreamTracks should have the same ids
  • Many typo fixes and text refactoring

@jan-ivar
Member

jan-ivar commented Apr 6, 2023

Editors agreed to merge with the change above.

Add specifications for human face metadata and related constraints,
capabilities, and settings. Add also corresponding examples.
@ttoivone ttoivone force-pushed the ttoivone-20221111-fdmd-flat branch from 311cd1c to a1c636a on April 6, 2023 16:04
@youennf youennf merged commit d917fec into w3c:main May 4, 2023