Face Detection: Segmentation metadata #79

Open
aboba opened this issue Jan 17, 2023 · 5 comments

@aboba
Contributor

aboba commented Jan 17, 2023

FaceDetection metadata is one example of VideoFrame segmentation, which is useful for:

  • Background blur: Keep the area defined as "foreground" sharp and blur areas outside it.
  • Encoding: The encoder can allocate more bits to the "foreground" and fewer to background areas of the frame.
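As a rough illustration of the encoding use case, an encoder with region-of-interest support could turn foreground rectangles (for example face bounding boxes) into per-block quantizer offsets. This is a hypothetical sketch: `qpOffsetMap`, the 16-pixel block size and the ±4 offsets are illustrative assumptions, not part of WebCodecs or of any particular encoder's API.

```typescript
// Hypothetical sketch: map foreground rectangles to a per-block quantizer
// offset map. Negative offsets mean finer quantization (more bits),
// positive offsets mean coarser quantization (fewer bits).
interface RectLike { x: number; y: number; width: number; height: number; }

function qpOffsetMap(
  frameWidth: number,
  frameHeight: number,
  foreground: RectLike[],
  blockSize = 16, // assumed block size in pixels
  fgOffset = -4,  // illustrative values, not taken from any encoder
  bgOffset = 4,
): number[][] {
  const cols = Math.ceil(frameWidth / blockSize);
  const rows = Math.ceil(frameHeight / blockSize);
  const map: number[][] = [];
  for (let r = 0; r < rows; r++) {
    const row: number[] = [];
    const by = r * blockSize;
    for (let c = 0; c < cols; c++) {
      const bx = c * blockSize;
      // A block is foreground if it overlaps any foreground rectangle.
      const isForeground = foreground.some(
        (f) => f.x < bx + blockSize && f.x + f.width > bx &&
               f.y < by + blockSize && f.y + f.height > by,
      );
      row.push(isForeground ? fgOffset : bgOffset);
    }
    map.push(row);
  }
  return map;
}

// A 64x32 frame with one face box covering only the top-left 16x16 block.
const offsets = qpOffsetMap(64, 32, [{ x: 0, y: 0, width: 16, height: 16 }]);
```

A real encoder would expose its own ROI or delta-QP interface; the point is only that a list of rectangles is enough input, whatever produced them.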

It occurs to me that rather than defining FaceDetection metadata, we might instead define Segmentation metadata, with a type field of "face detection".

@aboba changed the title from "Segmentation metadata" to "Face Detection: Segmentation metadata" on Jan 29, 2023
@steely-glint

Whilst at an ML level Face Detection is 'just another segmentation problem', from the user's point of view it is somewhat more personal than detecting an orange, especially since the bulk of WebRTC use is video conferencing, and good background blur is an egalitarian feature.
I think that some exceptionalism for this use case is justified.

@youennf
Contributor

youennf commented Jan 30, 2023

@aboba, can you clarify whether this issue is a blocker for the CFC?
AIUI, your suggestion is a request for an API change rather than a blocker for the API.

@aboba
Contributor Author

aboba commented Jan 30, 2023

It's a request for a metadata change, so that we don't have to define segmentation metadata in addition to metadata specific to face detection. If the encoder wants to use segmentation information to decide where to spend its effort, it shouldn't need to understand multiple metadata formats, each optimized for a particular use.
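To make the argument concrete: if every kind of segment shares one shape, an encoder can collect regions of interest without branching on the segment type. In this minimal sketch, `regionsOfInterest`, `GenericSegment`, `RectLike` and the 0.5 confidence threshold are hypothetical; the field names follow the `Segment` dictionary proposed later in this thread.

```typescript
// Hypothetical sketch of an encoder-side consumer that treats every segment
// the same way, whatever its type. RectLike stands in for DOMRectReadOnly.
interface RectLike { x: number; y: number; width: number; height: number; }

interface GenericSegment {
  type: string;                  // e.g. "human-face"; deliberately ignored below
  probability: number;           // confidence in [0, 1]
  boundingBox?: RectLike | null;
}

// Keep the bounding box of every segment that is confident enough and has one.
// No per-type parsing is needed, so new segment types work unchanged.
function regionsOfInterest(segments: GenericSegment[], minProbability = 0.5): RectLike[] {
  return segments
    .filter((s) => s.probability >= minProbability && s.boundingBox != null)
    .map((s) => s.boundingBox as RectLike);
}
```

With separate per-use metadata formats, the same function would need one parsing branch per format.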

@ttoivone
Contributor

ttoivone commented Jan 31, 2023

@aboba I agree that it would be better to define generic segmentation metadata. We're happy to change the spec proposal once we have agreed on a direction. What do you think of this:

partial dictionary VideoFrameMetadata {
  sequence<Segment> segment;
};

dictionary Segment {
  DOMString          type;         // One of enum SegmentType
  long               id;
  long               partOf;       // References the parent segment id
  float              probability;  // or confidence
  Point2D?           centerPoint;
  DOMRectReadOnly?   boundingBox;
  // sequence<Point2D>? contour;  // Possible future extension
};

enum SegmentType {
  "human-face",
  "left-eye",
  "right-eye",
  "mouth",
  // To be extended later with other types of segments
};
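A hypothetical TypeScript rendering of the proposed dictionary, plus a helper that uses `partOf` to group facial features under their face, might look like this. Assumptions: `Point2D` and `DOMRectReadOnly` are replaced by plain object types so the sketch runs outside a browser; `type` is narrowed to the four `SegmentType` values, whereas the IDL itself uses DOMString; and a segment without `partOf` is treated as top-level, which the proposal does not specify. `childrenByParent` and the sample coordinates are made up for illustration.

```typescript
// Hypothetical TypeScript mirror of the proposed Segment dictionary.
type SegmentType = "human-face" | "left-eye" | "right-eye" | "mouth";

interface Point2D { x: number; y: number; }
interface RectLike { x: number; y: number; width: number; height: number; } // stands in for DOMRectReadOnly

interface Segment {
  type: SegmentType;
  id: number;
  partOf?: number;              // parent segment id; assumed absent for top-level segments
  probability: number;          // or confidence, in [0, 1]
  centerPoint?: Point2D | null;
  boundingBox?: RectLike | null;
}

// Group segments under their parent via partOf (e.g. eyes and mouth under a face).
function childrenByParent(segments: Segment[]): Map<number, Segment[]> {
  const ids = new Set(segments.map((s) => s.id));
  const result = new Map<number, Segment[]>();
  for (const s of segments) {
    // Skip top-level segments and references to parents not present in this frame.
    if (s.partOf === undefined || !ids.has(s.partOf)) continue;
    const list = result.get(s.partOf) ?? [];
    list.push(s);
    result.set(s.partOf, list);
  }
  return result;
}

// Illustrative metadata for one frame containing a single face.
const segments: Segment[] = [
  { type: "human-face", id: 1, probability: 0.97,
    boundingBox: { x: 120, y: 80, width: 200, height: 240 } },
  { type: "left-eye", id: 2, partOf: 1, probability: 0.91, centerPoint: { x: 170, y: 160 } },
  { type: "right-eye", id: 3, partOf: 1, probability: 0.90, centerPoint: { x: 270, y: 160 } },
  { type: "mouth", id: 4, partOf: 1, probability: 0.88, centerPoint: { x: 220, y: 260 } },
];

const children = childrenByParent(segments);
console.log(children.get(1)?.map((s) => s.type)); // logs the three feature types under face 1
```

Because each segment carries only a parent id, a consumer that ignores `partOf` still sees a flat list it can use directly, while one that cares can rebuild the hierarchy.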

@dontcallmedom-bot

This issue was mentioned in WEBRTCWG-2023-02-21 (Page 44)
