Add face detection constraints and VideoFrameMetadata members #78
Conversation
Raised an issue on Mozilla's standard positions
Thanks! Responding having seen your post w3c/webcodecs#607, and speaking as a BBC contributor, with my Media WG chair hat removed. We need to be sure that there are no ethical issues with exposing this to the web, concerns I mentioned at the TPAC 2021 breakout meeting: https://www.w3.org/2021/10/20-webrtc-ic-minutes.html It's good that detecting facial expressions is stated as a non-goal, but I'd recommend going further and saying it "must not" rather than "does not need to". Misdetection is a concern, as mentioned in the explainer, but there are also privacy implications of exposing inferred emotions, at least without strong user consent. As such, I'd want to see this proposal go through wide review, including Privacy and TAG.
I changed the wording in the explainer as you suggested and it will be updated in the next PR. However, while I have no problem updating the wording, I don't personally see this as an issue with the proposed API. Misdetection is an issue, but by not offering detection in the Web API we just push people to run their own detection algorithms, which hardly improves the situation. I don't see any privacy issues here: the metadata is inferred from the same frame to which it is attached, so it does not give whoever gets the frame any information that the original frame alone wouldn't already provide. Privacy issues would exist only if the metadata were delivered to a user without the related video frame, but that is not done by the proposed or other Web APIs.
Force-pushed ff3c99b to 5f8b11b.
Changes in 5f8b11b:
face-detection-explainer.md
Outdated

The field `landmarks` provides a list of facial features belonging to the detected face, such as eyes or mouth. User agent may return accurate contour for the features, but early implementations are expected to deliver only a single point corresponding to the center of a feature.

```idl
dictionary HumanFaceLandmark {
  Point2D centerPoint;
```
Can we mark it `required`?
I guess they would be initially required but in the future it could contain a bounding box as well. Is that correct?
Correct. I did not use `required` here because in the future we might also have a bounding box/contour or other fields which would make `centerPoint` unnecessary. I'm happy to mark it `required` for now unless it complicates amending the spec later with other types of location information.
I think we should use `required` here. See rationale in the other comment.
After the changes needed to unblock issue 79, this no longer looks feasible.
face-detection-explainer.md
Outdated

```idl
dictionary HumanFace {
  long id;
  float probability;
  DOMRectReadOnly boundingBox;
```
Can we make `boundingBox` required, or is there a case where we would know that there is a face in the image but we do not know where?
I now see that we have a kind of union of `boundingBox` or face landmark, so we cannot make it required.
> we have a kind of union of boundingBox or face landmark

No, if by union you mean that only one or the other would be valid. The intent is to allow both the face `boundingBox` and face landmarks simultaneously. The `boundingBox` within the `HumanFace` dict would indicate the bounding box for the whole face. Landmarks are part of a human face and can be set or not set independently of the whole-face bounding box.

The reason why I did not mark `boundingBox` as `required` is the same as for `centerPoint`: to anticipate extensions to `HumanFace` where we could have something like a contour or bitmask for the face and no longer need a bounding box. If you think it is not a problem to mark `boundingBox` as `required` now and later remove the keyword when the dictionary is extended with new types of location, I am happy to add the keyword now. However, I am worried that if the member is now required, Web apps could break if the keyword is removed later and the bounding box made optional. Probably this wouldn't happen, though, as `humanFaceDetectionMode` would need to be set to something other than `bounding-box` too.

> is there a case where we would know that there is a face in the image but we do not know where

With this draft version, there wouldn't be such a case. However, it could be a future extension: add a new `ObjectDetectionMode` called `presence` which just indicates that a face exists in the frame without its location. A use case could be, for example, preventing the screensaver while the user is in front of their computer. We did have the `presence` mode in some earlier draft, but I removed it to simplify.
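(Purely illustrative of that hypothetical `presence` extension, which is not part of this draft; the wake-lock pairing below is an assumed example using the standard Screen Wake Lock API, and `onFrameMetadata` is a hypothetical hook:)

```js
// Hypothetical: a "presence" mode reporting only that a face exists,
// without location. Not part of the current draft.
await videoTrack.applyConstraints({ humanFaceDetectionMode: "presence" });

// Example use: keep the screen awake while a face is in front of the camera.
let wakeLock = null;
async function onFrameMetadata(metadata) {
  if (metadata.humanFaces?.length > 0) {
    wakeLock ??= await navigator.wakeLock.request("screen");
  } else {
    await wakeLock?.release();
    wakeLock = null;
  }
}
```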
> I am worried that if the member is now required, Web apps could break if the keyword is removed later and the bounding box made optional.

The `required` WebIDL keyword is a no-op on outputs, yet it's become a convention many specs lean on to indicate that user agents will never return a dictionary without filling in the member. It's still technically up to algorithms to specify what is actually filled in, but some specs leave the algorithms to be inferred.

Sometimes, though, like here, the invariants are more complicated, and a single `required` keyword won't suffice. The invariant seems to be:

```js
if (track.getSettings().humanFaceLandmarkDetectionMode == "center-point") {
  for (const face of frame.metadata().humanFaces) {
    const x = face.leftEye?.centerPoint.x; // i.e. no need to write centerPoint?.x
  }
}
```

Is that right?

Similarly, only tracks whose settings include `humanFaceDetectionMode: "bounding-box"` are guaranteed to have a `face.boundingBox`. These invariants seem worth documenting in prose or algorithms.

`boundingBox` cannot be marked as `required` because it would be absent if you set the former but not the latter constraint.

But I would mark `centerPoint` as required, because it's always a sub-member; e.g. `leftEye` wouldn't be there if not for `humanFaceLandmarkDetectionMode == "center-point"`. We should be able to remove that particular `required` keyword later without issue when another `humanFaceLandmarkDetectionMode` mode is added, because the new access mode would need to be requested explicitly by JS (assuming browsers cannot default to any of these modes, which I don't actually see this spec speaking to).
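(For illustration, a parallel sketch of that second invariant, assuming the same hypothetical `frame.metadata()` shape as the snippet above; this is not from the thread:)

```js
// Sketch: boundingBox would be guaranteed only when the track was
// configured with humanFaceDetectionMode: "bounding-box".
if (track.getSettings().humanFaceDetectionMode == "bounding-box") {
  for (const face of frame.metadata().humanFaces) {
    const { x, y, width, height } = face.boundingBox; // no boundingBox?. needed
  }
}
```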
Understood, and right, in the example there is no need to write `centerPoint?.x` because the metadata originates from a video track with `humanFaceLandmarkDetectionMode == "center-point"`.

I agree with the suggestion to document the invariants. I changed the PR as follows:

> bounding-box: This source generates metadata into {{VideoFrameMetadata}} related to human face detection, including the bounding box information for each detected face. As an input, this is interpreted as a command to turn the setting of human face detection to bounding-box.

> But I would mark centerPoint as required

After the changes needed to unblock issue 79, this no longer looks feasible.
partially (or in special cases even fully) outside of the visible image.
A coordinate is interpreted to represent a location in a normalized square space. The origin of coordinates (x,y) = (0.0, 0.0) represents the upper leftmost corner whereas (x,y) = (1.0, 1.0) represents the lower rightmost corner relative to the rendered frame: the x-coordinate (columns) increases rightwards and the y-coordinate (rows) increases downwards.
Does it make sense to normalise?
DOMRectReadOnly has a width/height and an origin which would not be normalised, but Point2D would be?
What does https://wicg.github.io/shape-detection-api do? What do OS APIs do?
We had a discussion with the WebCodecs people (e.g. Dan Sanders) and the issue of the coordinate system turned out to be a bit complicated. A normalized coordinate space seemed more straightforward and avoids some issues compared to natural size, which doesn't always match copyTo(). It is too bad that it doesn't seem to be consistent with existing Web APIs. The Shape Detection API uses natural size (unlike us), but after the discussions it seems that the whole definition of coordinate space in that spec is not adequate, at least not for our purposes with VideoFrame, which internally has a rotation, visibleRect, and codedRect.

Therefore, I would rather stick with the definition from the Image Capture spec's Points of interest with normalized coordinates. I think that spec is more mainstream and avoids the above issues. (Edit: actually the PoI spec doesn't consider rotation or visibleRect/codedRect either. Therefore we say in the spec/explainer "relative to the rendered frame", which should avoid the possible confusion.)

The platform APIs don't help much here: Windows uses normalized coordinates (up to INT_MAX or something) while Android/ChromeOS uses native pixel coordinates.

I think that both `Point2D` and `DOMRectReadOnly` types should use the same coordinate system for consistency. If we use normalized coordinates, `width` and `height` should also be normalized so that `right` = `x + width` for consistency. I will clarify that in the draft API once we agree on the coordinate system.
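(To make the proposed convention concrete, a minimal hedged sketch; the function name and frame dimensions are illustrative, not part of the proposal:)

```js
// Sketch: map a normalized rect (all members in [0.0, 1.0], so that
// right = x + width also holds in normalized space) to pixel
// coordinates of the rendered frame.
function toRenderedPixels(rect, renderedWidth, renderedHeight) {
  return {
    x: rect.x * renderedWidth,
    y: rect.y * renderedHeight,
    width: rect.width * renderedWidth,
    height: rect.height * renderedHeight,
  };
}

// e.g. a face bounding box covering the center quarter of a 1280x720 frame:
toRenderedPixels({ x: 0.25, y: 0.25, width: 0.5, height: 0.5 }, 1280, 720);
// → { x: 320, y: 180, width: 640, height: 360 }
```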
Is the PR ready for CfC?
Thanks @ttoivone for updating the explainer, looks OK from my point of view.
We are still waiting for review comments from the WebCodecs team (Dan Sanders/Dale Curtis).
Force-pushed 37840a4 to e2ec3d6.
Changes in e2ec3d6:
Feedback was positive from WebCodecs (Dale Curtis): "Structure looks good to me for VideoFrameMetadata. I defer to @youennf around correctness issues for what metadata should be there." @jan-ivar was removed inadvertently from the reviewer list and I still can't add reviewers myself, sorry. @youennf @jan-ivar: Please let us know if further updates are needed in the PR before CfC, thanks.
Hi all, thanks so much for all this work! Overall I think it looks good, just lots of nits and comments since it's taken me a while to dive in and review this. My apologies for that.
To summarize some of my feedback:
- document invariants better around reading metadata, so web devs don't have to `?.` every data point
- clarify, maybe even simplify `probability`?
- orient examples around detecting support using `getSettings()` not `getSupportedConstraints()`
- define what left/right mean (eye of human or beholder?)
- clarify if the two constraints are dependent or not (and why there are two)
- remove gaze correction
Thanks! Happy to take changes after CfC has concluded.
index.html
Outdated

```diff
@@ -654,5 +654,400 @@ <h2>Exposing change of MediaStreamTrack configuration</h2>
     </p>
   </div>
 </section>
+<section>
+  <h2>Human faces</h2>
+  <p>Human face metadata describes the human faces in video frames. It can
```
For a reader coming in cold, I think it would be good to clarify early that this is detection, not recognition of faces. "describes the human faces" sounds like the latter to me, which seems unfortunate. How about:

```diff
- <p>Human face metadata describes the human faces in video frames. It can
+ <p>Human face metadata describes the presence of human faces in video frames. It can
```
Changed to "Human face metadata describes the geometrical information of human faces". I think that is more precise. "presence" sounds like boolean to me, I would understand it describing whether there are faces on the frame or not.
face-detection-explainer.md
Outdated

```js
partial dictionary MediaTrackSupportedConstraints {
  boolean humanFaceDetectionMode = true;
  boolean humanFaceLandmarkDetectionMode = true;
};

partial dictionary MediaTrackCapabilities {
  sequence<DOMString> humanFaceDetectionMode;
  sequence<DOMString> humanFaceLandmarkDetectionMode;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainDOMString humanFaceDetectionMode;
  ConstrainDOMString humanFaceLandmarkDetectionMode;
};

partial dictionary MediaTrackSettings {
  DOMString humanFaceDetectionMode;
  DOMString humanFaceLandmarkDetectionMode;
};

enum HumanFaceDetectionMode {
  "none",          // Face or landmark detection is not needed
  "bounding-box",  // Bounding box of the detected object is returned
```
WebIDL is for specs, not explainers, IMHO. Can this be replaced with JS snippets showing the expected syntax?
If not, then please replace `js` with `idl` on the first line to get the correct syntax highlighting.
I like having WebIDL in explainers as long as we're not yet ready to land it in specs. It's a pain to have it in both places.
I think JS snippets are better for the examples at the end of the explainer. For explaining things, I found JS to be inconvenient; how would an enum be written in JavaScript, for instance? Therefore, replaced `js` with `idl`.
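(For concreteness, a hedged sketch of the JS usage the IDL above implies; the constraint names come from the IDL, but the exact shape is illustrative and assumes an async context:)

```js
// Sketch: request face detection with bounding boxes and landmark
// center points when opening the camera.
const stream = await navigator.mediaDevices.getUserMedia({
  video: {
    humanFaceDetectionMode: "bounding-box",
    humanFaceLandmarkDetectionMode: "center-point",
  },
});
```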
face-detection-explainer.md
Outdated

The constraint `faceDetectionMode` is used by applications to describe the level of facial data that they need. At the lowest enabled level, `presence` will return the sequence of `DetectedFace`, but the `contour` and `landmarks` sequences will be empty. When `faceDetectionMode` is `contour`, arbitrary number of points around the faces will be returned but no landmarks. An user agent might return only four contour points corresponding to face bounding box. If a Web application needs only maximum of four contour points (bounding box), it can set `faceDetectionMode` to `bounding-box` which limits number of contour points to four, residing at the corners of a rectangle around the detected faces.

New members are added to capabilities, constraints, and settings for Web applications to enable and control face and face landmark detection with `getUserMedia()` and `applyConstraints()` and to query capabilities of face detection with `getCapabilities()` methods. Web applications should not ask more facial metadata than what they need to limit computation. For example, if an applications is content with just a face bounding box, it should set the constraint `humanFaceLandmarkDetectionMode` to `"none"`.
The sentence "Web applications should not ask more facial metadata than what they need to limit computation" has a typo, but also seems to be admonishing web developers to behave a certain way, which reads a bit odd to me.
Could it be rephrased to emphasize the benefits of asking for less information?
Rephrased to "Web applications can reduce the computational load on the user agent by only requesting the necessary amount of facial data, rather than asking for more than they need."
face-detection-explainer.md
Outdated

At the highest level, when `faceDetectionMode` is `landmarks`, the full precision contour which is available is returned along with landmark features.

The enumeration constraints `humanFaceDetectionMode` and `humanFaceLandmarkDetectionMode` set the level of detection needed for human faces and their landmarks, respectively. These settings can be one of the enumeration values in `HumanFaceDetectionMode` and `HumanFaceLandmarkDetectionMode`. When `humanFaceDetectionMode` is `"bounding-box"`, user agent must attempt face detection and set the metadata in video frames correspondingly. When the setting is `"none"`, face description metadata (including landmarks) is not set. Similarly, when `humanFaceLandmarkDetectionMode` is `"none"`, the landmarks (ie. members `leftEye`, `rightEye`, and `mouth` in dictionary `HumanFace`) are not set. When the setting is `"center-point"` and face detection is enabled, the user agent must attempt to detect face landmarks and set the location information in the members of type `HumanFaceLandmark` accordingly.
"enumeration constraints" is an odd term. They are DOMString constraints. Maybe just say "constraints"?
Also to nitpick on language, these constraints are "applied" not "set". And even when successfully set, it is probably wise for the application to read back videoTrack.getSettings()
to verify that the constraints actually stuck, as I believe that is the only point at which user agents MUST perform the feature (e.g. a user COULD have configured their user agent to not expose this functionality, even though the constraint name is recognized by getSupportedConstraints
).
Removed "enumeration", rephrased the paragraph. It now begins with "The constraints humanFaceDetectionMode
and humanFaceLandmarkDetectionMode
request the metadata detail level that the application wants for human faces and their landmarks, respectively."
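(A hedged sketch of the read-back pattern @jan-ivar describes; it assumes a `stream` obtained from getUserMedia as in the earlier sketch:)

```js
// Sketch: apply the constraint, then verify that it actually stuck.
const [videoTrack] = stream.getVideoTracks();
await videoTrack.applyConstraints({ humanFaceDetectionMode: "bounding-box" });
if (videoTrack.getSettings().humanFaceDetectionMode !== "bounding-box") {
  // The constraint name was recognized, but the user agent chose not to
  // (or could not) enable face detection; fall back accordingly.
}
```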
index.html
Outdated

frames originating from the same {{MediaStreamTrack}} source, {{id}} is set to the same integer value for the face in all frames.</p>
<p>User agent MUST NOT select the assigned value of {{id}} in such a way that the detected faces could be correlated to match in any way between different {{MediaStreamTrack}} objects.</p>
This means that a `const trackB = trackA.clone()` will have different ids. Was that the intent?
Yes, it is the main rule and I don't see a reason to make an exception in this case. Do you think there would be a reason to do otherwise?
I noticed that elsewhere in the PR, notes are actually made about different {{MediaStreamTrack}} sources, contradicting this sentence. Therefore, changing this sentence to: "be correlated to match between different {{MediaStreamTrack}} sources."
face-detection-explainer.md
Outdated

The field `contour` provides an arbitrary number of points which enclose the face. Sophisticated implementations could provide a large number of points but initial implementations are expected to provide four contour points located in the corners of a rectangular region which describes the face bounding box. User agent is allowed to provide a minimum of one point, in which case the point should be located at the center of the face.

```idl
dictionary HumanFace {
  long id;
  float probability;
```
This is probably better expressed as a "confidence".
It's not really a probability: the face is either there or not. This value is intended to represent how confident the API is that the face is there.
I assume that here "probability" is the estimated probability provided by the model, based on the data it was trained on.
Confidence is typically not expressed as a single number, but rather a range.
> I assume that here "probability" is the estimated probability provided by the model,

Naming aside, leaving the range of possible values implementation-defined may make it hard for apps to reason about the values. E.g. if one implementation's model rarely ever outputs above 0.7 for some technical purity reason, even in a well-lit, face-front situation, and another model reliably produces 1.0 except in corner cases, it could lead to weird web compat issues.

Are there precedents we could look at? Would it make sense to say something like `1.0` SHOULD represent the highest value the model can consistently output?
I think confidence is the right term.
> Each confidence estimate might be seen as a probability of correctness, reflecting the model's uncertainty about a detection or the pixel mask. During inference, we expect the estimated confidence to match the observed precision for a prediction. For example, given 100 predictions with 80% confidence each, we expect 80 predictions to be correctly predicted.

https://arxiv.org/pdf/2202.12785
That very paper describes how to ensure proper calibration of this confidence for object detection; I'm not sure whether we want to go to that level of specificity though.
> leaving the range of possible values implementation-defined may make it hard for apps to reason about the values.

I agree with this 100% and think that unless the value is properly defined it shouldn't be included in the spec.

The paper that @dontcallmedom quotes says "confidence estimate might be seen as a probability of correctness". To me this doesn't look like a very rigorous or widely used definition. I'd rather go with commonly used definitions, such as Wikipedia's. I believe what we would like to have here is a conditional probability, but the condition needs to be defined more precisely than it used to be. In the latest explainer/PR I changed the definition as follows:

"the estimate of the conditional probability that the segmented object actually is of the type indicated by the {{Segment/type}} member on the condition that the detection has been made." (where {{Segment/type}} describes whether it is a face or some other type of object; see issue 79.)

It is an estimate, not an exact probability. As implementor advice: the implementor could run the face detection algorithm on some test videos and, of the detections that the algorithm made, check how many are correct (missed faces would not affect the conditional probability here, although algorithm quality affects that too). Other factors could also be taken into account, such as confidence/score from the algorithm, camera image quality, etc.
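(A hedged sketch of that implementor check: bucket the reported probabilities on labeled test detections and compare each bucket against observed precision. Entirely illustrative; `detections` and its fields are hypothetical:)

```js
// Sketch: is the reported probability well calibrated?
// `detections` is labeled test data: [{ probability, isCorrect }, ...].
function calibrationReport(detections, bucketSize = 0.1) {
  const buckets = new Map();
  for (const { probability, isCorrect } of detections) {
    const key = Math.min(Math.floor(probability / bucketSize),
                         1 / bucketSize - 1);
    const b = buckets.get(key) ?? { total: 0, correct: 0 };
    b.total += 1;
    if (isCorrect) b.correct += 1;
    buckets.set(key, b);
  }
  // For a well-calibrated estimate, observed precision in each bucket
  // should be close to the bucket's midpoint probability.
  for (const [key, { total, correct }] of buckets) {
    console.log(`p≈${((key + 0.5) * bucketSize).toFixed(2)}: ` +
                `precision ${(correct / total).toFixed(2)} (n=${total})`);
  }
}
```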
Force-pushed e2ec3d6 to 67c9dba.
Updated the PR. All previous comments should now have been addressed, either by changing the PR or otherwise. Asking reviewers to check whether this version can be merged or if more changes are needed. @jan-ivar In particular, after the CfC, three objections were made: Segmentation metadata #79. As per the Feb 21 meeting, the proposal was to mark issues 84 and 85 as non-blocking. This updated PR should now address issue 79, which was a blocker. Asking @adoba to mark issues 84 and 85 as non-blockers and to check whether this PR now unblocks issue 79.
Given @ttoivone's comment, I think we should review the PR at the next editors' meeting.
Force-pushed 67c9dba to 311cd1c.
Changes in the latest update of the PR:
Editors agreed to merge with the change above.
Add specifications for human face metadata and related constraints, capabilities, and settings. Also add corresponding examples.
Force-pushed 311cd1c to a1c636a.
This PR supersedes previous PRs related to face detection (#57, #48). It adds the constraints (and related settings and capabilities) and extends the recently introduced VideoFrameMetadata to include descriptions of faces in the frames.

The feedback has been taken into consideration by simplifying the API: most of the previously proposed constraints have been removed, as has the mesh-based facial description. Only those judged essential for good performance are left. An exception is face landmarks, which are already supported by some platforms and could therefore be immediately useful. Furthermore, the term HumanFace is used instead of the more generic DetectedFace to anticipate future extensions of VideoFrameMetadata.

The PR consists of two commits. The first updates the explainer and the second updates the spec.