MediaPipe Iris
Table of contents
Overview
A wide range of real-world applications, including computational photography (glint reflection) and augmented reality effects (virtual avatars) rely on accurately tracking the iris within an eye. This is a challenging task to solve on mobile devices, due to the limited computing resources, variable light conditions and the presence of occlusions, such as hair or people squinting. Iris tracking can also be utilized to determine the metric distance of the camera to the user. This can improve a variety of use cases, ranging from virtual try-on of properly sized glasses and hats to accessibility features that adopt the font size depending on the viewer’s distance. Often, sophisticated specialized hardware is employed to compute the metric distance, limiting the range of devices on which the solution could be applied.
MediaPipe Iris is a ML solution for accurate iris estimation, able to track landmarks involving the iris, pupil and the eye contours using a single RGB camera, in real-time, without the need for specialized hardware. Through use of iris landmarks, the solution is also able to determine the metric distance between the subject and the camera with relative error less than 10%. Note that iris tracking does not infer the location at which people are looking, nor does it provide any form of identity recognition. With the cross-platfrom capability of the MediaPipe framework, MediaPipe Iris can run on most modern mobile phones, desktops/laptops and even on the web.
Fig 1. Example of MediaPipe Iris: eyelid (red) and iris (blue) contours. |
ML Pipeline
The first step in the pipeline leverages MediaPipe Face Mesh, which generates a mesh of the approximate face geometry. From this mesh, we isolate the eye region in the original image for use in the subsequent iris tracking step.
The pipeline is implemented as a MediaPipe graph that uses a face landmark subgraph from the face landmark module, an iris landmark subgraph from the iris landmark module, and renders using a dedicated iris-and-depth renderer subgraph. The face landmark subgraph internally uses a face detection subgraph from the face detection module.
Note: To visualize a graph, copy the graph and paste it into MediaPipe Visualizer. For more information on how to visualize its associated subgraphs, please see visualizer documentation.
The output of the pipeline is a set of 478 3D landmarks, including 468 face landmarks from MediaPipe Face Mesh, with those around the eyes further refined (see Fig 2), and 10 additional iris landmarks appended at the end (5 for each eye, and see Fig 2 also).
Models
Face Detection Model
The face detector is the same BlazeFace model used in MediaPipe Face Detection.
Face Landmark Model
The face landmark model is the same as in MediaPipe Face Mesh. You can also find more details in this paper.
Iris Landmark Model
The iris model takes an image patch of the eye region and estimates both the eye landmarks (along the eyelid) and iris landmarks (along ths iris contour). You can find more details in this paper.
Fig 2. Eye landmarks (red) and iris landmarks (green). |
Depth-from-Iris
MediaPipe Iris is able to determine the metric distance of a subject to the camera with less than 10% error, without requiring any specialized hardware. This is done by relying on the fact that the horizontal iris diameter of the human eye remains roughly constant at 11.7±0.5 mm across a wide population, along with some simple geometric arguments. For more details please refer to our Google AI Blog post.
Fig 3. (Left) MediaPipe Iris predicting metric distance in cm on a Pixel 2 from iris tracking without use of a depth sensor. (Right) Ground-truth depth. |
Example Apps
Please first see general instructions for Android, iOS and desktop on how to build MediaPipe examples.
Note: To visualize a graph, copy the graph and paste it into MediaPipe Visualizer. For more information on how to visualize its associated subgraphs, please see visualizer documentation.
Mobile
- Graph:
mediapipe/graphs/iris_tracking/iris_tracking_gpu.pbtxt
- Android target: (or download prebuilt ARM64 APK)
mediapipe/examples/android/src/java/com/google/mediapipe/apps/iristrackinggpu:iristrackinggpu
- iOS target:
mediapipe/examples/ios/iristrackinggpu:IrisTrackingGpuApp
Desktop
Live Camera Input
Please first see general instructions for desktop on how to build MediaPipe examples.
- Running on CPU
- Running on GPU
Video File Input
-
To build the application, run:
bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/iris_tracking:iris_tracking_cpu_video_input
-
To run the application, replace
<input video path>
and<output video path>
in the command below with your own paths:bazel-bin/mediapipe/examples/desktop/iris_tracking/iris_tracking_cpu_video_input \ --calculator_graph_config_file=mediapipe/graphs/iris_tracking/iris_tracking_cpu_video_input.pbtxt \ --input_side_packets=input_video_path=<input video path>,output_video_path=<output video path>
Single-image Depth Estimation
-
To build the application, run:
bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/iris_tracking:iris_depth_from_image_desktop
-
To run the application, replace
<input image path>
and<output image path>
in the command below with your own paths:GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/iris_tracking/iris_depth_from_image_desktop \ --input_image_path=<input image path> --output_image_path=<output image path>
Web
You can use the following links to load a demo in the MediaPipe Visualizer, and over there click the “Runner” icon in the top bar like shown below. The demos use your webcam video as input, which is processed all locally in real-time and never leaves your device. Please see MediaPipe on the Web in Google Developers Blog for details.
Resources
- Google AI Blog: MediaPipe Iris: Real-time Eye Tracking and Depth Estimation
- Paper: Real-time Pupil Tracking from Monocular Video for Digital Puppetry (presentation)
- Models and model cards