MediaPipe Python Framework

Packet
Timestamp
ImageFrame
Graph

The MediaPipe Python framework grants direct access to the core components of the MediaPipe C++ framework such as Timestamp, Packet, and CalculatorGraph, whereas the ready-to-use Python solutions hide the technical details of the framework and simply return the readable model inference results back to the callers.

The MediaPipe Python framework is built on the pybind11 library: the C++ core framework is exposed in Python via a C++/Python language binding. The content below assumes that you already have a basic understanding of the MediaPipe C++ framework. Otherwise, you can find useful information in Framework Concepts.

Packet

The packet is the basic data flow unit in MediaPipe. A packet consists of a numeric timestamp and a shared pointer to an immutable payload. In Python, a MediaPipe packet can be created by calling one of the packet creator methods in the mp.packet_creator module. Correspondingly, the packet payload can be retrieved by using one of the packet getter methods in the mp.packet_getter module. Because the payload becomes immutable once the packet is created, modifying the retrieved content does not affect the actual payload in the packet.

The MediaPipe framework Python API supports the most commonly used MediaPipe data types (e.g., ImageFrame, Matrix, Protocol Buffers, and the primitive data types) in the core binding. The table below shows the mapping between the Python and C++ data types, along with the packet creator and content getter method for each data type supported by the MediaPipe Python framework API.

Python Data Type	C++ Data Type	Packet Creator	Content Getter
bool	bool	create_bool(True)	get_bool(packet)
int or np.intc	int_t	create_int(1)	get_int(packet)
int or np.int8	int8_t	create_int8(2**7-1)	get_int(packet)
int or np.int16	int16_t	create_int16(2**15-1)	get_int(packet)
int or np.int32	int32_t	create_int32(2**31-1)	get_int(packet)
int or np.int64	int64_t	create_int64(2**63-1)	get_int(packet)
int or np.uint8	uint8_t	create_uint8(2**8-1)	get_uint(packet)
int or np.uint16	uint16_t	create_uint16(2**16-1)	get_uint(packet)
int or np.uint32	uint32_t	create_uint32(2**32-1)	get_uint(packet)
int or np.uint64	uint64_t	create_uint64(2**64-1)	get_uint(packet)
float or np.float32	float	create_float(1.1)	get_float(packet)
float or np.double	double	create_double(1.1)	get_float(packet)
str (UTF-8)	std::string	create_string('abc')	get_str(packet)
bytes	std::string	create_string(b'\xd0\xd0\xd0')	get_bytes(packet)
mp.Packet	mp::Packet	create_packet(p)	get_packet(packet)
List[bool]	std::vector<bool>	create_bool_vector([True, False])	get_bool_list(packet)
List[int] or List[np.intc]	int[]	create_int_array([1, 2, 3])	get_int_list(packet, size=10)
List[int] or List[np.intc]	std::vector<int>	create_int_vector([1, 2, 3])	get_int_list(packet)
List[float] or List[np.float32]	float[]	create_float_array([0.1, 0.2])	get_float_list(packet, size=10)
List[float] or List[np.float32]	std::vector<float>	create_float_vector([0.1, 0.2])	get_float_list(packet, size=10)
List[str]	std::vector<std::string>	create_string_vector(['a'])	get_str_list(packet)
List[mp.Packet]	std::vector<mp::Packet>	create_packet_vector([packet1, packet2])	get_packet_list(p)
Mapping[str, Packet]	std::map<std::string, Packet>	create_string_to_packet_map({'a': packet1, 'b': packet2})	get_str_to_packet_dict(packet)
np.ndarray (cv.mat and PIL.Image)	mp::ImageFrame	create_image_frame(format=ImageFormat.SRGB, data=mat)	get_image_frame(packet)
np.ndarray	mp::Matrix	create_matrix(data)	get_matrix(packet)
Google Proto Message	Google Proto Message	create_proto(proto)	get_proto(packet)
List[Proto]	std::vector<Proto>	n/a	get_proto_list(packet)
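
The integer arguments shown in the table (2**7-1, 2**16-1, and so on) are the largest values each sized C++ type can hold. As a minimal sketch, assuming NumPy is installed, the check below tests whether a Python int fits the target type before a sized creator is called. The helper fits_in and the CREATOR_DTYPES mapping are hypothetical names for illustration and are not part of MediaPipe.

```python
import numpy as np

# Hypothetical lookup: sized integer creator name (taken from the table above)
# mapped to the NumPy dtype with the same width and signedness as the C++ type.
CREATOR_DTYPES = {
    "create_int8": np.int8,
    "create_int16": np.int16,
    "create_int32": np.int32,
    "create_int64": np.int64,
    "create_uint8": np.uint8,
    "create_uint16": np.uint16,
    "create_uint32": np.uint32,
    "create_uint64": np.uint64,
}


def fits_in(value: int, creator_name: str) -> bool:
    """Return True if `value` lies within the range of the creator's C++ type."""
    info = np.iinfo(CREATOR_DTYPES[creator_name])
    return int(info.min) <= value <= int(info.max)
```

For example, fits_in(2**7 - 1, "create_int8") holds while fits_in(2**7, "create_int8") does not, which matches the boundary value used in the table row for int8_t.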

Users often create custom C++ classes and send them into graphs and calculators. To allow a custom class to be used in Python with MediaPipe, you may extend the Packet API for the new data type in the following steps:

Write the pybind11 class binding code or a custom type caster for the custom type in a cc file.

#include "path/to/my_type/header/file.h"
#include "pybind11/pybind11.h"

namespace py = pybind11;

PYBIND11_MODULE(my_type_binding, m) {
  // Write binding code or a custom type caster for MyType.
  py::class_<MyType>(m, "MyType")
      .def(py::init<>())
      .def(...);
}

Create a new packet creator and getter method of the custom type in a separate cc file.

#include "path/to/my_type/header/file.h"
#include "mediapipe/framework/packet.h"
#include "pybind11/pybind11.h"

namespace mediapipe {
namespace py = pybind11;

PYBIND11_MODULE(my_packet_methods, m) {
  m.def(
      "create_my_type",
      [](const MyType& my_type) { return MakePacket<MyType>(my_type); });

  m.def(
      "get_my_type",
      [](const Packet& packet) {
        if (!packet.ValidateAsType<MyType>().ok()) {
          // Raise a Python ValueError when the payload is not a MyType.
          PyErr_SetString(PyExc_ValueError, "Packet data type mismatch.");
          throw py::error_already_set();
        }
        return packet.Get<MyType>();
      });
}
}  // namespace mediapipe

Add two bazel build rules for the custom type binding and the new packet methods in the BUILD file.

load("@pybind11_bazel//:build_defs.bzl", "pybind_extension")

pybind_extension(
    name = "my_type_binding",
    srcs = ["my_type_binding.cc"],
    deps = [":my_type"],
)

pybind_extension(
    name = "my_packet_methods",
    srcs = ["my_packet_methods.cc"],
    deps = [
        ":my_type",
        "//mediapipe/framework:packet"
    ],
)

Build the pybind extension targets (with the suffix .so) with Bazel and move the generated dynamic libraries into one of the $LD_LIBRARY_PATH directories.

Use the binding modules in Python.

import my_type_binding
import my_packet_methods

packet = my_packet_methods.create_my_type(my_type_binding.MyType())
my_type = my_packet_methods.get_my_type(packet)

Timestamp

Each packet contains a timestamp in units of microseconds. In Python, the Packet API provides a convenience method, packet.at(), to set the numeric timestamp of a packet. More generally, packet.timestamp is the packet class property for accessing the underlying timestamp. To convert a Unix epoch time to a MediaPipe timestamp, use the Timestamp API method mp.Timestamp.from_seconds().
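
Because timestamps are expressed in microseconds, the unit conversion itself can be sketched in plain Python without MediaPipe. The helper names below are hypothetical, and the sketch only illustrates the arithmetic; it is not a description of how mp.Timestamp.from_seconds() is implemented.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def seconds_to_microseconds(seconds: float) -> int:
    """Convert a time in seconds to a whole number of microseconds."""
    return round(seconds * MICROSECONDS_PER_SECOND)


def relative_timestamps(capture_times):
    """Turn capture times (in seconds) into microsecond offsets from the first one."""
    start = capture_times[0]
    return [seconds_to_microseconds(t - start) for t in capture_times]
```

Integer offsets produced this way are the kind of value that could be passed to packet.at(), for instance when frames are stamped relative to the start of a recording.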

ImageFrame

ImageFrame is the container for storing an image or a video frame. Formats supported by ImageFrame are listed in the ImageFormat enum. Pixels are encoded row-major with interleaved color components, and ImageFrame supports uint8, uint16, and float as its data types.

MediaPipe provides an ImageFrame Python API to access the ImageFrame C++ class. In Python, the easiest way to retrieve the pixel data is to call image_frame.numpy_view() to get a numpy ndarray. The returned ndarray is a reference to the internal pixel data and is not writable; callers that need to modify it must explicitly make a copy first.

When MediaPipe takes a numpy ndarray to make an ImageFrame, it assumes that the data is stored contiguously. Correspondingly, the pixel data of an ImageFrame is realigned to be contiguous when it is returned to the Python side.
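
The read-only and contiguity behavior described above can be imitated with NumPy alone. The sketch below does not call MediaPipe: the array pixels is a stand-in for an ImageFrame's internal buffer, and the non-writable view is built by hand to mimic what numpy_view() is described as returning.

```python
import numpy as np

# Stand-in for an ImageFrame's internal buffer: height x width x channels, uint8.
pixels = np.zeros((4, 6, 3), dtype=np.uint8)

# Mimic the non-writable view described for numpy_view().
view = pixels.view()
view.flags.writeable = False

try:
    view[0, 0, 0] = 255  # writing through a read-only view raises ValueError
    write_failed = False
except ValueError:
    write_failed = True

# An explicit copy is writable and leaves the original buffer untouched.
editable = view.copy()
editable[0, 0, 0] = 255

# A column crop is a strided slice, so it is not C-contiguous; make it
# contiguous before handing it to code that assumes contiguous storage.
crop = editable[:, 1:4, :]
contiguous_crop = np.ascontiguousarray(crop)
```

The same pattern, copy before editing and np.ascontiguousarray before constructing a new frame, is a reasonable habit when moving image data between NumPy and an ImageFrame.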

Graph

In MediaPipe, all processing takes place within the context of a CalculatorGraph. The CalculatorGraph Python API is a direct binding to the C++ CalculatorGraph class. The major difference is that the Python API raises a Python error instead of returning a non-OK Status when an error occurs, so you can handle errors with ordinary Python exception handling. The life cycle of a CalculatorGraph contains three stages: initialization and setup, graph run, and graph shutdown.

Initialize a CalculatorGraph with a CalculatorGraphConfig protobuf or binary protobuf file, and provide callback method(s) to observe the output stream(s).

Option 1. Initialize a CalculatorGraph with a CalculatorGraphConfig protobuf or its text representation, and observe the output stream(s):

import mediapipe as mp

config_text = """
  input_stream: 'in_stream'
  output_stream: 'out_stream'
  node {
    calculator: 'PassThroughCalculator'
    input_stream: 'in_stream'
    output_stream: 'out_stream'
  }
"""
graph = mp.CalculatorGraph(graph_config=config_text)
output_packets = []
graph.observe_output_stream(
    'out_stream',
    lambda stream_name, packet:
        output_packets.append(mp.packet_getter.get_str(packet)))

Option 2. Initialize a CalculatorGraph with a binary protobuf file, and observe the output stream(s):

import os

import mediapipe as mp
# resources dependency

graph = mp.CalculatorGraph(
    binary_graph=os.path.join(
        resources.GetRunfilesDir(), 'path/to/your/graph.binarypb'))
graph.observe_output_stream(
    'out_stream',
    lambda stream_name, packet: print(f'Get {packet} from {stream_name}'))

Start the graph run and feed packets into the graph.

import cv2

graph.start_run()

graph.add_packet_to_input_stream(
    'in_stream', mp.packet_creator.create_string('abc').at(0))

rgb_img = cv2.cvtColor(cv2.imread('/path/to/your/image.png'), cv2.COLOR_BGR2RGB)
graph.add_packet_to_input_stream(
    'in_stream',
    mp.packet_creator.create_image_frame(format=mp.ImageFormat.SRGB,
                                         data=rgb_img).at(1))

Close the graph when you are finished. You may restart the graph for another graph run after the call to close().
```
graph.close()
```

The Python script can be run by your local Python runtime.
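
The three life-cycle stages can also be combined into a single function. This is a sketch under assumptions rather than a reference implementation: it assumes the mediapipe package is installed, reuses the PassThroughCalculator graph from Option 1, and uses the create_string creator listed in the packet table. The function name run_passthrough is hypothetical.

```python
def run_passthrough(texts):
    """Send each string through a pass-through graph and collect the outputs."""
    # Imported lazily so this module still loads where MediaPipe is absent.
    import mediapipe as mp

    config_text = """
      input_stream: 'in_stream'
      output_stream: 'out_stream'
      node {
        calculator: 'PassThroughCalculator'
        input_stream: 'in_stream'
        output_stream: 'out_stream'
      }
    """
    # Stage 1: initialization and setup.
    graph = mp.CalculatorGraph(graph_config=config_text)
    outputs = []
    graph.observe_output_stream(
        'out_stream',
        lambda stream_name, packet:
            outputs.append(mp.packet_getter.get_str(packet)))

    # Stage 2: graph run, one packet per string with increasing timestamps.
    graph.start_run()
    try:
        for timestamp, text in enumerate(texts):
            graph.add_packet_to_input_stream(
                'in_stream',
                mp.packet_creator.create_string(text).at(timestamp))
    finally:
        # Stage 3: shutdown, performed even if adding a packet raised an error.
        graph.close()
    return outputs
```

Since the graph only forwards its input, a call such as run_passthrough(['abc', 'def']) would be expected to return the same strings in order. The try/finally block reflects the point made earlier that graph errors surface as Python exceptions.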