MediaPipe Python Framework
The MediaPipe Python framework grants direct access to the core components of the MediaPipe C++ framework such as Timestamp, Packet, and CalculatorGraph, whereas the ready-to-use Python solutions hide the technical details of the framework and simply return the readable model inference results back to the callers.
MediaPipe framework sits on top of the pybind11 library. The C++ core framework is exposed in Python via a C++/Python language binding. The content below assumes that the reader already has a basic understanding of the MediaPipe C++ framework. Otherwise, you can find useful information in Framework Concepts.
Packet
The packet is the basic data flow unit in MediaPipe. A packet consists of a numeric timestamp and a shared pointer to an immutable payload. In Python, a MediaPipe packet can be created by calling one of the packet creator methods in the mp.packet_creator
module. Correspondingly, the packet payload can be retrieved by using one of the packet getter methods in the mp.packet_getter
module. Note that the packet payload becomes immutable after packet creation. Thus, the modification of the retrieved packet content doesn’t affect the actual payload in the packet. MediaPipe framework Python API supports the most commonly used data types of MediaPipe (e.g., ImageFrame, Matrix, Protocol Buffers, and the primitive data types) in the core binding. The comprehensive table below shows the type mappings between the Python and the C++ data type along with the packet creator and the content getter method for each data type supported by the MediaPipe Python framework API.
Python Data Type | C++ Data Type | Packet Creator | Content Getter |
---|---|---|---|
bool | bool | create_bool(True) | get_bool(packet) |
int or np.intc | int_t | create_int(1) | get_int(packet) |
int or np.int8 | int8_t | create_int8(2**7-1) | get_int(packet) |
int or np.int16 | int16_t | create_int16(2**15-1) | get_int(packet) |
int or np.int32 | int32_t | create_int32(2**31-1) | get_int(packet) |
int or np.int64 | int64_t | create_int64(2**63-1) | get_int(packet) |
int or np.uint8 | uint8_t | create_uint8(2**8-1) | get_uint(packet) |
int or np.uint16 | uint16_t | create_uint16(2**16-1) | get_uint(packet) |
int or np.uint32 | uint32_t | create_uint32(2**32-1) | get_uint(packet) |
int or np.uint64 | uint64_t | create_uint64(2**64-1) | get_uint(packet) |
float or np.float32 | float | create_float(1.1) | get_float(packet) |
float or np.double | double | create_double(1.1) | get_float(packet) |
str (UTF-8) | std::string | create_string(‘abc’) | get_str(packet) |
bytes | std::string | create_string(b’\xd0\xd0\xd0’) | get_bytes(packet) |
mp.Packet | mp::Packet | create_packet(p) | get_packet(packet) |
List[bool] | std::vector<bool> | create_bool_vector([True, False]) | get_bool_list(packet) |
List[int] or List[np.intc] | int[] | create_int_array([1, 2, 3]) | get_int_list(packet, size=10) |
List[int] or List[np.intc] | std::vector<int> | create_int_vector([1, 2, 3]) | get_int_list(packet) |
List[float] or List[np.float] | float[] | create_float_arrary([0.1, 0.2]) | get_float_list(packet, size=10) |
List[float] or List[np.float] | std::vector<float> | create_float_vector([0.1, 0.2]) | get_float_list(packet, size=10) |
List[str] | std::vector<std::string> | create_string_vector([‘a’]) | get_str_list(packet) |
List[mp.Packet] | std::vector<mp::Packet> | create_packet_vector( [packet1, packet2]) | get_packet_list(p) |
Mapping[str, Packet] | std::map<std::string, Packet> | create_string_to_packet_map( {‘a’: packet1, ‘b’: packet2}) | get_str_to_packet_dict(packet) |
np.ndarray (cv.mat and PIL.Image) | mp::ImageFrame | create_image_frame( format=ImageFormat.SRGB, data=mat) | get_image_frame(packet) |
np.ndarray | mp::Matrix | create_matrix(data) | get_matrix(packet) |
Google Proto Message | Google Proto Message | create_proto(proto) | get_proto(packet) |
List[Proto] | std::vector<Proto> | n/a | get_proto_list(packet) |
It’s not uncommon that users create custom C++ classes and and send those into the graphs and calculators. To allow the custom classes to be used in Python with MediaPipe, you may extend the Packet API for a new data type in the following steps:
-
Write the pybind11 class binding code or a custom type caster for the custom type in a cc file.
#include "path/to/my_type/header/file.h" #include "pybind11/pybind11.h" namespace py = pybind11; PYBIND11_MODULE(my_type_binding, m) { // Write binding code or a custom type caster for MyType. py::class_<MyType>(m, "MyType") .def(py::init<>()) .def(...); }
-
Create a new packet creator and getter method of the custom type in a separate cc file.
#include "path/to/my_type/header/file.h" #include "mediapipe/framework/packet.h" #include "pybind11/pybind11.h" namespace mediapipe { namespace py = pybind11; PYBIND11_MODULE(my_packet_methods, m) { m.def( "create_my_type", [](const MyType& my_type) { return MakePacket<MyType>(my_type); }); m.def( "get_my_type", [](const Packet& packet) { if(!packet.ValidateAsType<MyType>().ok()) { PyErr_SetString(PyExc_ValueError, "Packet data type mismatch."); return py::error_already_set(); } return packet.Get<MyType>(); }); } // namespace mediapipe
-
Add two bazel build rules for the custom type binding and the new packet methods in the BUILD file.
load("@pybind11_bazel//:build_defs.bzl", "pybind_extension") pybind_extension( name = "my_type_binding", srcs = ["my_type_binding.cc"], deps = [":my_type"], ) pybind_extension( name = "my_packet_methods", srcs = ["my_packet_methods.cc"], deps = [ ":my_type", "//mediapipe/framework:packet" ], )
-
Build the pybind extension targets (with the suffix .so) by Bazel and move the generated dynamic libraries into one of the $LD_LIBRARY_PATH dirs.
-
Use the binding modules in Python.
import my_type_binding import my_packet_methods packet = my_packet_methods.create_my_type(my_type_binding.MyType()) my_type = my_packet_methods.get_my_type(packet)
Timestamp
Each packet contains a timestamp that is in units of microseconds. In Python, the Packet API provides a convenience method packet.at()
to define the numeric timestamp of a packet. More generally, packet.timestamp
is the packet class property for accessing the underlying timestamp. To convert an Unix epoch to a MediaPipe timestamp, the Timestamp API offers a method mp.Timestamp.from_seconds()
for this purpose.
ImageFrame
ImageFrame is the container for storing an image or a video frame. Formats supported by ImageFrame are listed in the ImageFormat enum. Pixels are encoded row-major with interleaved color components, and ImageFrame supports uint8, uint16, and float as its data types. MediaPipe provides an ImageFrame Python API to access the ImageFrame C++ class. In Python, the easiest way to retrieve the pixel data is to call image_frame.numpy_view()
to get a numpy ndarray. Note that the returned numpy ndarray, a reference to the internal pixel data, is unwritable. If the callers need to modify the numpy ndarray, it’s required to explicitly call a copy operation to obtain a copy. When MediaPipe takes a numpy ndarray to make an ImageFrame, it assumes that the data is stored contiguously. Correspondingly, the pixel data of an ImageFrame will be realigned to be contiguous when it’s returned to the Python side.
Graph
In MediaPipe, all processing takes places within the context of a CalculatorGraph. The CalculatorGraph Python API is a direct binding to the C++ CalculatorGraph class. The major difference is the CalculatorGraph Python API raises a Python error instead of returning a non-OK Status when an error occurs. Therefore, as a Python user, you can handle the exceptions as you normally do. The life cycle of a CalculatorGraph contains three stages: initialization and setup, graph run, and graph shutdown.
-
Initialize a CalculatorGraph with a CalculatorGraphConfig protobuf or binary protobuf file, and provide callback method(s) to observe the output stream(s).
Option 1. Initialize a CalculatorGraph with a CalculatorGraphConfig protobuf or its text representation, and observe the output stream(s):
import mediapipe as mp config_text = """ input_stream: 'in_stream' output_stream: 'out_stream' node { calculator: 'PassThroughCalculator' input_stream: 'in_stream' output_stream: 'out_stream' } """ graph = mp.CalculatorGraph(graph_config=config_text) output_packets = [] graph.observe_output_stream( 'out_stream', lambda stream_name, packet: output_packets.append(mp.packet_getter.get_str(packet)))
Option 2. Initialize a CalculatorGraph with with a binary protobuf file, and observe the output stream(s).
import mediapipe as mp # resources dependency graph = mp.CalculatorGraph( binary_graph=os.path.join( resources.GetRunfilesDir(), 'path/to/your/graph.binarypb')) graph.observe_output_stream( 'out_stream', lambda stream_name, packet: print(f'Get {packet} from {stream_name}'))
-
Start the graph run and feed packets into the graph.
graph.start_run() graph.add_packet_to_input_stream( 'in_stream', mp.packet_creator.create_str('abc').at(0)) rgb_img = cv2.cvtColor(cv2.imread('/path/to/your/image.png'), cv2.COLOR_BGR2RGB) graph.add_packet_to_input_stream( 'in_stream', mp.packet_creator.create_image_frame(format=mp.ImageFormat.SRGB, data=rgb_img).at(1))
-
Close the graph after finish. You may restart the graph for another graph run after the call to
close()
.graph.close()
The Python script can be run by your local Python runtime.