Dynamically load protobuf definitions using python modules

by Gregor Uhlenheuer on October 19, 2017

Recently I wanted to write a quick script in Python that reads some data from a Cassandra database table and decodes and dumps the stored protobuf payloads. Since the payloads are serialized with different types of protobuf messages, the lookup has to be dynamic, based on a “string manifest”.

Let me quickly describe what I came up with; it ended up being a pretty flexible approach that can even be included in an automated build.

protobuf

First we want to gather all protobuf definitions, which are probably scattered somewhere in the project repository.

$ mkdir -p proto
$ cp -r $(find /some/repository/path -type d -path '*src/protobuf')/* proto

After that we compile the protobuf message definitions into actual Python files:

$ protoc --proto_path=proto --python_out=proto $(find proto -name '*.proto')

Now we end up with a directory structure like this (including sub-directories!):

$ tree proto
proto
├── details
│   ├── details_pb2.py
│   └── details.proto
├── messages_pb2.py
└── messages.proto

python

Now let’s have a look at the main script file protoload.py. I’ll skip the Cassandra database part altogether as that’s not very interesting anyway.
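
A stripped-down sketch of what such a script could look like is shown below. The decode helper and the use of google.protobuf's symbol_database to resolve the manifest are just one possible way to do the lookup, and the manifest format ('package.MessageName') is an assumption:

from google.protobuf import symbol_database

# importing the package runs the glue logic in proto/__init__.py (shown
# further down), which pulls in every generated *_pb2 module and thereby
# registers all message types in the default symbol database
import proto


def decode(manifest, payload):
    # the manifest is expected to be the fully qualified protobuf message
    # name (e.g. 'details.Details'); look up the generated message class
    # and parse the raw payload bytes with it
    msg_cls = symbol_database.Default().GetSymbol(manifest)
    return msg_cls.FromString(payload)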

The most interesting part in here is basically the line which imports the proto module. In order for this to work properly without having to explicitly import every message definition module by hand, we have to write some glue logic in the __init__.py of the proto module.
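
One way to implement that glue logic is to walk the package directory and import every generated *_pb2 module via importlib, roughly like this (a sketch; on Python 2 the sub-directories would additionally need their own __init__.py files):

import importlib
import os

# walk the package directory (sub-directories included) and import every
# generated *_pb2 module, so that all message types register themselves
# without having to be listed by hand; the underscore names keep the
# package namespace clean
_pkg_dir = os.path.dirname(__file__)

for _root, _dirs, _files in os.walk(_pkg_dir):
    _rel = os.path.relpath(_root, _pkg_dir)
    _prefix = __name__ if _rel == '.' else '%s.%s' % (__name__, _rel.replace(os.sep, '.'))

    for _file in _files:
        if _file.endswith('_pb2.py'):
            importlib.import_module('%s.%s' % (_prefix, _file[:-3]))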

Using this approach you only have to recompile the protobuf definitions into new Python modules inside the target proto folder; they are then automatically picked up by the script.
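
For example, once a new message type has been compiled into the proto folder, decoding it boils down to a call to the decode helper sketched above; the manifest 'details.Details' and the empty payload are only placeholders:

from protoload import decode

# manifest and payload would normally come from the Cassandra table
msg = decode('details.Details', b'')
print(msg)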

This post is tagged with linux, programming, protobuf, google and python