Dynamically load protobuf definitions using python modules

by Gregor Uhlenheuer on October 19, 2017

Recently I wanted to write a quick Python script that reads some data from a Cassandra database table, then decodes and dumps the stored protobuf payloads. Since the payloads are serialized with different types of protobuf messages, the lookup has to be dynamic, based on a “string manifest”.

Let me quickly describe what I came up with. It turned out to be a pretty flexible approach that can easily be included in an automated build.

protobuf

First we collect all protobuf definitions, which probably live somewhere in the project repository:

$ mkdir -p proto
$ for dir in $(find /some/repository/path -type d -path '*src/protobuf'); do cp -r "$dir"/. proto; done

After that we compile the protobuf message definitions into actual python files:

$ protoc --proto_path=proto --python_out=proto $(find proto -name '*.proto')
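
For an automated build, the same step can also be driven from Python. The following is only a minimal sketch and not part of the original script: the helper names `find_proto_files`, `build_protoc_command` and `compile_protos` are made up for illustration, and passing `--proto_path` is an assumption so that the generated files end up next to their `.proto` sources.

```python
import os
import subprocess


def find_proto_files(proto_dir):
    """Return all .proto files below proto_dir, sorted for stable builds."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(proto_dir):
        for name in filenames:
            if name.endswith('.proto'):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def build_protoc_command(proto_dir):
    """Assemble the protoc call; returns None if there is nothing to compile."""
    files = find_proto_files(proto_dir)
    if not files:
        return None
    return ['protoc',
            '--proto_path=' + proto_dir,
            '--python_out=' + proto_dir] + files


def compile_protos(proto_dir):
    """Run protoc (which has to be on the PATH) and raise if it fails."""
    command = build_protoc_command(proto_dir)
    if command is not None:
        subprocess.check_call(command)
```

Calling `compile_protos('proto')` from a build script would then take the place of the shell command.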

Now we end up with a directory structure like this (including sub directories!):

$ tree proto
proto
├── details
│   ├── details_pb2.py
│   └── details.proto
├── messages_pb2.py
└── messages.proto

python

Now let’s have a look at the main script file protoload.py. I’ll skip the Cassandra database part altogether as that’s not very interesting anyway.

#!/usr/bin/env python

from __future__ import print_function
import sys

from google.protobuf import symbol_database as sdb

# this will import all protobuf definitions under 'proto'
import proto

# all loaded message descriptors and symbols will be
# registered in this symbol database instance
__db = sdb.Default()


def _get_records():
    # XXX: fetch data from Cassandra in here
    # return an empty list for now so the loop in _main() still works
    return []


def __find_message(manifest):
    try:
        symb = __db.GetSymbol(manifest)
        return symb()
    except KeyError:
        print('unknown record manifest "%s"' % manifest, file=sys.stderr)
        return None


def _extract_record(record):
    # XXX: this is just a proof-of-concept
    # try to find matching message description based on 'manifest'
    # and parse the payload using the retrieved protobuf definition
    msg = __find_message(record.manifest)
    if msg is not None:
        payload = record.payload
        msg.ParseFromString(payload)
        return msg
    return None


def _main():
    for record in _get_records():
        extracted = _extract_record(record)
        if extracted is not None:
            print(extracted)


if __name__ == '__main__':
    _main()
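
To try out the lookup flow without Cassandra or any generated modules, the following sketch swaps in stand-ins. Everything in it is hypothetical: the `Record` tuple only mirrors the two attributes the script relies on, `FakeMessage` pretends to be a generated protobuf class, and a plain dict plays the role of the symbol database.

```python
import collections
import sys

# Hypothetical record shape: the script only relies on
# the 'manifest' and 'payload' attributes.
Record = collections.namedtuple('Record', ['manifest', 'payload'])


class FakeMessage(object):
    """Stand-in for a generated protobuf class (illustration only)."""

    def __init__(self):
        self.data = None

    def ParseFromString(self, payload):
        # a real message would decode the bytes, this one just keeps them
        self.data = payload


# A plain dict plays the role of the symbol database here.
REGISTRY = {'example.FakeMessage': FakeMessage}


def find_message(manifest):
    """Return a fresh message instance or None for unknown manifests."""
    try:
        return REGISTRY[manifest]()
    except KeyError:
        print('unknown record manifest "%s"' % manifest, file=sys.stderr)
        return None


def extract_record(record):
    """Look up the message type by manifest and feed it the payload."""
    msg = find_message(record.manifest)
    if msg is None:
        return None
    msg.ParseFromString(record.payload)
    return msg
```

A record with a known manifest comes back as a populated `FakeMessage`, while an unknown manifest is reported on stderr and yields `None`, which matches the behavior of the script.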

The most interesting part here is the line that imports the proto module. For this to work without explicitly importing every message definition module by hand, we have to write some glue logic in the __init__.py of the proto module:

import importlib
import os


def __import_modules(dirname, paths):
    # iterate through dir contents
    for mod in os.listdir(dirname):
        full = os.path.join(dirname, mod)

        # recurse into sub directories
        if os.path.isdir(full):
            __import_modules(full, paths + [mod])
        # import all .py files other than __init__.py
        elif os.path.isfile(full) and mod != '__init__.py' and mod.endswith('.py'):
            base = '.'.join(paths)
            module = mod[:-3]
            importlib.import_module('%s.%s' % (base, module))

# start in the current directory
__import_modules(os.path.dirname(__file__), ['proto'])
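
When something is not picked up as expected, it can help to see which module names the directory walk produces before anything is imported. The sketch below is a dry-run variant of the same recursion. The function name `discover_modules` is made up for illustration, and sorting the directory listing is an addition to make the order predictable.

```python
import os


def discover_modules(dirname, paths):
    """Collect dotted module names below dirname without importing them."""
    names = []
    for entry in sorted(os.listdir(dirname)):
        full = os.path.join(dirname, entry)
        # recurse into sub directories, extending the package path
        if os.path.isdir(full):
            names.extend(discover_modules(full, paths + [entry]))
        # keep all .py files other than __init__.py
        elif (os.path.isfile(full) and entry != '__init__.py'
              and entry.endswith('.py')):
            names.append('.'.join(paths + [entry[:-3]]))
    return names
```

For the directory tree shown earlier, `discover_modules('proto', ['proto'])` would yield `proto.details.details_pb2` and `proto.messages_pb2`.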

With this approach you only have to recompile the protobuf definitions into the target proto folder, and the script picks up the new Python modules automatically.
