October 19, 2017
Recently I wanted to write a quick script in Python that reads some data from a Cassandra database table, then decodes and dumps the stored protobuf payloads. Since the payloads are serialized with different types of protobuf messages, the lookup has to be dynamic, based on a “string manifest”.
Let me quickly describe what I came up with. It turned out to be flexible enough to be included in an automated build.
First we collect all protobuf definitions, which are probably placed somewhere in the project repository.
$ mkdir -p proto
$ find /some/repository/path -type d -path '*src/protobuf' -exec cp -r {}/. proto \;
After that we compile the protobuf message definitions into actual python files:
$ protoc --proto_path=proto --python_out=proto $(find proto -name '*.proto')
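For an automated build, the compile step can also be driven from Python. The following is a minimal sketch and not part of the original setup: the helper names find_proto_files, build_protoc_command and compile_protos are made up for illustration, and it assumes that protoc is on the PATH and that the definitions were copied to a directory such as proto. It passes --proto_path so that each generated module lands next to its .proto file.

```python
import os
import subprocess


def find_proto_files(proto_dir):
    """Return all .proto files below proto_dir, sorted for stable output."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(proto_dir):
        for name in filenames:
            if name.endswith('.proto'):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def build_protoc_command(proto_dir):
    """Build the protoc argument list without running anything."""
    return [
        'protoc',
        '--proto_path=' + proto_dir,
        '--python_out=' + proto_dir,
    ] + find_proto_files(proto_dir)


def compile_protos(proto_dir):
    """Run protoc; raises CalledProcessError if the compilation fails."""
    subprocess.check_call(build_protoc_command(proto_dir))
```

A build step would then call compile_protos('proto'). Keeping the command construction separate from the subprocess call makes it possible to inspect the command without having protoc installed.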
Now we end up with a directory structure like this (including subdirectories):
$ tree proto
proto
├── details
│ ├── details_pb2.py
│ └── details.proto
├── messages_pb2.py
└── messages.proto
Now let’s have a look at the main script file, protoload.py. I’ll skip the Cassandra database part altogether as that’s not very interesting anyway.
#!/usr/bin/env python
from __future__ import print_function

import sys

from google.protobuf import symbol_database as sdb

# this will import all protobuf definitions under 'proto'
import proto

# all loaded message descriptors and symbols will be
# registered in this symbol database instance
__db = sdb.Default()


def _get_records():
    # XXX: fetch data from Cassandra in here
    # placeholder: return no records so the loop in _main() is a no-op
    return []


def __find_message(manifest):
    try:
        symb = __db.GetSymbol(manifest)
        return symb()
    except KeyError:
        print('unknown record manifest "%s"' % manifest, file=sys.stderr)
        return None


def _extract_record(record):
    # XXX: this is just a proof-of-concept
    # try to find matching message description based on 'manifest'
    # and parse the payload using the retrieved protobuf definition
    msg = __find_message(record.manifest)
    if msg is not None:
        payload = record.payload
        msg.ParseFromString(payload)
        return msg
    return None


def _main():
    for record in _get_records():
        extracted = _extract_record(record)
        if extracted is not None:
            print(extracted)


if __name__ == '__main__':
    _main()
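To exercise the decode loop without a database or compiled definitions, the lookup can be passed in as a parameter. This is only a sketch under stated assumptions: Record, FakeMessage, FAKE_SYMBOLS and decode_all are hypothetical names that do not appear in the script, and the fake class merely stores the raw bytes instead of parsing them. The only things taken from the script are the manifest and payload attributes and the fact that a failed lookup raises KeyError.

```python
from __future__ import print_function

import collections
import sys

# Hypothetical row shape: the script only relies on these two attributes.
Record = collections.namedtuple('Record', ['manifest', 'payload'])


class FakeMessage(object):
    """Stand-in for a generated protobuf class; it only stores the bytes."""

    def __init__(self):
        self.raw = None

    def ParseFromString(self, payload):
        self.raw = payload


# Stand-in for the symbol database: maps a manifest string to a class.
FAKE_SYMBOLS = {'example.Known': FakeMessage}


def decode_all(records, lookup):
    """Decode every record whose manifest is known and skip the rest.

    lookup is any callable that returns a message class for a manifest
    string and raises KeyError when the manifest is unknown.
    """
    decoded = []
    for record in records:
        try:
            message_class = lookup(record.manifest)
        except KeyError:
            print('unknown record manifest "%s"' % record.manifest,
                  file=sys.stderr)
            continue
        msg = message_class()
        msg.ParseFromString(record.payload)
        decoded.append(msg)
    return decoded


records = [
    Record('example.Known', b'\x01'),
    Record('example.Missing', b'\x02'),
]
# dict.__getitem__ raises KeyError for unknown keys, like the real lookup.
messages = decode_all(records, FAKE_SYMBOLS.__getitem__)
```

In the real script, the GetSymbol method of the symbol database would take the place of FAKE_SYMBOLS.__getitem__.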
The most interesting part here is the line that imports the proto module. For this to work without explicitly importing every message definition module by hand, we have to write some glue logic in the __init__.py of the proto module:
import importlib
import os


def __import_modules(dirname, paths):
    # iterate through dir contents
    for mod in os.listdir(dirname):
        full = os.path.join(dirname, mod)
        # recurse into sub directories
        if os.path.isdir(full):
            __import_modules(full, paths + [mod])
        # import all .py files other than __init__.py
        elif os.path.isfile(full) and mod != '__init__.py' and mod.endswith('.py'):
            base = '.'.join(paths)
            module = mod[:-3]
            importlib.import_module('%s.%s' % (base, module))


# start in the current directory
__import_modules(os.path.dirname(__file__), ['proto'])
With this approach you only have to recompile the protobuf definitions into new Python modules inside the target proto folder, and the script picks them up automatically.
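As an aside, the standard library's pkgutil.walk_packages can perform a similar walk. The sketch below is an alternative to the hand-written recursion, not the code used in this post, and the function name import_submodules is made up for illustration. One difference to keep in mind: unlike the os.listdir version, it only descends into subdirectories that contain an __init__.py.

```python
import importlib
import pkgutil


def import_submodules(package_name):
    """Import every module and subpackage below package_name.

    Returns the dotted names of everything that was imported.
    """
    package = importlib.import_module(package_name)
    imported = []
    prefix = package.__name__ + '.'
    # walk_packages yields (finder, name, is_package) entries and
    # recurses into subpackages on its own.
    for _finder, name, _is_package in pkgutil.walk_packages(
            package.__path__, prefix=prefix):
        importlib.import_module(name)
        imported.append(name)
    return imported
```

The main script could then call import_submodules('proto') once at startup, provided every subdirectory of proto is a regular package.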