August 23, 2016
My first attempt was to use Python because it is available on most of our machines at work anyway. I finished the task pretty quickly, but then discovered that the Python version was ridiculously slow at protobuf parsing. As the tool was supposed to process millions of records, waiting for hours wasn’t an option. There are ways to improve Python’s protobuf performance with custom-compiled protobuf libraries, but that wasn’t easy to accomplish and isn’t a process I could impose on everybody who wants to use the tool.
My next thought was:
Come on! If C++ is supposed to be that much faster how difficult can that be!
Actually it wasn’t too difficult - integrating the Cassandra C++ driver and using the protobuf C++ libraries went pretty smoothly. Soon I had a small console application running that was much faster than the Python version I had built at first.
Now I was happy, right? Well, almost…
At that point my current Makefile looked somewhat like the following:
SRCS := $(wildcard src/*.cc) $(wildcard src/*/*.cc)
OBJECTS := $(patsubst %.cc,%.o, $(SRCS))
CPPFLAGS=-std=c++11 -O2 -g -Wall

.PHONY: all clean

all: event-reader

$(OBJECTS): %.o : %.cc
	g++ $(CPPFLAGS) -c -Ilibs/cpp-driver/include -o $@ $<

event-reader: $(OBJECTS)
	g++ $(OBJECTS) -o event-reader -Llibs/cpp-driver/build -lcassandra -lprotobuf -lz
Looking at the above you will probably notice the problem: the resulting binary is dynamically linked to the Cassandra C++ driver (-lcassandra), the protobuf library (-lprotobuf), zlib (-lz) and the dependencies of those libraries. That doesn’t fit my goal of having a more-or-less portable executable that can easily be used by anyone.
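To see exactly which shared libraries such a binary pulls in, one option is to read the NEEDED entries that readelf -d prints from the ELF dynamic section. The following is only a minimal sketch: the helper name needed_libs is made up for this example, and it merely filters text, so it assumes readelf (from binutils) is available to produce the input.

```shell
#!/bin/sh
# needed_libs (hypothetical helper): reads the output of `readelf -d <binary>`
# on stdin and prints the name of every shared library the binary requires.
needed_libs() {
    # NEEDED lines look like:
    #  0x0000000000000001 (NEEDED)   Shared library: [libz.so.1]
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Usage (assuming the dynamically linked binary from the Makefile above):
#   readelf -d ./event-reader | needed_libs
```

Every library listed there has to be present on the target machine, which is exactly what makes the dynamically linked binary awkward to hand around.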
There is of course a solution at hand: a statically linked executable. According to man gcc and Google it is supposed to be pretty easy, but I remembered having some problems with that some years ago…
But hey, I was probably pretty stupid at that time - how difficult can that be?
In theory it shouldn’t take much more than adding -static to the gcc invocation that links the executable. I had read about problems with statically linking the C++ standard library, so of course there are a few more flags to toggle.
After fiddling around with numerous gcc switches for quite some time I finally succeeded, not with -static-libstdc++ but with -lstdc++. To be honest I have no explanation why this one works while the others don’t, but this is what finally got me going:
event-reader-static: $(OBJECTS)
	g++ -s -static $(OBJECTS) -o event-reader-static -Llibs/cpp-driver/build -lcassandra_static -luv -lpthread -lprotobuf -lz -lstdc++
There are a few things to note here:
-luv (for the Cassandra driver)
-s to strip the resulting executable to reduce its final file size (optional)
On my road to the wisdom of statically linked executables I stumbled upon the following tips to check if the static linking actually worked properly:
ldd <executable> should report: not a dynamic executable
nm <executable> | grep ' U ' should be empty (it lists unresolved symbols)
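Both checks can be wrapped into small filters so they are easy to drop into a build script. This is just a sketch: classify_ldd and count_undefined are names invented for this example, and both only inspect text piped into them, so they rely on ldd and nm printing the messages described above.

```shell
#!/bin/sh
# classify_ldd (hypothetical helper): reads `ldd` output on stdin and
# prints "static" if ldd reported a non-dynamic executable, else "dynamic".
classify_ldd() {
    if grep -q 'not a dynamic executable'; then
        echo static
    else
        echo dynamic
    fi
}

# count_undefined (hypothetical helper): reads `nm` output on stdin and
# prints the number of unresolved ("U") symbols.
count_undefined() {
    # grep -c exits non-zero when the count is 0; that is not an error here
    grep -c ' U ' || true
}

# Usage:
#   ldd ./event-reader-static 2>&1 | classify_ldd
#   nm ./event-reader-static 2>/dev/null | count_undefined
```

Note that nm has nothing to list for a binary stripped with -s, so the second check says more when run on an unstripped build.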
Finally we have an executable that runs - on this machine at least…
In the unlikely case you followed along with a similar executable, you will have noticed a warning from gcc that we did actually include some dynamic dependencies in our executable: nss, to be specific. Why is that?
My explanation is probably not too accurate, but by default glibc loads its NSS (Name Service Switch) libraries dynamically, even from a statically linked executable. That could result in a failure on the target machine your executable is running on in case the glibc versions differ. There are some explanations in the glibc wiki on this topic.
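The warning usually reads along the lines of "Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking". To find out which calls drag NSS into a build, one could filter the build output for these messages. A rough sketch, assuming that message format; nss_callers is a name made up for this example:

```shell
#!/bin/sh
# nss_callers (hypothetical helper): reads build output on stdin and prints
# each function name mentioned in a static-linking warning, once.
nss_callers() {
    sed -n "s/.*warning: Using '\([^']*\)' in statically linked applications.*/\1/p" \
        | sort -u
}

# Usage:
#   make event-reader-static 2>&1 | nss_callers
```

In a tool like this one, the hits are likely to be name-lookup functions used by the network code of the linked libraries rather than anything in your own sources.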
As the above glibc wiki entry explains, it is possible to force glibc to include nss statically as well. The explanations weren’t too helpful for me and I didn’t want to mess with my local glibc install, so my approach was to use a glibc alternative instead: musl libc.
musl is an alternative to glibc and describes itself as:
a new standard library to power a new generation of Linux-based devices. musl is lightweight, fast, simple, free, and strives to be correct in the sense of standards-conformance and safety.
As I mentioned, I didn’t intend to mess with my local glibc install too much, which is why I opted for Docker. There is a Linux distribution called Alpine Linux that uses musl by default, so we are going to build our executable in a tiny Docker container instead.
This is the blueprint for the Dockerfile to use for building:
FROM alpine:3.4
RUN apk add --no-cache gcc g++ make cmake openssl-dev libuv-dev protobuf-dev
ENTRYPOINT ["/bin/sh"]
Now we can build the docker image and run it in the source directory:
$ docker build -t uhlenheuer/musl-builder .
$ docker run --rm -it -v $(pwd):/tmp/build uhlenheuer/musl-builder

# inside the docker container
$$ cd /tmp/build
$$ make clean event-reader-static
$$ exit
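If you would rather not type the commands into an interactive shell, the same build can probably be run in one shot by overriding the image's entrypoint with make. This is only a sketch: musl_build and the DOCKER override variable are names made up for this example, the latter so the call can be dry-run with DOCKER=echo.

```shell
#!/bin/sh
# musl_build (hypothetical helper): runs make inside the builder image with
# the given source directory mounted at /tmp/build.
musl_build() {
    src_dir=$1
    # -w sets the working directory inside the container;
    # --entrypoint replaces the image's /bin/sh with make
    "${DOCKER:-docker}" run --rm \
        -v "$src_dir":/tmp/build -w /tmp/build \
        --entrypoint make \
        uhlenheuer/musl-builder clean event-reader-static
}

# Usage:   musl_build "$(pwd)"
# Dry run: DOCKER=echo musl_build "$(pwd)"   # only prints the arguments
```

Everything after the image name is passed as arguments to the entrypoint, so the container runs make clean event-reader-static and exits.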
That’s it! We finally have a statically linked executable that is more or less portable, in the sense that it runs on any machine with the same architecture at least.
A final note to my fellow Gentoo users: in case you don’t follow the Docker approach, you have to build your dependencies with the static-libs USE flag, of course. Moreover, watch out for the CFLAGS you are building your static libraries with, because I had some trouble on other machines with invalid/unknown instruction errors due to some CPU-specific compiler flags.
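A quick way to spot such flags is to scan the flag string before building. The sketch below is an assumption-laden example: cpu_specific_flags is a made-up helper, and the patterns it looks for (-march=native, -mcpu=native and a few explicit instruction-set switches) are only my guess at the usual suspects, not a complete list.

```shell
#!/bin/sh
# cpu_specific_flags (hypothetical helper): prints every flag in the given
# string that ties the generated code to a particular CPU, one per line.
cpu_specific_flags() {
    # $1 is intentionally unquoted so it is split into individual flags
    for flag in $1; do
        case $flag in
            -march=native|-mcpu=native) echo "$flag" ;;
            -mavx*|-msse4*|-mfma|-mbmi*) echo "$flag" ;;
        esac
    done
}

# Usage: pass whatever flags your static libraries are built with, e.g.
#   cpu_specific_flags "-O2 -pipe -march=native"
```

If the function prints anything, the resulting static libraries may use instructions that older or different CPUs don't support.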