by August 23, 2016
A few days ago I got back to coding some C++ again! I wanted to write a small console application that accesses a Cassandra cluster to read and analyze some protobuf-encoded entities.
My first attempt was to use Python because it is available on most of our machines at work anyway. I finished the task pretty quickly, but then discovered that the Python version was ridiculously slow at protobuf parsing. As the tool was supposed to process millions of records, waiting for hours wasn’t an option. There are ways to improve Python’s protobuf performance by using custom-compiled protobuf libraries, but that wasn’t easy to accomplish and is not a process I could impose on everybody who wants to use the tool.
My next thought was:
Come on! If C++ is supposed to be that much faster how difficult can that be!
Actually it wasn’t too difficult: integrating the Cassandra C++ driver and using the protobuf C++ libraries went pretty smoothly. Soon I had a small console application running that was much faster than the Python version I had built first.
Now I was happy, right? Well, almost…
At that point my Makefile looked somewhat like the following:
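SRCS := $(wildcard src/*.cc) $(wildcard src/*/*.cc)
OBJECTS := $(patsubst %.cc,%.o, $(SRCS))
# flags for the C++ compiler (CXXFLAGS is the conventional name, CPPFLAGS is meant for the preprocessor)
CXXFLAGS = -std=c++11 -O2 -g -Wall

.PHONY: all clean

all: event-reader

# compile every source file into an object file next to it
$(OBJECTS): %.o : %.cc
	g++ $(CXXFLAGS) -c -Ilibs/cpp-driver/include -o $@ $<

# link dynamically against the Cassandra driver, protobuf and zlib
event-reader: $(OBJECTS)
	g++ $(OBJECTS) -o event-reader -Llibs/cpp-driver/build -lcassandra -lprotobuf -lz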
Looking at the above you will probably notice the problem: the resulting binary is dynamically linked to the Cassandra C++ driver (-lcassandra), the protobuf library (-lprotobuf), zlib (-lz) and the dependencies of the mentioned ones. That doesn’t fit my goal of having a more-or-less portable executable that can easily be used by anyone.
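The libraries a binary requests at load time are recorded as NEEDED entries in its dynamic section, which readelf -d can print. A minimal sketch for listing them, assuming binutils’ readelf is installed; the helper name needed_libs is made up for illustration:

```shell
#!/bin/sh
# needed_libs (hypothetical helper): reads `readelf -d` output on stdin and
# prints the library name of every NEEDED entry, one per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Usage:
#   readelf -d ./event-reader | needed_libs
```

This only shows the direct dependencies; ldd additionally resolves the dependencies of those libraries.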
There is of course a solution at hand: a statically linked executable. After reading man gcc and asking Google it is supposed to be pretty easy, but I could remember having some problems with that some years ago…
But hey, I was probably pretty stupid at that time - how difficult can that be?
In theory it shouldn’t take much more than adding -static to the gcc invocation that links the executable. I had read about problems with statically linking the C++ standard library, so there are of course some more flags to toggle: -static-libgcc and -static-libstdc++.
After fiddling around with numerous gcc switches for quite some time I finally succeeded without -static-libgcc and -static-libstdc++ but with -lstdc++ instead. To be honest I have no explanation why this one works while the others don’t, but this is what finally got me going:
event-reader-static: $(OBJECTS)
	g++ -s -static $(OBJECTS) -o event-reader-static -Llibs/cpp-driver/build -lcassandra_static -luv -lpthread -lprotobuf -lz -lstdc++
There are a few things to note here:
* you have to link against the static variant of the Cassandra driver (libcassandra_static)
* the dependencies of your dependencies have to be listed explicitly (-lpthread and -luv for the cassandra driver)
* -s to strip the resulting executable to reduce its final file size (optional)
On my road to the wisdom of statically linked executables I stumbled upon the following tips to check if the static linking actually worked properly:
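* ldd <executable> should report: not a dynamic executable
* nm <executable> | grep ' U ' should be empty (it would list unresolved symbols; a binary stripped with -s has no symbol table left, so run this check on an unstripped build)
Finally we have an executable that runs, on this machine at least…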
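The ldd check described above can be wrapped in a small script. This is only a sketch, assuming glibc’s ldd on the host (other ldd implementations may word their output differently); the helper names is_static_report and check_static are made up for illustration. Besides the message quoted above it also accepts "statically linked", which glibc’s loader prints for some static binaries:

```shell
#!/bin/sh
# is_static_report (hypothetical helper): succeeds when the text on stdin
# looks like what ldd prints for a binary without dynamic dependencies.
is_static_report() {
    grep -q -e 'not a dynamic executable' -e 'statically linked'
}

# check_static (hypothetical helper): runs ldd on "$1" and reports the result.
check_static() {
    # ldd exits non-zero for static binaries, so only its text is examined.
    if ldd "$1" 2>&1 | is_static_report; then
        echo "$1: static"
    else
        echo "$1: dynamic (or not an executable)"
        return 1
    fi
}

# Usage:
#   check_static ./event-reader-static
```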
In the unlikely case you followed along with a similar executable you will have noticed a warning by gcc that we did actually include some dynamic dependencies in our executable: nss to be specific. Why is that?
My explanation is probably not too accurate, but by default glibc links with libnss dynamically, even in an otherwise static executable. That can result in a failure on the target machine in case its glibc version differs from the one the executable was built with. There are some explanations in the glibc wiki on this topic.
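The link-time warning names every affected function, so a build log can be scanned for them. A rough sketch, assuming the warning has the usual glibc wording ("Using '<function>' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking"); the helper name nss_functions is made up for illustration:

```shell
#!/bin/sh
# nss_functions (hypothetical helper): reads linker output on stdin and prints
# each function named in a glibc static-linking warning once.
nss_functions() {
    sed -n "s/.*warning: Using '\([^']*\)' in statically linked applications.*/\1/p" | sort -u
}

# Usage (the warnings are written to stderr, hence 2>&1):
#   make event-reader-static 2>&1 | nss_functions
```

If the list is empty, the executable does not use any of the calls that glibc resolves through dynamically loaded modules.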
As the above glibc wiki entry explains, there are ways to force your glibc to statically include nss as well. The explanations weren’t too helpful for me and I didn’t want to mess with my local glibc install, so my approach was to use a glibc alternative instead: musl libc.
musl is an alternative to glibc and describes itself as:
a new standard library to power a new generation of Linux-based devices. musl is lightweight, fast, simple, free, and strives to be correct in the sense of standards-conformance and safety.
As I mentioned, I didn’t intend to mess with my local glibc install too much, which is why I opted for Docker. There is a Linux distribution called Alpine Linux that uses musl by default. So we are going to build our executable in a tiny Docker container instead.
This is the blueprint for the Dockerfile to use for building:
FROM alpine:3.4
RUN apk add --no-cache gcc g++ make cmake openssl-dev libuv-dev protobuf-dev
ENTRYPOINT ["/bin/sh"]
Now we can build the docker image and run it in the source directory:
$ docker build -t uhlenheuer/musl-builder .
$ docker run --rm -it -v $(pwd):/tmp/build uhlenheuer/musl-builder
# inside the docker container
$$ cd /tmp/build
$$ make clean event-reader-static
$$ exit
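The interactive session above can also be run as a single non-interactive command, because the image’s entrypoint is /bin/sh and accepts -c. A sketch, assuming the image name used above; the wrapper name musl_build and the DOCKER override variable are made up for illustration:

```shell
#!/bin/sh
# musl_build (hypothetical wrapper): runs `make <targets>` inside the builder
# image with the current directory mounted at /tmp/build.
musl_build() {
    # DOCKER can be overridden (for example DOCKER=echo) to do a dry run.
    "${DOCKER:-docker}" run --rm -v "$(pwd)":/tmp/build -w /tmp/build \
        uhlenheuer/musl-builder -c "make $*"
}

# Usage:
#   musl_build clean event-reader-static
```

The -it flags are dropped because no terminal is needed, and -w sets the working directory so the cd step becomes unnecessary.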
That’s it! We finally have a statically linked executable that is more-or-less portable, in the sense that it runs on any machine with the same architecture at least.
A final note to my fellow Gentoo users: in case you don’t follow the Docker approach you have to build your dependencies with the static-libs USE flag of course. Moreover, watch out for the CFLAGS you are building your static libraries with, because I had some trouble on other machines with invalid/unknown instruction errors due to some CPU-specific compiler flags.
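A quick way to spot such flags before building the static libraries is to scan the flag string for -march= settings. This is only a sketch: it assumes the culprit is a -march= value such as -march=native (the text above does not name the exact flag), and the helper name march_flags is made up for illustration:

```shell
#!/bin/sh
# march_flags (hypothetical helper): prints every -march=... flag found in the
# flag string passed as "$1", one per line, so it can be reviewed.
march_flags() {
    # $1 is left unquoted on purpose so the string is split into single flags.
    for flag in $1; do
        case "$flag" in
            -march=*) echo "$flag" ;;
        esac
    done
}

# Usage:
#   march_flags "$CFLAGS"
```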