This is a story about a gRPC client and a Let's Encrypt certificate chain that is cross-signed by two root certificates.
Our context
In one of our apps, we have a gRPC client that connects to a gRPC server of a 3rd party service that we use. The app is written in Ruby, thus we use the Ruby flavour of gRPC, which uses the gRPC core C library under the hood.
As many do, said 3rd party service uses SSL connections secured by Let's Encrypt (LE) certs. The certificate chain goes from those per-service LE certs to an LE certificate authority (CA) cert, which in turn is signed by widely known root certs. In this case it was even cross-signed by two certs: DST Root CA X3 and ISRG Root X1.
LE used the first one in its early days to "get off the ground", as they write, because it was widely trusted. More recently LE preferred the second, newer one, but still supported the older one via the cross-signature.
On Sep. 30th 2021, that older DST Root CA X3 cert expired (see this Let's Encrypt blogpost explaining the background). That shouldn't have been a problem, because ISRG Root X1 was also part of the chain, but it caused us trouble.
The problem
Last week, roughly one week after Sep. 30th, we restarted our app for the first time in weeks, through a deployment by our CD pipeline. After the restart, the gRPC client could no longer establish SSL connections to its gRPC server. We got errors saying the server cert was invalid:
E1006 13:48:18.326843613 93 ssl_transport_security.cc:1446] Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED.
GRPC::Unavailable: 14:failed to connect to all addresses.
After some digging, we found out there is a bug in the gRPC core C library. It did not take into account that there were two root certs in the chain, one of which was still valid. It stuck with the expired DST Root CA X3 cert, and thus failed cert verification.
Reportedly, only the core library (and with it all languages building on it, like Ruby or Python) was affected, but not the Java or the .NET version.
The solution
We could not wait for the bugfix. To get our app back running, we needed to persuade the gRPC client that the server cert indeed was valid. Therefore we removed the now-invalid DST Root CA X3 root cert from our Docker image, rebuilt the collection of CA certs, and forced gRPC to use that. That did the trick.
This is what we added to the Dockerfile (on a Debian base image):
# remove the now-invalid root cert
RUN rm /usr/share/ca-certificates/mozilla/DST_Root_CA_X3.crt
# rebuild collection of root certs
RUN update-ca-certificates
# force gRPC to use that collection
ENV GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/etc/ssl/certs/ca-certificates.crt
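If you want to double-check that the rebuilt collection really contains no expired root certs anymore, something like the following Ruby sketch could help. It only uses Ruby's bundled `openssl` library. The helper names `certificates_in` and `expired_certificates` are made up for this illustration and are not part of gRPC or any other library.

```ruby
require "openssl"

# Matches a single PEM-encoded certificate inside a bundle file.
PEM_CERT = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m

# Hypothetical helper: parse every certificate found in a PEM bundle string.
def certificates_in(bundle_pem)
  bundle_pem.scan(PEM_CERT).map { |pem| OpenSSL::X509::Certificate.new(pem) }
end

# Hypothetical helper: return the certificates whose validity period
# ended before the given point in time.
def expired_certificates(bundle_pem, now: Time.now)
  certificates_in(bundle_pem).select { |cert| cert.not_after < now }
end
```

Inside the container you could then call `expired_certificates(File.read(ENV.fetch("GRPC_DEFAULT_SSL_ROOTS_FILE_PATH")))` and print the `subject` of whatever comes back. An empty result means that no cert in the collection has expired.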
The bug has been fixed in the meantime. But maybe you can't upgrade easily or quickly, so I hope this story helps some of you.
Further information
gRPC isn't the only software affected by this. The older OpenSSL version 1.0.2 also stumbles over the expired root cert. This post arrives at the same solution, and also has a nice collection of links for more background information.