Technology Solutions for Everyday Folks

Breaking the Chain: An Edge Case of Let's Encrypt Root Certificate Expiration

The forthcoming expiration of the DST Root CA X3 certificate has been written about and announced for some time. The good news for most folks is that it's not a big deal. And that, I thought, also included me. For the most part, this has panned out to be true.

However, late last week I discovered an intermittent problem with a 'bot' service that I've got running and working against a server host with an active Let's Encrypt certificate. I started to get periodic (but not consistent, because of course not) SSL CA errors, which caused the bot process to not update. The first time I noticed the problem I assumed it was a bit of a fluke. I knew the expiration was coming, so presumably the "client" side (over which I have no control) was doing what it needed to internally handle the CA certificate expiration. When the problem resolved itself within a half hour...things continued to behave. Definitely a fluke, right?

Then it happened a second time on a different day. Same symptoms, and it eventually resolved itself. This is when I started paying closer attention, not sure if it was something I could control or not. Given it was a client-side error, my gut reaction was 'no,' but the intermittent behavior made troubleshooting trickier. By the time I got to looking, the problem had again resolved itself.

The third day, however, things were different. Still same symptoms, but I had the freedom of a lazy Saturday at my disposal. And so I dug in...

Is It Me?

This bot process hits one of my servers, so I did some investigation into the Let's Encrypt cert for the host. Everything looks fine there. It appears to be a client issue and, after much Googling and digging, specifically the known problem with OpenSSL <= 1.0.2: those versions do not build the chain "trusted first" by default, so they can follow the chain the server sends all the way to the expiring DST Root CA X3 instead of stopping at a root they already trust.

I happened to have a non-production environment at my disposal where OpenSSL < 1.0.2 was installed, so I could mimic the behavior of an old client. Thanks to the helpful information from Mister PKI, I could use the s_client command to examine the cert in detail:

openssl s_client -connect bot.host.com:443 -servername bot.host.com

Running this command gave me (among other output) the following:

CONNECTED(00000003)
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = bot.host.com
verify return:1
---
Certificate chain
 0 s:/CN=bot.host.com
   i:/C=US/O=Let's Encrypt/CN=R3
 1 s:/C=US/O=Let's Encrypt/CN=R3
   i:/O=Digital Signature Trust Co./CN=DST Root CA X3

In that last line (the issuer of certificate 1 in the chain) we see the offender: CN=DST Root CA X3.
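To make that check repeatable, here is a minimal sketch that pulls the issuer of the R3 intermediate out of s_client output. The function name r3_issuer is made up for illustration, and the pattern assumes the chain listing looks like the output above (newer OpenSSL builds print "CN = R3" with spaces, which the pattern also allows).

```shell
# Hypothetical helper (not part of any tool): reads `openssl s_client` output
# on stdin and prints the issuer of the R3 intermediate the server sends.
r3_issuer() {
  awk '
    # Chain entry whose subject ends in CN=R3 (or "CN = R3" on newer OpenSSL)
    /^ *[0-9]+ s:.*CN ?= ?R3$/ { want = 1; next }
    # The issuer line that follows that entry
    want && /^ +i:/ { sub(/^ +i:/, ""); print; exit }
  '
}

# Example usage (hostname taken from the command above):
#   openssl s_client -connect bot.host.com:443 -servername bot.host.com \
#     </dev/null 2>/dev/null | r3_issuer
```

Redirecting stdin from /dev/null makes s_client exit once the handshake is done, which keeps the command usable in a script.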

Now What?

I don't have any control or influence over the client side on this one. Given the intermittent nature of the problem right now, I suspect that one (or a few) of the many clients in a highly distributed environment aren't up to speed yet. Regardless, I need to fix this as best I can with the things I do control.

Fortunately, the folks at OpenSSL published an article about this very situation, and in my case I have to use Workaround 3: modify the certificate chain to directly use the alternative certificate chain.

The Implementation

As I've written about before, installing Let's Encrypt certificates on some of my hosts requires use of cPanel for the "last mile" where I have to copy/paste the contents of the various .pem (CRT and KEY) files into the interface to properly install. This bot service/host is one such host.

Instead of using the "Autofill by Certificate" option as I do for most such hosts/certs, this time I actually do need to manually copy/paste in the chain.pem contents to the CABUNDLE box (see reused screenshot from the original article):

Screenshot of cPanel certificate installation page

Install the certificate (in this case I actually reused the existing unexpired host certificate but specified the chain.pem details).
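Before pasting a bundle into a box like that, it can help to look at what is actually in it. The sketch below is one way to do so; the function names are invented for illustration and the chain.pem path is an assumption about where your ACME client writes its files. Judging by the s_client output further down, I would expect two certificates in my case (R3 and ISRG Root X1), but your bundle may differ.

```shell
# Hypothetical helpers for inspecting a PEM bundle before pasting it anywhere.

# Print how many certificates the bundle contains.
count_certs() {
  grep -c '^-----BEGIN CERTIFICATE-----' "$1"
}

# Print the subject and issuer of every certificate in the bundle.
# (crl2pkcs7 + pkcs7 is used because `openssl x509` only reads the first cert.)
show_bundle() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Example usage (path is an assumption; point it at your own chain.pem):
#   count_certs chain.pem
#   show_bundle chain.pem
```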

Voila!

Verification

After waiting a few minutes for the dust to settle, I did a little digging. First, I examined the certs via the browser. The "new" one and its chain look identical to the "broken" one, which isn't surprising since browsers know how to deal with this problem. So I jump back over to my non-production environment with old OpenSSL and run the same s_client command as before to get the following:

CONNECTED(00000003)
depth=3 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = bot.host.com
verify return:1
---
Certificate chain
 0 s:/CN=bot.host.com
   i:/C=US/O=Let's Encrypt/CN=R3
 1 s:/C=US/O=Let's Encrypt/CN=R3
   i:/C=US/O=Internet Security Research Group/CN=ISRG Root X1
 2 s:/C=US/O=Internet Security Research Group/CN=ISRG Root X1
   i:/O=Digital Signature Trust Co./CN=DST Root CA X3

Now we see the good and proper replacement in the chain: CN=ISRG Root X1 (in addition to the deprecated CN=DST Root CA X3, which is no longer directly relevant). Woohoo!

As another verification, I hop over to the super useful SSL Server Test service and drop in the hostname. Among other useful info (irrelevant to this problem), I see exactly what I want in the "Additional Certificates" section:

Screenshot of "Additional Certificates" report section

As a sanity check of sorts, I run the same check on a comparison host whose certificate was installed the usual "autofilled" way to verify the difference:

Screenshot of "Additional Certificates" report section

Since this is exactly what I am expecting to see (an explicit include/path for the ISRG Root X1 certificate), I have high confidence in the solution. As a "final" verification, I work with the bot to invoke what has been a fatal error...and it too is now behaving as expected again.

Follow-Up Steps

The "autofilled" report directly above still shows an R3 certificate that is set to expire in a few days. Based on what I understand (admittedly, I'm not an expert here), the controls and mitigations Let's Encrypt has put in place mean this should not impact anything when it reaches its expiration, since most clients already know how to (and do) handle this situation appropriately. I don't know that this will behave exactly as I think, though. That said, the certificate on the comparison host is due to renew soon, so if for some reason the old root cert doesn't roll over the way I suspect, I'll just renew that cert earlier than normal, which itself will fix the problem. On a more recently renewed cert, the R3 expiration shows the 2025 date, so one way or another the comparison host will get fixed, either automatically or manually.
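For keeping an eye on that expiration without squinting at a report, something like the following could work. It is only a sketch: the function name is made up, the file name in the usage line is an assumption, and it leans on the -checkend option of openssl x509, which only looks at the first certificate in the file (R3, in a typical Let's Encrypt chain.pem).

```shell
# Hypothetical helper: succeeds (exit 0) if the first certificate in the
# PEM file $1 expires within $2 days, fails (exit 1) if it does not, and
# returns 2 if the file cannot be read or parsed as a certificate.
expires_within_days() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  openssl x509 -noout -in "$1" >/dev/null 2>&1 || return 2
  # -checkend N exits 0 when the cert is still valid N seconds from now,
  # so the result is inverted here.
  if openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null; then
    return 1
  fi
  return 0
}

# Example usage (file name is an assumption):
#   expires_within_days chain.pem 7 && echo "renew early"
```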

Always something to experiment with!

I will keep a closer eye on these hosts over the next couple of days, but I have high confidence that this "fix" has actually done what it should accomplish. And I learned more stuff along the way!