Fixing DNSSEC Errors - What Can You Do When DNSSEC Goes Bad?
Although the Internet is an increasingly critical component of the world’s communications infrastructure, its “phone book,” the Domain Name System (DNS), is often considered fundamentally insecure.
While opinions sometimes vary, the majority of the Internet’s technical participants believe that a critical part of shoring up DNS security is the deployment of Domain Name System Security Extensions (DNSSEC). DNSSEC is an enhancement to the DNS protocol that enables domain name owners to give themselves and their users a more secure and trustworthy experience by using cryptographic signatures.
Simply put, the protocol creates a “chain of trust” that gives users confidence that the DNS answers directing their browser to a website are authentic and have not been tampered with along the way, so they are not hijacked to a bogus destination. As the IT infrastructure continues its rapid migration into the cloud, the ability to offer users a sense of security while they’re on the Internet is more of a competitive differentiator than ever before.
While DNSSEC is a huge step towards providing a secure experience, like all innovations, it has its challenges. It is highly likely that, at some point, one of the DNSSEC deployments you are working on will not go according to plan. When that happens, you’ll need to be prepared to recover from DNSSEC errors.
When DNSSEC Goes Bad
The scenario often looks like this: You have just published your DNSSEC-signed zone and now notice some breakage, indicated by symptoms like a truncated zone or records that do not validate. Yes, you had quality control, but these things happen. Your mission now is to take corrective action and publish a healthy zone as quickly as possible, and also to investigate the cause of the break.
If you simply repeat the zone generation, signing and publishing processes, you may recreate the problem that caused the broken zone. Alternatively, you can go back to a previous, healthier state by returning to an earlier copy of the zone, which is commonly known as a “rollback.” But secondary name servers only pick up a zone whose serial number is higher than the one they already hold, so a rollback still requires you to update the serial number on the zone. That changes the SOA record, which then breaks the signature on it. And that takes you back to square one.
Re-Signing Your Zone – Resigned to Delays?
Signing a large zone in its entirety can add an uncomfortably long delay to a recovery. So, to avoid having to re-sign an entire zone, one option is to always prepare at least two versions of each zone file in parallel. The first version carries the current serial number and is prepared in the usual way. The second version, and any additional ones, are created with an incremented serial number using a separate signing system and stored separately for emergency purposes. The logic behind this approach is that if you do encounter errors, you’ll already have at least one signed, working copy of the zone with a higher serial number than the one currently in use. Of course, this assumes that the breakage is due to some issue other than the signing mechanism itself.
A possible improvement on signing multiple copies of a zone is to regularly archive versions of the zone file instead. When an emergency recovery is needed, the last good zone file is identified from the archive, and a new SOA record (with an updated serial number) and its corresponding signature record are prepared. The new SOA and signature records replace the old records in the last good zone file, and this updated zone file is made available for distribution.
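The archive-based recovery described above can be sketched roughly as follows. This is a minimal illustration, not a production tool: the `ArchivedZone` record, the file paths, the `example.com` names and the serial values are all hypothetical, and the sketch assumes the archive list is ordered oldest to newest. It only selects the last good copy and rewrites the serial field of the SOA; the replacement signature over the new SOA still has to be generated with whatever signing system you use.

```python
from dataclasses import dataclass
from typing import List, Optional

SERIAL_MODULUS = 2 ** 32  # SOA serial numbers are unsigned 32-bit values


@dataclass
class ArchivedZone:
    """One archived copy of a signed zone (hypothetical record layout)."""
    path: str        # where the archived zone file is stored
    serial: int      # SOA serial the copy was published with
    validated: bool  # True if this copy passed validation checks


def last_good_zone(archive: List[ArchivedZone]) -> Optional[ArchivedZone]:
    """Return the newest archived copy that validated, or None.

    Assumes the archive list is ordered oldest to newest.
    """
    for zone in reversed(archive):
        if zone.validated:
            return zone
    return None


def replace_soa_serial(soa_rdata: str, new_serial: int) -> str:
    """Return SOA RDATA text with its serial field replaced.

    SOA RDATA fields: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM.
    """
    fields = soa_rdata.split()
    if len(fields) != 7:
        raise ValueError("expected 7 SOA RDATA fields")
    fields[2] = str(new_serial)
    return " ".join(fields)


# Hypothetical archive: the newest copy is the broken one.
archive = [
    ArchivedZone("archive/example.zone.2011050101", 2011050101, True),
    ArchivedZone("archive/example.zone.2011050102", 2011050102, True),
    ArchivedZone("archive/example.zone.2011050103", 2011050103, False),
]
published_serial = 2011050103  # serial of the broken zone now in service

good = last_good_zone(archive)

# The recovered zone must carry a serial higher than the published one,
# not merely higher than the archived copy's own serial.
new_serial = (published_serial + 1) % SERIAL_MODULUS

old_soa = "ns1.example.com. hostmaster.example.com. 2011050102 7200 3600 1209600 3600"
new_soa = replace_soa_serial(old_soa, new_serial)
# new_soa must now be re-signed before the zone file is distributed.
```

The point of the sketch is that only the SOA record and its signature change; every other signed record set in the archived file is reused as-is, which is what keeps the recovery short.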
The efficacy of the rollback option depends on your server infrastructure and whether or not you have a full zone available to distribute. If you control all of your authoritative name servers, you may be able to reload a previous version to them and restore service quite quickly. You will need to make sure that the rollback does not interfere with any scheduled key rollovers. If you do not control your infrastructure, you will probably need to increment the serial number. That means re-signing at least part of the zone, which delays recovery, but it makes sure the corrected zone flows properly and automatically from your point of distribution throughout the third-party infrastructure.
The DNS has long relied upon serial number arithmetic (see RFC 1982), which defines how serial numbers are compared and incremented, including when they wrap around. In an emergency, the best option is to use a documented and standardized process, such as that defined in RFC 1982. This helps ensure you’ll be able to initiate the distribution of your recovered zone file throughout your infrastructure rapidly and without error.
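For readers who have not worked with it directly, the comparison and addition rules from RFC 1982, applied to 32-bit SOA serial numbers, can be sketched as below. The function names `serial_add` and `serial_gt` are illustrative only and do not come from any particular DNS library.

```python
SERIAL_BITS = 32                  # SOA serials are 32-bit values
MODULUS = 1 << SERIAL_BITS        # 2^32
HALF = 1 << (SERIAL_BITS - 1)     # 2^31
MAX_INCREMENT = HALF - 1          # largest single increment RFC 1982 allows


def serial_add(serial: int, n: int) -> int:
    """Add n to a serial number, wrapping around at 2^32."""
    if not 0 <= n <= MAX_INCREMENT:
        raise ValueError("increment must be between 0 and 2^31 - 1")
    return (serial + n) % MODULUS


def serial_gt(a: int, b: int) -> bool:
    """Return True if serial a is 'greater than' serial b per RFC 1982.

    When the two values differ by exactly 2^31 the comparison is
    undefined in the RFC; this sketch returns False in both directions.
    """
    return (a < b and b - a > HALF) or (a > b and a - b < HALF)


# An ordinary increment compares as newer.
ordinary = serial_gt(2011050102, 2011050101)

# A small serial can be "newer" than a very large one after wraparound.
wrapped = serial_gt(5, 4294967290)

# Incrementing the maximum value wraps to zero.
after_max = serial_add(4294967295, 1)
```

The practical consequence for a recovery is that the serial you publish has to compare as greater than the broken zone's serial under these rules, and that no single step may add more than 2^31 - 1, otherwise secondary servers may treat the recovered zone as older than the one they already hold.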
As the old expression goes, there is more than one way to skin a cat. While the DNS and DNSSEC are robust technologies, we are always finding new ways to approach issues with them. This is why I urge all security professionals to participate in discussion groups and to join dedicated organizations like CENTR, the Internet Society and ICANN. In a problem situation, too many choices will almost always be better than none.
* This scenario and potential solutions were the subject of a recent online discussion in a forum hosted by the Council of European National Top Level Domain Registries, better known as CENTR.