Revised script for decrypting datacenter credentials from the Veeam Backup & Replication database

Posted on August 10, 2025 by workinghardinit

Introduction

In a previous article (Protecting your Veeam Backup and Replication Server is critical | Working Hard In IT), I discussed my script for decrypting the datacenter credentials from the Veeam Backup & Replication database. Since then, that PowerShell code has been published dozens of times all over the internet in various articles.

However, three relevant things have changed since my original blog post:

Veeam v12.1 introduced a new encryption method.
Firstly, in Veeam 12.1, the method of encrypting passwords has changed. That means the old script no longer works (always) as it only uses the legacy method.
Veeam published its encryption and decryption methods.
Secondly, Veeam has published the methods used to encrypt and decrypt passwords in the spirit of full disclosure and to preempt anyone who attempts to claim that Veeam is insecure. Those individuals or companies demonstrate only ignorance and malicious intentions. The good news is that the article has all the information we need to write a new script.
Veeam now supports PostgreSQL, in addition to Microsoft SQL Server.
Finally, Veeam now also supports MySQL as a database, in addition to Microsoft SQL. That means we need to ensure that we can retrieve the necessary data from both database types.

Background Info & approach

I based the script on information found in the Veeam KB article “How to Recover Account Credentials From the Veeam Backup & Replication Database” (https://www.veeam.com/kb4349).

Instead of having two scripts, my old one and a newer one. I decided to create one that would work on VBR v12 and lower, as well as on VBR 12.1 and higher.

What Changed in Encryption

Until version 12, Veeam used its internal .NET static method:

[Veeam.Backup.Common.ProtectedStorage]::GetLocalString($encryptedPassword)

That method leverages the native Microsoft Data Protection API (DPAPI) under the hood. It was part of the Veeam.Backup.Common.dll and worked well up to version 12. In v12.1 and beyond, this method no longer exists. Instead, Veeam now leverages the native Microsoft Data Protection API (DPAPI), directly:

[System.Security.Cryptography.ProtectedData]::Unprotect($bytes, $salt, ‘LocalMachine’)

Since both leverage the native Microsoft Data Protection API, I figured I could also use the [System.Security.Cryptography.ProtectedData]::Unprotect static method to decrypt those legacy passwords as long as I don’t try to leverage the optionalEntropy parameter for them. The good news is that in the KB article, Veeam provides instructions on how to differentiate between the legacy and new types of password encryption. That allows me to write logic to determine the version and execute the corresponding decryption method accordingly.

By the way, once you update a password on v12.1 or up, it will be encrypted with the new method. As time passes, by rotating the passwords, legacy encryption phases out.

The new script

I did not want to maintain two separate scripts, one for the legacy password decryption method and one for the newer one. That’s why I’ve consolidated everything into a single, unified PowerShell script. It supports:

Supports VBR v10 through v12.3+ and decrypts Veeam credentials from registry and database.
- The Veeam Backup & Replication encryption salt in the registry lives here: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\Data.
- The Veeam database info in the registry lives here:
  Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\DatabaseConfigurations\
Per-user counters and clean output formatting
Supports MSSQL and PostgreSQL configurations
Handles multiple password formats:
- ‘v12 and lower’
- ‘v12.1 and up (with encryption salt)’
Optional filtering by username
Optional export to file (`Veeam_Credentials.txt` on Desktop)
Graceful error handling and informative console output

The script runs on Windows only, because DPAPI is a Windows-native feature. With VBR v13 introducing Linux-based deployments, this script won’t work in those environments. That’s a different challenge for another day.

Getting the script

You can find the script on GitHub at https://github.com/WorkingHardInIT/Revised-script-for-decrypting-datacenter-credentials-from-the-Veeam-Backup-Replication-database. You will also find the documentation there.

Why do I need this script?

The IT world, like everywhere else, is not a perfect place, and I need a way to deal with imperfection. It is that simple. If we are honest, we all know that IT environments aren’t always in pristine condition. Whether it’s a lab, a forgotten backup server, or an entire backup fabric for a production environment abandoned by a previous IT partner, credentials are often missing. Documentation is sparse. And when disaster strikes, you need access, fast.

My script has already helped IT teams recover access to critical systems when no one else could. I know because I’ve seen it happen. Before Veeam ever published its KB article, my original script was quietly saving the day in real-world scenarios.

Conclusion

Knowledge is power. And while power inherently allows abuse, hiding knowledge under the guise of “security” is just security theater. Security through obscurity is not security but window dressing.

That’s why I’m glad Veeam documented their credential encryption methods. It empowers administrators to recover access responsibly. And it exposes the charlatans who twist transparency into baseless accusations of insecurity. I just felt compelled to create a handy, functional script around it that I can use when needed.

If someone uses this information to claim Veeam is irresponsible, they could not be more wrong. They prove themselves to be untrustworthy. To me, they’ve lost their reputation and credibility.

This script isn’t about hacking. It’s about recovery, accountability, and clarity. And if it helps you regain control of your environment when all else fails, then it’s done its job.

Veeam Hardened Repository ISO: Overview and Requirements

Posted on August 6, 2025 by workinghardinit

Introduction

Readers of my blog and other articles will know that I am a strong advocate of immutable backups, and Veeam delivers this functionality through its Linux Hardened Repository. I have several articles on how to set this up, secure it, add MFA, extend and repair XFS volumes, and more. I have designed and run many successful deployments in production.

In my latest designs, I have introduced a process flow to ensure that backups are not only immutable but also undeletable. The way to do this is to disallow root/sudo access to key personnel who are not involved in daily operations, and who must agree to allow and grant access under the 4-eyes principle. Why? To ensure no one, accidentally or otherwise, makes preventable, bad decisions.

Still, I notice that many people are hesitant to use it, as the perceived complexity of Linux deters them. Veeam has been addressing this perception, which is partially real and partially driven by fear, by providing the Veeam Hardened Repository ISO to simplify deployment and maintenance. Today, we will be looking into that

The Veeam Hardened Repository ISO

The Veeam Hardened Repository ISO (abbreviated to VHRISO on the forums) is a preconfigured, bootable ISO image based on the Rocky Linux distribution, developed and maintained by Veeam. It delivers a Managed Hardened Repository solution designed to simplify deployment and enhance security for backup infrastructures.

This solution caters to the masses to provide better security for all:

Minimize the need for Linux expertise during the setup process.
Provide a hardened operating system with advanced security configurations applied by default.
Ensure secure and compliant backup storage aligned with industry standards.

Security Foundation

The operating system embedded in VHRISO is pre-hardened using guidelines from the Security Technical Implementation Guides (STIGs), maintained by the Defense Information Systems Agency (DISA) for Rocky Linux. All this ensures that even if immutability is enabled, misconfigurations are less likely to compromise the system.

Support Status

As of October 29, 2024, VHRISO transitioned from Community Preview to Experimental Support status. That means that production use is officially supported.

You can open support cases for issues, except those related to the ISO Installer and Configurator Tool, which fall under experimental SLA terms.

Only unmodified versions of VHRISO deployed on compliant hardware are eligible for support.

Veeam announced that it will integrate the standalone ISO into the platform in the next release of Veeam Backup & Replication, V13.

The main points of that announcement are:

Veeam will integrate the standalone ISO into the platform via the new “Just Enough OS” (JeOS) ISO, which will enable deployment of various backup roles, including the hardened repository.
Centralized Updates: JeOS will manage and update the OS and Veeam components across all backup infrastructure roles, simplifying maintenance with automatic patching during scheduled windows.
Easier Provisioning: V13 removes the need for complex passwords in setting up a hardened repository. It will use thumbprint verification and a temporary PIN code for repairing with backup servers.
Host Management Web UI: A new web interface will provide an easy-to-use management tool for JeOS and Veeam settings, with security safeguards to minimize exposure.
Full Support for Managed Repositories: Managed hardened repositories deployed from the V13 JeOS ISO will now be fully supported, moving beyond experimental status.

See Anton Gostev’s announcement here: https://www.linkedin.com/posts/askgostev_weve-been-getting-many-questions-about-our-activity-7312464807171923969-q1YR/

Latest Release

On January 29, 2025, Veeam released Build 2.0.0.8, available via:

You can find it in the Veeam Customer Portal https://www.veeam.com/download_add_packs/vmware-esx-backup/hardened-repository/

Trial Downloads section under: Additional Downloads > Extensions and Other > Veeam Hardened Repository ISO

System Requirements

To ensure compatibility and optimal performance, you must meet the following prerequisites:

Software Requirements

Veeam Backup & Replication version 12.2 or later

Hardware Requirements

You must use hardware from the Red Hat compatibility list or the Certified Quality and Independent organization certified hardware list
Enable UEFI Secure Boot
Do NOT install third-party security software
Only hardware RAID controllers are supported
- Software RAID, Intel VMD VROC, and FakeRAID are not supported
- RAID controllers must have write-back cache enabled
Use internal or direct-attached storage only

Storage Configuration

Minimum of two storage volumes:
- One for the OS (≥100 GB)
- One or more for data (must be larger than OS volume)
The smallest disk must be identifiable (e.g., 100 GB + 101 GB is valid; 2x 100 GB + 1x 200 TB is invalid)
Recommended: Dual-parity RAID configuration

Network Requirements

Standard backup repository ports must be open
You must allow direct or HTTP proxy access to repository.veeam.com on port 443 for:
- OS and security updates
- GPG key renewal (failure to update will require complete OS reinstallation)

Security Best Practices

Secure the Baseboard Management Controller (BMC) port using firewalls and strong credentials
Avoid deploying VHRISO on virtual machines due to:
- Increased attack surface via hypervisor
- Risk of backup inaccessibility during host outages

New Features in Build 2.0.0.8

Repair Mode: Reinstall the OS while preserving data partitions.
Live Boot: Built-in diagnostics and performance testing.
Zero-Touch Installation: Fully automated deployment using Kickstart.
IPv6 DHCP Support: Enhanced connectivity options.
Enhanced Ping Limits: Rate-limited pings for better troubleshooting.
Improved Workflow: Clearer installation steps and safeguards against accidental disk formatting.

Conclusion

The Veeam Hardened Repository ISO aims to provide hardened and immutable repositories in as many deployments as possible. I think they are making progress in achieving this goal. I believe that every Veeam Backup Fabric deployment, whether small or large, should have hardened repositories with immutable backup copies. That is my more recent stance. I used to do it for at least one copy, as that worked out well with refresh projects, but I want to end up with all repositories and backup data copies being immutable and stored on a hardened repository.

I am currently building a lab for the Veeam Hardened Repository ISO to gain experience with it and be well-prepared for the arrival of Veeam Backup & Replication V13. I hope to share some information on that later.

The rejuvenated push for excellence by Veeam for Hyper-V customers

Posted on August 4, 2025 by workinghardinit

Introduction

As an observer of the changes in the hypervisor market in 2024 and 2025, you have undoubtedly noted considerable commotion and dissent in the market. I did not have to deal with it as I adopted and specialized in Hyper-V from day one. Even better, I am pleased to see that many more people now have the opportunity to experience Hyper-V and appreciate its benefits.

While the UI management is not as sleek and is more fragmented than that of some competitors, it offers all the necessary features available for free. Additionally, PowerShell automation enables you to create any tooling you desire, tailored to your specific needs. Do that well, and you do not need System Center Virtual Machine Manager for added capabilities. Denying the technical capabilities and excellence of Hyper-V only diminishes the credibility and standing of those who do so in the community.

That has been my approach for many years, running mission-critical, real-time data-sensitive workloads on Hyper-V clusters. So yes, Microsoft could have managed the tooling experience a bit better, and that would have put them in an even better position to welcome converting customers. Despite that, adoption has been rising significantly over the last 18 months and not just in the SME market.

Commotion, fear, uncertainty, and doubt

The hypervisor world commotion has led to people looking at other hypervisors to support their business, either partially or wholesale. The moment you run workloads on a hypervisor, you must be able to protect, manage, move, and restore these workloads when the need to do so arises. Trust me, no matter how blessed you are, that moment comes to us all. The extent to which you can handle it, on a scale from minimal impact to severe impact, depends on the nature of the issue and your preparedness to address it.

Customers with a more diverse hypervisor landscape means that data protection vendors need to support those hypervisors. I think that most people will recognize that developing high-quality software, managing its lifecycle, and supporting it in the real world requires significant investment. So then comes the question, which ones to support? What percentage of customers will go for hypervisor x versus y or z? I leave that challenge to people like Anton Gostev and his team of experts. What I can say is that Hyper-V has taken a significant leap in adoption, as it is a mature and capable platform built and supported by Microsoft.

The second rise of Hyper-V

Over the past 18 months, I have observed a significant increase in the adoption of Hyper-V. And why not? It is a mature and capable platform built and supported by Microsoft. The latter makes moving to it a less stressful choice as the ecosystem and community are large and well-established. I believe that Hyper-V is one of the primary beneficiaries of the hypervisor turmoil. Adoption is experiencing a second, significant rise. For Veeam, this was not a problem. They have provided excellent Hyper-V support for a long time, and I have been a pleased customer, building some of the best and most performant backup fabrics on our chosen hardware.

But who are those customers adopting Hyper-V? Are they small and medium businesses (SME) or managed service providers? Or is Hyper-V making headway with big corporate enterprises as well? Well, neither Microsoft nor Veeam shares such data with me. So, what do I do? Weak to strong signal intelligence! I observe what companies are doing and what they are saying, in combination with what people ask me directly. That has me convinced that some larger corporations have made the move to Hyper-V. Some of the stronger signals came from Veeam.

Current and future Veeam Releases

Let’s look at the more recent releases of Veeam Backup & Replication. With version 12.3, support for Windows Server 2025 arrived very fast after the general availability of that OS. Hyper-V, by the way, is getting all the improvements and new capabilities for Hyper-V just as much as Azure Local. That indicates Microsoft’s interest in making Hyper-V an excellent option for any customer, regardless of how they choose to run it, be it on local storage, with shared storage, on Storage Spaces Direct (S2D), or Azure Local. That is a strong, positive signal compared to previous statements. Naturally, Hyper-V benefits from Veeam’s ongoing efforts to resolve issues, enhance features, and add capabilities, providing the best possible backup fabric for everyone. I will discuss that in later articles.

Now, the strong signal and very positive signal from Veeam regarding Hyper-V came with updates to Veeam Recovery Orchestrator. Firstly, Veeam Recovery Orchestrator 7.2 (released on February 18th, 2025) introduced support for Hyper-V environments. What does that tell me? The nature, size, and number of customers leveraging Hyper-V that need and are willing to pay for Veeam Recovery Orchestrator have grown to a point where Veeam is willing to invest in developing and supporting it. That is new! On the Product Update page, https://community.veeam.com/product-updates/veeam-recovery-orchestrator-7-2-9827, you can find more information. The one requirement that sticks out is the need for System Center Virtual Machine Manager. Look at these key considerations:

System Center Virtual Machine Manager (SCVMM) 2022 & CSV storage registered in SCVMM is supported.
Direct connections to Hyper-V hosts are not supported.

But not that much later, on July 9^th, 2025, in Veeam Recovery Orchestrator 7.2.1 (see https://community.veeam.com/product-updates/veeam-recovery-orchestrator-7-2-1-10876), we find these significant enhancements:

Support for Azure Local recovery target: You can now use Azure Local as a recovery target for both vSphere and Hyper-V workloads, expanding flexibility and cloud recovery options.
Hyper-V direct-connected cluster support: Extended Hyper-V functionality enables support for direct-connected clusters, eliminating the need for SCVMM. This move simplifies deployment and management for Hyper-V environments.
MFA integration for VRO UI: Multi-Factor Authentication (MFA) can now be enabled to secure logins to the VRO user interface, providing enhanced security and compliance. Microsoft Authenticator and Google Authenticator apps are supported.

Especially 1 and 2 are essential, as they enable Veeam Recovery Orchestrator to support many more Hyper-V customers. Again, this is a strong signal that Hyper-V is making inroads. Enough so for Veeam to invest. Ironically, we have Broadcom to thank for this. Which is why in November 2024, I nominated Broadcom as the clear and unchallenged winner of the “Top Hyper-V Seller Award 2024” (https://www.linkedin.com/posts/didiervanhoye_broadcom-mvpbuzz-hyperv-activity-7257391073910566912-bTTF/)

Conclusion

Hyper-V and Veeam are a potent combination that continues to evolve as market demands change. Twelve years ago, I was testing out Veeam Backup & Replication, and 6 months later, I became a Veeam customer. I am still convinced that for my needs and those of the environments I support, I have made a great choice.

The longevity of the technology, which evolves in response to customer and security needs, is a key factor in determining great technology choices. In that respect, Hyper-V and Veeam have performed exceptionally well, striking multiple bullseye shots without missing a beat. And missing out on the hypervisor drama, we have hit the bullseye once more!

The Perfect Storm of Azure DNS resolver, a custom DNS resolver, and DNS configuration ambiguities

Posted on July 29, 2025 by workinghardinit

TL:DR

The very strict Azure recursive DNS resolver, when combined with a Custom DNS resolver, can cause a timeout-sensitive application to experience service disruption due to ambiguities in third-party DNS NS delegation configurations.

Disclaimer

I am using fantasy FQDNs and made-up IP addresses here. Not the real ones involved in the issue.

Introduction

Services offered by a GIS-driven business noticed a timeout issue. Upon investigation, this was believed to be a DNS issue. That was indeed the case, but not due to a network or DNS infrastructure error, let alone a gross misconfiguration.

The Azure platform DNS resolver (168.63.129.16) is a high-speed and very strict resolver. While it can return the IP information, it does indicate a server error.

nslookup pubdata.coast.be

Server: 127.0.0.11

Address: 127.0.0.11#53

Non-authoritative answer:

pubdata.coast.be canonical name = www.coast.be.

Name: www.coast.be

Address: 154.152.150.211

Name: www.coast.be

Address: 185.183.181.211

** server can’t find www.coast.be: SERVFAIL

Azure handles this by responding fast and reporting the issue. The Custom DNS service, which provides DNS name resolution for the service by forwarding recursive queries to the Azure DNS resolver, also reports the same problem. However, it does not do this as fast as Azure. Here, it takes 8 seconds (Recursive Query Timeout value), potentially 4 seconds longer due to the additional timeout value. So, while DNS works, something is wrong, and the extra time before the timeout occurs causes service issues.

When first asked to help out, my first questions were if it had ever worked and if anything had changed. The next question was whether they had any control over the time-out period to adjust it upward, which would enable the service to function correctly. The latter was not possible or easy, so they came to me for troubleshooting and a potential workaround or fix.

So I dove in with the tools of the trade. nslookup, Nameresolver, Dig, https://dnssec-analyzer.verisignlabs.com/, and https://dnsviz.net/. The usual suspects were DNSSEC and zone delegation mismatches.

First, I run:

Nslookup -debug pubdata.coast.be

In the output, we find:

Non-authoritative answer:

Name: www.coast.be

Addresses: 154.152.150.211

185.183.181.211

Aliases: pubdata.coast.be

We learn that pubdata.coast.be is a CNAME for www.coast.be. Let’s see if any CNAME delegation or DNSSEC issues are in play. Run:

dig +trace pubdata.coast.be

;; global options: +cmd

. 510069 IN NS a.root-servers.net.

. 510069 IN NS b.root-servers.net.

. 510069 IN NS l.root-servers.net.

. 510069 IN NS m.root-servers.net.

. 510069 IN RRSIG NS 8 0 518400 20250807170000 20250725160000 46441 . <RRSIG_DATA_ANONYMIZED>

;; Received 525 bytes from 1.1.1.1#53(1.1.1.1) in 11 ms

be. 172800 IN NS d.nsset.be.

be. 172800 IN NS y.nsset.be.

be. 86400 IN DS 52756 8 2 <DS_HASH_ANONYMIZED>

be. 86400 IN RRSIG DS 8 1 86400 20250808050000 20250726040000 46441 . <RRSIG_DATA_ANONYMIZED>

;; Received 753 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms

coast.be. 86400 IN NS ns1.corpinfra.be.

coast.be. 86400 IN NS ns2.corpinfra.be.

<hash1>.be. 600 IN NSEC3 1 1 0 – <next-hash1> NS SOA RRSIG DNSKEY NSEC3PARAM

<hash1>.be. 600 IN RRSIG NSEC3 8 2 600 20250813002955 20250722120003 62188 be. <RRSIG_DATA_ANONYMIZED>

<hash2>.be. 600 IN NSEC3 1 1 0 – <next-hash2> NS DS RRSIG

<hash2>.be. 600 IN RRSIG NSEC3 8 2 600 20250816062813 20250724154732 62188 be. <RRSIG_DATA_ANONYMIZED>

;; Received 610 bytes from 194.0.37.1#53(b.nsset.be) in 10 ms

pubdata.coast.be. 3600 IN CNAME www.coast.be.

www.coast.be. 3600 IN NS dns-lb1.corpinfra.be.

www.coast.be. 3600 IN NS dns-lb2.corpinfra.be.

;; Received 151 bytes from 185.183.181.135#53(ns1.corpinfra.be) in 12 ms

The DNSSEC configuration is not the issue, as the signatures and DS records appear to be correct. So, the delegation inconsistency is what causes the SERVFAIL, and the duration of the custom DNS servers’ recursive query timeout causes the service issues.

The real trouble is here:

pubdata.coast.be. 3600 IN CNAME www.coast.be

www.coast.be. 3600 IN NS dns-lb1.corpinfra.be.

This means pubdata.coast.be is a CNAME to www.coast.be. But www.coast.be is served by different nameservers than the parent zone (coast.be uses ns1/ns2.corpinfra.be). This creates a delegation inconsistency:

The resolver must follow the CNAME and query a different set of nameservers. If those nameservers don’t respond authoritatively or quickly enough, or if glue records are missing, resolution may fail.

Strict resolvers (such as Azure DNS) may treat this as a lame delegation or a broken chain, even if DNSSEC is technically valid.

Workarounds

I have already mentioned that fixing the issue in the service configuration setting was not on the table, so what else do we have to work with?

A quick workaround is to use the Azure platform DNS resolver (168.63.129.16) directly, which, due to its speed, avoids the additional time required for finalizing the query. However, due to DNS requirements, this workaround is not always an option.
The other one is to reduce the recursive query timeouts and additional timeout values on the custom DNS solution. This is what we did. The timeout value is now 2 (default is 8), and the additional timeout value is now 2 (default is 4). That is what I did to resolve the issue as soon as possible. Monitor this to ensure that no other problems arise after taking this action.

Third, we could conditionally forward coast.be to the dns-lb1.corpinfra.be and dns-lb2.corpinfra.be NS servers. That works, but it requires maintenance when those name servers change, so we need to keep an eye on that. We already have enough work.
A fourth workaround is to provide an IP address from a custom DNS query in the source code to a public DNS server, such as 1.1.1.1 or 8.8.8.8, when accessing the pubdata.coast.be FQDN is involved. This is tedious and not desirable.
The most elegant solution would be to address the DNS configuration Azure has an issue with. That is out of our hands, but it can be requested from the responsible parties. For that purpose, you will find the details of our findings.

Issue Summary

The .be zone delegates coast.be to the NS servers:

dns-lb1.corpinfra.be

dns-lb2.corpinfra.be

However, the coast.be zone itself lists different NS servers:

ns1.corpinfra.be

ns2.corpinfra.be

This discrepancy between the delegation NS records in .be and the authoritative NS records inside the coast.be zone is a violation of DNS consistency rules.

Some DNS resolvers, especially those performing strict DNSSEC and delegation consistency checks, such as Azure Native DNS resolver, interpret this as a misconfiguration and return SERVFAIL errors. This happens even when the IP address(es) for pubdata.coast.be can indeed be resolved.

Other resolvers (e.g., Google Public DNS, Cloudflare) may be more tolerant and return valid answers despite the mismatch, without mentioning any issue.

Why could this be a problem?

DNS relies on consistent delegation to ensure:

Security
Data integrity
Reliable resolution

When delegation NS records and authoritative NS records differ, recursive resolvers become uncertain about the actual authoritative servers.

This uncertainty often triggers a SERVFAIL to avoid possibly returning stale or malicious data. When NS records differ between parent and child zones, resolvers may reject responses to prevent the use of stale or spoofed data.

Overview

Zone Level	NS Records	Notes
`.be` (parent)	`dns-lb1.corpinfra.be`, `dns-lb2.corpinfra.be`	Delegation NS for `coast.be`
`coast.be`	`ns1.corpinfra.be`, `ns2.corpinfra.be`	Authoritative NS for zone

Corpinfra.be (see https://www.dnsbelgium.be/nl/whois/info/corpinfra.be/details) – this is an example, the domain is fictitious – operates all four NS servers that resolve to IPs in the same subnet, but the naming inconsistency causes delegation mismatches.

Recommended Fixes

Option 1: Update coast.be zone NS records to match the delegation NS

Add dns-lb1.corpinfra.be and dns-lb2.corpinfra.be as NS records in the coast.be zone alongside existing ones (ns1 and ns2), so the zone’s NS RRset matches the delegation.

coast.be. IN NS ns1.corpinfra.be.

coast.be. IN NS ns2.corpinfra.be.

coast.be. IN NS dns-lb1.corpinfra.be.

coast.be. IN NS dns-lb2.corpinfra.be.

Option 2: Update .be zone delegation NS records to match the zone’s NS records

Change the delegation NS records in .be zone to use only:

ns1.corpinfra.be

ns2.corpinfra.be

remove dns-lb1.corpinfra.be and dns-lb2.corpinfra.be

Option 3: Align both the .be zone delegation and coast.be NS records to a consistent unified set

Either only use ns1.corpinfra.be abd ns2.corpinfra.be for both the delegation and authoritative zone NS records, or only use dns-lb1.corpinfra.be and dns-lb2.corpinfra.be for both. Or use all of them; three or more geographically dispersed DNS servers are recommended anyway. Depends on who owns and manages the zone.

What to choose?

Option	Description	Pros	Cons
1	Add dns-lb1 and dns-lb2 to the zone file	Quick fix, minimal disruption	Maybe the zones are managed by <> entities
2	Update .be delegation to match zone NS (ns1, ns2)	Clean and consistent	Requires coordination with DNS Belgium
3	Unify both delegation and zone NS records	Most elegant	Requires a full agreement between all parties

All three options are valid, but Option 3 is the most elegant and future-proof. That said, this is a valid configuration as is, and one might argue that Azure’s DNS resolver’s strictness is the cause of the issue. Sure, but in a world where DNSSEC is growing in importance, such strictness might become more common? Additionally, if the service configuration could handle a longer timeout, that would also address this issue. However, that is outside my area of responsibility.

Simulation: Resolver Behavior

Resolver	Behavior with Mismatch	Notes
Azure DNS resolver	`SERVFAIL`	Strict DNSSEC & delegation checks
Google Public DNS	Resolves normally	Tolerant of NS mismatches
Cloudflare DNS	Resolves normally	Ignores delegation inconsistencies
Unbound (default)	May vary	Depends on configuration flags
Bind (strict mode)	`SERVFAIL`	Enforces delegation consistency

Notes

No glue records are needed for coast.be, because the NS records point to a different domain (corpinfra.be), so-called out-of-bailiwick name servers, and .be correctly delegates using standard NS records.
After changes, flush DNS caches

Conclusion

When wading through the RFC we can summarize the findings as below

RFC Summary: Parent vs. Child NS Record Consistency

RFC	Section	Position on NS Matching	Key Takeaway
RFC 1034	§4.2.2	No mandate on matching	Describes resolver traversal and authoritative zones, not strict delegation consistency
RFC 1034	§6.1 & §6.2	No strict matching rule	Discusses glue records and zone cuts, but doesn’t say they must be aligned
RFC 2181	§5.4.1	Explicit: child may differ	Parents’ NS records are not authoritative for the child; the child can define its own set.
RFC 4035	§2.3	DNSSEC implications	Mismatched NS sets can cause issues with DNSSEC validation if not carefully managed.
RFC 7719	Glossary	Reinforces delegation logic	Clarifies that delegation does not imply complete control or authority over the child zone

In a nutshell, RFC 2181 Section 5.4.1 is explicit: the NS records in a parent zone are authoritative only for that parent, not for the child. That means the child zone can legally publish entirely different NS records, and the RFC allows it. So, why is there an issue with some DNS resolvers, such as Azure?

Azure DNS “Soft” Enforces Parent-Child NS Matching

Azure DNS resolvers implement strict DNS validation behavior, which aligns with principles of security, reliability, and operational best practice, not just the letter of the RFC. This is a soft enforcement; the name resolution does not fail.

Why

1. Defense Against Misconfigurations and Spoofing

Mismatched NS records can indicate stale or hijacked delegations.

Azure treats mismatches as potential risks, especially in DNSSEC-enabled zones, and returns SERVFAIL to warn about potential spoofed responses, but does not fail the name resolution.

2. DNSSEC Integrity

DNSSEC depends on a trusted chain of delegation.

If the parent refers to NS records that don’t align with the signed child zone, validation can’t proceed.

Azure prioritizes integrity over leniency, which is why there is stricter enforcement.

3. Predictable Behavior for Enterprise Networks

In large infrastructures (like hybrid networks or private resolvers), predictable resolution is critical.

Azure’s strict policy ensures that DNS resolution failures are intentional and traceable, not silent or inconsistent like in looser implementations.

4. Internal Resolver Design

Azure resolvers often rely on cached referral points.

When those referrals don’t match authoritative data at the zone apex, Azure assumes the delegation is unreliable or misconfigured and aborts resolution.

Post Mortem summary

Azure DNS resolvers enforce delegation consistency by returning a SERVFAIL error when parent-child NS records mismatch, thereby signaling resolution failure rather than silently continuing or aborting. While RFC 2181 §5.4.1 allows child zones to publish different NS sets than the parent, Azure chooses to explicitly flag inconsistencies to uphold DNSSEC integrity and minimize misconfiguration risks. This deliberate error response enhances reliability in enterprise environments, ensuring resolution failures are visible, traceable, and consistent with secure design principles.

This was a perfect storm. A too-tight timeout setting in the service (which I do not control), combined with the Azure DNS resolvers’ rigorous behavior, which is fronted by a custom DNS solution required to serve all possible DNS needs in the environment, results in longer times for recursive DNS resolution that finally tripped up the calling service.