A Systematic Approach to Debugging DNS Issues
When DNS Goes Wrong
DNS failures are uniquely frustrating because they manifest as something else. A website that won’t load, an email that never arrives, an API that times out — the symptom rarely points directly at DNS. Developing a systematic approach saves hours of debugging the wrong layer.
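The key insight: DNS is a chain, and you can test each link independently. A lookup passes through the client cache, the local resolver, and a recursive resolver; on a cache miss, the recursive resolver queries the root servers, then the TLD servers, then the domain’s authoritative servers. A failure at any link breaks resolution for every client that depends on it, even when all the other links are healthy.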
Step 1: Confirm It’s Actually DNS
Before diving into DNS debugging, verify that DNS resolution is the problem:
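Can you reach the IP directly? If you know the server’s IP address, try connecting to it. For HTTPS, keep the hostname in the request (for example with curl’s --resolve option) so name-based virtual hosting and the TLS certificate still line up. If the IP works but the domain doesn’t, DNS is the prime suspect.
Does the domain resolve at all? Use the DNS Lookup tool to query the domain. If it returns the IP address you expect, DNS is working and the problem is elsewhere (firewall, server down, TLS certificate). If it returns a different address, suspect stale or incorrect records.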
Is it universal or local? Try the Propagation Checker to see if the domain resolves from multiple public resolvers. If it works from Google (8.8.8.8) but not your local network, the issue is with your ISP’s resolver or local configuration, not the domain’s DNS.
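The first two questions can be partly scripted. The sketch below is a minimal example, assuming Python 3.9+ and only the standard library: `socket.getaddrinfo` goes through the machine’s own resolver path, and `socket.create_connection` tests plain TCP reachability. The function names, the default port 443, and the verdict strings are illustrative choices, not part of any tool mentioned here; the resolver and connector are passed in as parameters so the decision logic can be checked without a network.

```python
import socket
from typing import Callable, List


def system_resolve(host: str) -> List[str]:
    """Addresses the local resolver path returns for host, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def tcp_reachable(ip: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP handshake to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(
    host: str,
    known_ip: str,
    port: int = 443,
    resolve: Callable[[str], List[str]] = system_resolve,
    connect: Callable[[str, int], bool] = tcp_reachable,
) -> str:
    """Rough first-pass verdict on whether DNS is the layer to debug."""
    if not connect(known_ip, port):
        return "IP unreachable: check network, firewall, or server before DNS"
    addrs = resolve(host)
    if not addrs:
        return "IP works but name does not resolve: DNS is the prime suspect"
    if known_ip not in addrs:
        return f"name resolves to {addrs}, not {known_ip}: possibly stale or wrong records"
    return "DNS returns the expected IP: look elsewhere (TLS, HTTP, application)"
```

Because `system_resolve` uses the local resolver path, a good result here only tells you about your own network; the multi-resolver comparison in the third question still matters.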
Step 2: Check the Response Code
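DNS responses carry a response code (RCODE) in the message header that tells you how the server handled the query. A timeout is the exception: no response arrived, so there is no code to read.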
| Response Code | Meaning | Action |
|---|---|---|
| NOERROR | Query succeeded (even if zero records returned) | Check the answer section; an empty answer (often called NODATA) means the name exists but has no records of the queried type |
| NXDOMAIN | Name does not exist | Check the spelling and whether the record exists; for a whole domain, check whether the registration has expired |
| SERVFAIL | Server failed to process query | Often a DNSSEC validation failure, or authoritative nameservers that are unreachable or misconfigured |
| REFUSED | Server refused to answer | Often a server that isn’t authoritative for the zone and won’t recurse for you (policy or access control) |
| TIMEOUT (not an RCODE) | No response received | Network issue, firewall blocking port 53 (UDP or TCP), or nameserver is down |
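The code lives in the low four bits of the 16-bit flags word in the 12-byte DNS header (RFC 1035), which is why a timeout has no code at all: there is no header to read. A minimal standard-library sketch of decoding a raw response header follows; the function names and the "NOERROR (NODATA)" label are illustrative, and extended codes that need an EDNS OPT record are not handled.

```python
import struct

# RCODE values 0-5 from RFC 1035; higher values exist but are uncommon in ordinary lookups.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def parse_header(msg: bytes) -> dict:
    """Decode the fixed 12-byte header of a DNS message."""
    if len(msg) < 12:
        raise ValueError("message shorter than the 12-byte DNS header")
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    code = flags & 0x000F  # RCODE: low 4 bits of the flags word
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # set in responses
        "aa": bool(flags & 0x0400),  # authoritative answer
        "tc": bool(flags & 0x0200),  # truncated; retry over TCP
        "rd": bool(flags & 0x0100),  # recursion desired
        "ra": bool(flags & 0x0080),  # recursion available
        "ad": bool(flags & 0x0020),  # authenticated data (DNSSEC-validated)
        "cd": bool(flags & 0x0010),  # checking disabled
        "rcode": RCODES.get(code, f"RCODE{code}"),
        "ancount": ancount,
        "nscount": nscount,
    }


def outcome(msg: bytes) -> str:
    """Map a response to the table's labels, splitting out an empty NOERROR."""
    header = parse_header(msg)
    if header["rcode"] == "NOERROR" and header["ancount"] == 0:
        return "NOERROR (NODATA)"
    return header["rcode"]
```

For example, a response whose flags word is `0x8183` (QR, RD, and RA set, RCODE 3) decodes as NXDOMAIN.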
SERVFAIL deserves special attention. When a validating resolver encounters broken DNSSEC signatures, it returns SERVFAIL rather than risking serving forged data. If you see SERVFAIL from one resolver but the domain works from a non-validating resolver, DNSSEC misconfiguration is the likely cause. The DNSSEC Validator can pinpoint where the chain of trust breaks.
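You don’t always need a second resolver for that comparison. Setting the CD (Checking Disabled) bit in a query asks a validating resolver to skip validation, which is what `dig +cd` does. If a name returns SERVFAIL normally but NOERROR with CD set, broken DNSSEC is the likely cause. A minimal standard-library sketch, assuming plain UDP to port 53, IPv4 resolver addresses, and no EDNS; `build_query`, `query_rcode`, and `dnssec_suspect` are hypothetical helper names.

```python
import secrets
import socket
import struct
from typing import Optional

QTYPES = {"A": 1, "NS": 2, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28}
RCODES = ["NOERROR", "FORMERR", "SERVFAIL", "NXDOMAIN", "NOTIMP", "REFUSED"]


def encode_name(name: str) -> bytes:
    """Wire-format a domain name as length-prefixed labels ending in a zero byte."""
    name = name.rstrip(".")
    if not name:
        return b"\x00"  # the root name
    out = b""
    for label in name.split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: str = "A", cd: bool = False, ident: Optional[int] = None) -> bytes:
    """One-question query with RD set and, optionally, the CD bit."""
    ident = secrets.randbits(16) if ident is None else ident
    flags = 0x0100 | (0x0010 if cd else 0)  # RD, plus CD when requested
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!2H", QTYPES[qtype], 1)  # class IN


def query_rcode(server: str, name: str, qtype: str = "A", cd: bool = False, timeout: float = 3.0) -> str:
    """Send one UDP query and return the response code name, or "TIMEOUT"."""
    query = build_query(name, qtype, cd=cd)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            while True:
                resp, _ = sock.recvfrom(4096)
                if len(resp) >= 12 and resp[:2] == query[:2]:  # skip datagrams with another ID
                    break
        except socket.timeout:
            return "TIMEOUT"
    code = struct.unpack("!H", resp[2:4])[0] & 0x000F
    return RCODES[code] if code < len(RCODES) else f"RCODE{code}"


def dnssec_suspect(server: str, name: str, query=query_rcode) -> bool:
    """SERVFAIL with validation but NOERROR without it points at DNSSEC."""
    return query(server, name, cd=False) == "SERVFAIL" and query(server, name, cd=True) == "NOERROR"
```

Run `dnssec_suspect` against a validating resolver you trust; a True result is a reason to open the DNSSEC Validator, not a diagnosis on its own.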
Step 3: Walk the Chain
For persistent issues, manually walk the DNS resolution chain from root to authoritative:
Query the root servers
Start by querying a root server for the TLD nameservers. This confirms that the root zone knows about the TLD.
Query the TLD nameservers
Ask the TLD nameservers for the domain’s NS records. The referral shows the delegation exactly as the parent zone publishes it. If the NS records point to nameservers that no longer exist or no longer serve the domain, you’ve found the problem.
Query the authoritative nameservers
Query each authoritative nameserver listed in the NS records. All of them should return the same data. If one returns different results or doesn’t respond, you have a consistency issue that will cause intermittent failures.
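In practice `dig +trace` performs this walk for you, but the logic is small enough to sketch. The example below is a simulation, not a resolver: `ask(server, name)` is a hypothetical callback you would back with real non-recursive queries, and its dictionary response shape is an assumption made for illustration. It queries every listed server at each level, so a silent or disagreeing nameserver shows up in the log instead of being hidden by whichever server happened to answer first.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical response shapes returned by `ask(server, name)`:
#   {"referral": ("example.com.", ["ns1.example.net.", "ns2.example.net."])}
#   {"answer": ["192.0.2.10"]}
#   None when the server does not respond
Response = Optional[dict]


def walk(name: str, root_servers: List[str], ask: Callable[[str, str], Response], max_hops: int = 10) -> List[str]:
    """Follow referrals from the root, querying every server at each level."""
    log: List[str] = []
    zone, servers = ".", root_servers
    for _ in range(max_hops):
        replies: Dict[str, Response] = {s: ask(s, name) for s in servers}
        silent = [s for s, r in replies.items() if r is None]
        live = {s: r for s, r in replies.items() if r is not None}
        if silent:
            log.append(f"{zone}: no response from {', '.join(silent)}")
        if not live:
            raise RuntimeError(f"dead end at {zone}: no listed server answered")
        answers = {s: tuple(sorted(r["answer"])) for s, r in live.items() if "answer" in r}
        if answers:
            # Every authoritative server should return the same record set.
            if len(set(answers.values())) > 1:
                log.append(f"{zone}: INCONSISTENT answers {answers}")
            else:
                log.append(f"{zone}: answer {list(next(iter(answers.values())))}")
            return log
        # No answer yet: follow the referral from the first server that replied.
        zone, servers = next(iter(live.values()))["referral"]
        log.append(f"referred to {zone} -> {', '.join(servers)}")
    raise RuntimeError("too many referrals")
```

A delegation to nameservers that never answer ends in the `dead end` error at the zone that made the referral, which is exactly the link you need to fix.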
Step 4: Check for Common Failure Modes
Expired Domain Registration
The most embarrassing DNS failure: the domain registration expired. Check the expiry date with the WHOIS Lookup. If it has passed, the registrar may have suspended the domain, or the domain may be in its redemption grace period.
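WHOIS output is free-form text that varies by registry, so scripting the expiry check is easier against RDAP, the JSON-based protocol that is replacing WHOIS. The sketch below assumes you have already fetched an RDAP domain response (RFC 9083 format) as a string; fetching it is left out, and the function names are illustrative.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def rdap_expiry(rdap: dict) -> Optional[datetime]:
    """Return the 'expiration' event date from an RDAP domain object, if present."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == "expiration":
            # RDAP dates are RFC 3339; older Pythons reject a trailing "Z" in fromisoformat.
            when = datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
            return when if when.tzinfo else when.replace(tzinfo=timezone.utc)
    return None


def registration_status(rdap_text: str, now: Optional[datetime] = None) -> str:
    """One-line verdict: expired or not, plus any registry status values."""
    rdap = json.loads(rdap_text)
    now = now or datetime.now(timezone.utc)
    expiry = rdap_expiry(rdap)
    if expiry is None:
        verdict = "no expiration date published"
    elif expiry <= now:
        verdict = f"EXPIRED on {expiry.date().isoformat()}"
    else:
        verdict = f"expires in {(expiry - now).days} days"
    # Status values such as "redemption period" or "client hold" explain why a name stopped resolving.
    statuses = rdap.get("status", [])
    if statuses:
        verdict += f" (status: {', '.join(statuses)})"
    return verdict
```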
Lame Delegation
A “lame delegation” occurs when the NS records at the parent zone point to servers that don’t actually serve the zone. A common cause: you move to a new DNS provider and the old provider deletes your zone, but the NS records at your registrar still point to the old provider’s nameservers.
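Lame servers can be found by asking each server named in the parent’s NS records for the zone’s SOA with recursion turned off (by hand, `dig +norecurse @server zone SOA`) and checking the reply: a server that really hosts the zone answers NOERROR with the AA (Authoritative Answer) bit set. A minimal sketch of that check; `probe` is a hypothetical callback standing in for the actual query.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical probe: send `server` a non-recursive SOA query for `zone` and
# return (rcode name, AA flag), or None if nothing came back.
Probe = Callable[[str, str], Optional[Tuple[str, bool]]]


def find_lame(zone: str, parent_ns: List[str], probe: Probe) -> Dict[str, str]:
    """Return {nameserver: problem} for every delegated server that doesn't serve the zone."""
    problems: Dict[str, str] = {}
    for ns in parent_ns:
        result = probe(ns, zone)
        if result is None:
            problems[ns] = "no response"
            continue
        rcode, authoritative = result
        if rcode != "NOERROR":
            problems[ns] = rcode  # e.g. REFUSED: the server has no copy of the zone
        elif not authoritative:
            # An answer without the AA bit came from a cache or a referral, not the zone itself.
            problems[ns] = "answered without AA flag"
    return problems
```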
Missing Glue Records
If your nameservers are within the zone they serve (e.g., ns1.example.com serving example.com), the parent zone needs “glue records”: A (and, for IPv6, AAAA) records for the nameservers embedded in the delegation. Without glue, resolvers hit a circular dependency: they need to resolve ns1.example.com to find the nameservers for example.com, but they need to find the nameservers for example.com to resolve ns1.example.com.
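Whether a delegation needs glue comes down to one test: is the nameserver’s name inside the zone being delegated? A minimal sketch, assuming you already have the parent’s NS names and any glue addresses it returned (for example from the additional section of the TLD’s referral); the names `in_zone` and `missing_glue` are illustrative. The comparison is done label by label, so ns1.notexample.com is correctly treated as outside example.com, which a plain string-suffix check would get wrong.

```python
from typing import Dict, List


def labels(name: str) -> List[str]:
    """Lower-cased labels of a domain name; the root name has none."""
    stripped = name.lower().rstrip(".")
    return stripped.split(".") if stripped else []


def in_zone(name: str, zone: str) -> bool:
    """True if name equals zone or sits below it, compared label by label."""
    n, z = labels(name), labels(zone)
    return len(n) >= len(z) and n[len(n) - len(z):] == z


def missing_glue(zone: str, ns_names: List[str], glue: Dict[str, List[str]]) -> List[str]:
    """Nameservers inside the delegated zone that the parent lists without addresses."""
    have = {".".join(labels(k)) for k, addrs in glue.items() if addrs}
    return [ns for ns in ns_names if in_zone(ns, zone) and ".".join(labels(ns)) not in have]
```

Nameservers in other domains (such as ns2.example.net for example.com) never need glue from this parent, so they are skipped rather than reported.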
DNSSEC Signature Expiry
DNSSEC signatures have an expiration date. If your zone signing process fails and signatures expire, validating resolvers will return SERVFAIL for every query. This is a silent failure — everything works until the signatures expire, then everything breaks at once.
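Because the failure is time-driven, it is easy to catch early: every RRSIG record carries its own inception and expiration timestamps. A minimal monitoring sketch, assuming you feed it RRSIG lines as printed by a tool such as `dig +dnssec` (owner, TTL, class, and type first, then the RDATA); the three-day warning window and the function names are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def _rrsig_time(field: str) -> datetime:
    """RRSIG times are YYYYMMDDHHmmSS in UTC, or (rarely) seconds since the epoch."""
    if len(field) == 14:
        return datetime.strptime(field, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return datetime.fromtimestamp(int(field), tz=timezone.utc)


def rrsig_window(record: str) -> Tuple[datetime, datetime]:
    """(inception, expiration) from an RRSIG line or bare RRSIG RDATA."""
    tokens = record.split()
    if "RRSIG" in tokens:  # drop owner, TTL, class, and type if present
        tokens = tokens[tokens.index("RRSIG") + 1:]
    # RDATA order: type covered, algorithm, labels, original TTL,
    # expiration, inception, key tag, signer name, signature.
    return _rrsig_time(tokens[5]), _rrsig_time(tokens[4])


def signature_status(record: str, now: Optional[datetime] = None, warn: timedelta = timedelta(days=3)) -> str:
    """OK, EXPIRING (inside the warning window), EXPIRED, or NOT YET VALID."""
    now = now or datetime.now(timezone.utc)
    inception, expiration = rrsig_window(record)
    if now < inception:
        return "NOT YET VALID"
    if now >= expiration:
        return "EXPIRED"
    if expiration - now < warn:
        return f"EXPIRING in {expiration - now}"
    return "OK"
```

Run from a scheduler, an "EXPIRING" result gives you days of warning before validating resolvers start returning SERVFAIL.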
TTL-Related Staleness
Resolvers cache each answer for the record’s TTL, so after a DNS change, different resolvers see different results until the old TTL runs out. This causes the “works for me” debugging pattern: the domain resolves correctly from your network (cache expired) but fails from a colleague’s network (cache still holds old data). The Propagation Checker shows the state across multiple resolvers simultaneously.
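The TTL in each resolver’s answer tells you how long a stale copy will last, since caching resolvers typically count it down. A minimal sketch of turning one round of multi-resolver results into a “who is stale, and for how long” summary; the `Observation` type is hypothetical, and the estimate assumes each resolver fetches fresh data from the authoritative servers once its copy expires (a resolver that forwards to another cache can stay stale longer).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Observation:
    resolver: str
    answer: FrozenSet[str]  # records the resolver returned
    ttl: int  # TTL in that response, in seconds


def propagation_report(observations: List[Observation], expected: FrozenSet[str]) -> Tuple[List[str], int]:
    """Resolvers still serving old data, and seconds until the last stale copy expires."""
    stale = [o for o in observations if o.answer != expected]
    worst = max((o.ttl for o in stale), default=0)
    return [o.resolver for o in stale], worst
```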
Step 5: Test the Fix
After identifying and fixing the issue:
- Verify at the authoritative server — Confirm the fix is live on your nameservers
- Check propagation — Use the Propagation Checker to confirm resolvers are picking up the change
- Test end-to-end — Verify the actual service (website, email, API) works correctly
- Monitor — Watch for recurrence, especially for DNSSEC and expiration-related issues
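The propagation check in these steps is a polling loop. A minimal sketch, assuming `lookup(resolver)` is a callback you supply (for example, a function that queries one public resolver and returns the sorted answer); the one-hour timeout and one-minute interval are arbitrary defaults. It re-checks every resolver on each round rather than only the stragglers, since an anycast public resolver can answer from different caches on different queries.

```python
import time
from typing import Any, Callable, List


def wait_for_propagation(
    resolvers: List[str],
    lookup: Callable[[str], Any],
    expected: Any,
    timeout: float = 3600,
    interval: float = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until every resolver returns `expected`; return the stragglers on timeout."""
    deadline = clock() + timeout
    while True:
        pending = [r for r in resolvers if lookup(r) != expected]
        if not pending or clock() >= deadline:
            return pending
        sleep(interval)
```

Setting `timeout` to at least the old record’s TTL avoids declaring failure while caches are still legitimately holding the previous answer.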
The Debugging Toolkit
For systematic DNS debugging, you need these capabilities:
- Record lookup — Query specific record types for a domain: DNS Lookup
- Multi-resolver check — See if the issue is universal or resolver-specific: Propagation Checker
- DNSSEC validation — Trace the chain of trust for signature issues: DNSSEC Validator
- Domain health — Comprehensive check of records, email auth, and DNSSEC: DNS Health Check
- Registration status — Check expiration and registrar info: WHOIS Lookup
Further Reading
- DNS Response Codes — Complete reference for all DNS response codes
- DNS Monitoring — Setting up proactive DNS monitoring
- DNSSEC — Understanding DNSSEC validation and failure modes
- DNS Software — dig, drill, and other command-line DNS tools