DNS Optimization Techniques
TTL tuning, serve-stale, aggressive NSEC caching, CNAME flattening, and other techniques that make DNS faster and more resilient
Making DNS faster without changing the protocol
Most DNS optimization comes down to two strategies: avoid querying upstream (serve from cache) and reduce the cost when you must query upstream (fewer round trips). The techniques below are used at scale by resolvers, authoritative servers, and network operators.
TTL tuning: the fundamental tradeoff
TTL (Time to Live) determines how long a DNS record can be served from cache. Choosing TTL values is a tradeoff between freshness and performance:
| TTL value | Cache impact | Use case |
|---|---|---|
| 30–60 seconds | Near-zero caching benefit | Active failover, DDoS mitigation |
| 300 seconds (5 min) | Modest caching | CDN load balancing, dynamic infrastructure |
| 3,600 seconds (1 hour) | Good caching; standard for most services | General web services, email MX records |
| 86,400 seconds (24 hours) | Excellent caching | Stable infrastructure, NS delegations |
| 604,800 seconds (7 days) | Maximum caching; slow propagation | Root zone hints, very stable records |
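To put rough numbers on this tradeoff, the sketch below assumes that queries for a single record reach one resolver as a Poisson process. That traffic model, the function names, and the example query rate are illustrative assumptions, not measurements. Under the model, each cache miss is followed by an average of rate × TTL hits before the record expires again.

```python
def cache_hit_ratio(queries_per_second: float, ttl_seconds: float) -> float:
    """Expected hit ratio for one record, assuming Poisson query arrivals.

    Each miss fills the cache for ttl_seconds; during that window an
    average of rate * ttl further queries arrive and are served as hits.
    """
    expected_hits = queries_per_second * ttl_seconds
    return expected_hits / (1.0 + expected_hits)


def upstream_queries_per_hour(queries_per_second: float, ttl_seconds: float) -> float:
    """Misses per hour, i.e. queries that still reach the authoritative server."""
    miss_ratio = 1.0 - cache_hit_ratio(queries_per_second, ttl_seconds)
    return queries_per_second * 3600 * miss_ratio


if __name__ == "__main__":
    rate = 0.1  # hypothetical: one client query every 10 seconds
    for ttl in (30, 300, 3600, 86400):
        print(f"TTL {ttl:>6}s  hit ratio {cache_hit_ratio(rate, ttl):.3f}  "
              f"upstream/hour {upstream_queries_per_hour(rate, ttl):.1f}")
```

For a record queried once every 10 seconds, a 30-second TTL still yields a 75% hit ratio under this model, so how much caching a short TTL gives up depends heavily on how popular the name is at a given resolver.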
Real-world TTL distribution
A study of the Alexa Top 1 Million websites found:
- Mean TTL across all records: 9,780 seconds (~2.7 hours)
- Median TTL: 255 seconds (~4 minutes)
- For close to half of the top 1M sites, at least one domain has a TTL at or below 60 seconds
The gap between mean and median reveals a heavily skewed distribution: most records use short TTLs, but some use very long ones.
CDNs are the primary driver of short TTLs:
| CDN/Service | Typical A record TTL | Reason |
|---|---|---|
| Akamai | 20 seconds | Rapid failover and load balancing |
| Cloudflare | 300 seconds (Auto TTL) | Balance between performance and flexibility |
| Amazon CloudFront | 60 seconds | Fast failover |
| Fastly | 30 seconds | Dynamic traffic steering |
CDNs deliberately accept the cache penalty in exchange for operational flexibility — the ability to shift traffic instantly when a data center fails or load spikes.
Serve-stale: resilience through expired records
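RFC 8767 (“Serving Stale Data to Improve DNS Resiliency”) defines a mechanism for resolvers to keep serving expired DNS records when a fresh answer cannot be obtained promptly. The RFC recommends attempting a refresh first and falling back to stale data once a short client response timer (1.8 seconds suggested) runs out; some deployments instead answer from stale data at once and refresh in the background: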
- A cached record’s TTL expires
- Instead of returning SERVFAIL or blocking, the resolver immediately returns the stale record
- In the background, the resolver queries the authoritative server for a fresh answer
- The stale response is served with a TTL of 30 seconds (recommended)
- Records are retained in cache for 1–3 days beyond TTL expiry
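The flow above can be sketched as a small cache that distinguishes fresh, stale, and unusable entries. This is a minimal illustration, not how any particular resolver is implemented: the class and constant names are made up for this example, the maximum stale window is assumed to be one day (the low end of the 1–3 day range), and the background refresh itself is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

STALE_ANSWER_TTL = 30            # TTL placed on stale answers (RFC 8767 recommendation)
MAX_STALE_SECONDS = 24 * 3600    # assumed: 1 day, low end of the 1-3 day range


@dataclass
class Entry:
    value: str
    expires_at: float   # moment the original TTL runs out


class StaleCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def put(self, name: str, value: str, ttl: int) -> None:
        self._entries[name] = Entry(value, self._clock() + ttl)

    def get(self, name: str) -> Optional[Tuple[str, int, bool]]:
        """Return (value, ttl_to_send, needs_refresh), or None on a true miss."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        now = self._clock()
        if now < entry.expires_at:
            # Fresh: send the remaining TTL, no refresh needed.
            return entry.value, int(entry.expires_at - now), False
        if now - entry.expires_at <= MAX_STALE_SECONDS:
            # Expired but inside the stale window: answer now, refresh later.
            return entry.value, STALE_ANSWER_TTL, True
        del self._entries[name]   # too old to be useful even as stale data
        return None
```

A caller would treat `needs_refresh=True` as the signal to start an upstream query without making the client wait for it.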
Why serve-stale matters
| Benefit | Description |
|---|---|
| Resilience to outages | If authoritative servers are unreachable (DDoS, misconfiguration), users still get answers |
| Reduced perceived latency | Users do not wait for an expired record to be re-resolved; any name still held in cache gets an instant response |
| DDoS mitigation | Makes DNS-targeted DDoS less effective, reducing attacker motivation |
Implementation
| Software | Support | Configuration |
|---|---|---|
| BIND 9 | Since 9.12 | stale-answer-enable yes; stale-answer-ttl 30; |
| Unbound | Since 1.12 | serve-expired: yes |
| Knot Resolver | Yes | Built-in |
| Google Public DNS | Yes | Enabled by default |
| Cloudflare 1.1.1.1 | Yes | Enabled by default |
Serve-stale is arguably the single most impactful resilience feature in modern DNS. It converts what would be a hard outage (SERVFAIL) into a graceful degradation (slightly stale but functional response).
Aggressive NSEC caching (RFC 8198)
RFC 8198 enables validating resolvers to use cached NSEC/NSEC3 records to synthesize NXDOMAIN responses without querying the authoritative server.
How it works: When a resolver validates and caches NSEC records that prove a range of names does not exist, it can immediately answer NXDOMAIN for any name within that proven range — no upstream query needed.
Benefits:
- Reduces authoritative server load — fewer queries for non-existent domains
- Mitigates random subdomain attacks (PRSD): attackers send queries like abc123.example.com; NSEC proves they do not exist without upstream queries
- Lowers latency for NXDOMAIN: instant response from cache versus full resolution chain
Unbound implements this as aggressive-nsec: yes, enabled by default in recent versions.
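The core test behind aggressive NSEC caching is whether a query name sorts strictly between the owner and "next" names of a cached, validated NSEC record. The sketch below shows only that range check, using the canonical name ordering from RFC 4034. The function names are hypothetical, and wildcard handling, NSEC3 hashing, and TTL expiry, which a real resolver must also cover, are left out.

```python
from typing import Tuple


def canonical_key(name: str) -> Tuple[bytes, ...]:
    """Sort key for DNSSEC canonical ordering (RFC 4034, section 6.1):
    labels are compared right to left, case-insensitively, as raw bytes."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return tuple(label.lower().encode("ascii") for label in reversed(labels))


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if the NSEC record (owner -> next_name) proves that qname does
    not exist, i.e. qname sorts strictly between the two names."""
    o = canonical_key(owner)
    n = canonical_key(next_name)
    q = canonical_key(qname)
    if o < n:
        return o < q < n
    # The last NSEC in a zone points back to the apex, so it covers every
    # name in the zone that sorts after its owner.
    return q > o and q[:len(n)] == n
```

With a cached record running from alpha.example.com to delta.example.com, a query for bravo.example.com (or anything below it) falls inside the range and could be answered NXDOMAIN from cache, while alpha.example.com itself is not covered because it exists.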
NXDOMAIN cut (RFC 8020)
RFC 8020 states that when a resolver receives NXDOMAIN for a domain, all names at or below that domain should be treated as non-existent.
If foo.bar.example.com returns NXDOMAIN, the resolver can also answer NXDOMAIN for baz.foo.bar.example.com without querying upstream. The entire subtree is pruned.
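A minimal sketch of that pruning, assuming the resolver keeps a set of names for which it has received NXDOMAIN. The class name and methods are made up for illustration, and the expiry of entries after their negative TTL is omitted.

```python
from typing import Iterable, Set, Tuple


def _labels(name: str) -> Tuple[str, ...]:
    """Split a name into lowercase labels, ignoring a trailing dot."""
    return tuple(name.rstrip(".").lower().split("."))


class NxdomainCut:
    def __init__(self, nxdomain_names: Iterable[str] = ()):
        self._cut: Set[Tuple[str, ...]] = {_labels(n) for n in nxdomain_names}

    def record_nxdomain(self, name: str) -> None:
        self._cut.add(_labels(name))

    def is_known_nonexistent(self, qname: str) -> bool:
        labels = _labels(qname)
        # Check qname itself and then each ancestor: a.b.c, then b.c, then c.
        return any(labels[i:] in self._cut for i in range(len(labels)))
```

After `record_nxdomain("foo.bar.example.com")`, a lookup for baz.foo.bar.example.com is answered from the cut, while bar.example.com and its other children still go upstream.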
Combined with RFC 8198 (aggressive NSEC), NXDOMAIN cut provides maximum cache leverage for negative responses — particularly effective against random subdomain attacks.
QNAME minimization (RFC 9156)
Traditional DNS resolution sends the full query name to every server in the delegation chain. When resolving www.secret-project.example.com:
- The root server sees the complete name
- The .com TLD sees the complete name
- Only the example.com authoritative server needs to see it
QNAME minimization changes this behavior — the resolver sends only the minimum labels needed at each step:
- Root server sees query for .com
- .com TLD sees query for example.com
- Only the example.com authoritative server sees www.secret-project.example.com
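The sequence of names a minimizing resolver exposes can be sketched by adding one label at a time from the right. This is a simplification with a hypothetical function name: it ignores query types and the limits RFC 9156 places on the number of minimization steps, and because it proceeds strictly label by label, it also sends the intermediate name secret-project.example.com before the full name.

```python
from typing import List


def minimized_qnames(qname: str) -> List[str]:
    """Names to ask for, shortest first, ending with the full query name."""
    labels = qname.rstrip(".").split(".")
    # Build com. -> example.com. -> ... by adding one label from the right.
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


if __name__ == "__main__":
    for step in minimized_qnames("www.secret-project.example.com"):
        print(step)
```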
The primary benefit is privacy — upstream servers see less query data. The performance impact is typically negligible, and it integrates well with NXDOMAIN cut: if example.com returns NXDOMAIN at the TLD level, no further queries are needed for anything below it.
CNAME flattening
CNAME chains multiply resolution latency — each hop adds a round trip. CNAME flattening eliminates this by having the authoritative server resolve the chain itself and return the final IP address directly.
This is particularly important at the zone apex (e.g., example.com without www), where RFC-compliant DNS does not allow CNAME records alongside NS and SOA records. Providers like Cloudflare, Route 53 (ALIAS records), and DNSimple implement flattening as a proprietary extension that presents an A record to the client while maintaining a CNAME internally.
The result: a multi-hop CNAME chain that would add 60–200 ms of resolution latency is resolved in a single response.
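As a rough illustration of what the flattening server does on the client's behalf, the sketch below follows a CNAME chain through an in-memory table and returns the final addresses. The table, the names in it, and the `flatten` function are hypothetical; a real provider resolves the targets through live lookups and also has to choose a TTL for the synthesized answer.

```python
from typing import Dict, List, Tuple

# Hypothetical record store: name -> (record type, values).
# Addresses come from the documentation range 192.0.2.0/24.
RECORDS: Dict[str, Tuple[str, List[str]]] = {
    "example.com": ("CNAME", ["lb.cdn.example.net"]),
    "lb.cdn.example.net": ("CNAME", ["edge-7.cdn.example.net"]),
    "edge-7.cdn.example.net": ("A", ["192.0.2.10", "192.0.2.11"]),
}


def flatten(name: str, records: Dict[str, Tuple[str, List[str]]],
            max_hops: int = 8) -> List[str]:
    """Follow CNAMEs starting at name and return the final A record values."""
    seen = set()
    for _ in range(max_hops + 1):
        if name in seen:
            raise ValueError(f"CNAME loop at {name}")
        seen.add(name)
        rtype, values = records[name]
        if rtype == "A":
            return values
        name = values[0]  # a CNAME has exactly one target
    raise ValueError("CNAME chain too long")
```

Here `flatten("example.com", RECORDS)` yields the two addresses of the last hop, which is what the client would receive as a plain A answer for the apex.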
Negative caching
Negative caching (RFC 2308) stores NXDOMAIN and NODATA responses so repeated queries for non-existent domains are answered from cache. The TTL for negative responses is the minimum of the SOA MINIMUM field and the SOA record’s own TTL.
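The TTL rule is a one-line calculation. The sketch below adds an optional resolver-side cap, which is an assumption made for illustration (the 3-hour default is hypothetical; resolvers typically make such a limit configurable).

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int,
                       resolver_cap: int = 3 * 3600) -> int:
    """Seconds to cache an NXDOMAIN/NODATA answer.

    RFC 2308: the minimum of the SOA record's own TTL and its MINIMUM field.
    resolver_cap is an assumed local upper bound, not part of the zone data.
    """
    return min(soa_record_ttl, soa_minimum, resolver_cap)
```

For example, an SOA record with a TTL of 3,600 seconds and a MINIMUM field of 300 seconds gives a negative caching TTL of 300 seconds.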
RFC 9520 (2023) extended this to also cover resolution failures (SERVFAIL, timeouts), requiring resolvers to cache them for at least 1 second and for no more than 5 minutes, which prevents thundering herd problems during upstream failures. Without this failure caching, every client retrying simultaneously can overwhelm an already struggling authoritative server.
Prefetching
Major public resolvers proactively refresh popular records before they expire:
- Google Public DNS: Prefetches high-traffic domains, ensuring the cache is always warm
- Cloudflare: Similar prefetching for records approaching TTL expiry
- Unbound: Supports prefetch: yes. When a cached record is queried and its TTL is within 10% of expiry, Unbound resolves it in the background
Prefetching eliminates the cold-cache penalty for popular domains entirely. The tradeoff is additional upstream query volume, but for records that would be queried again within seconds anyway, the cost is negligible.
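The trigger condition for this kind of prefetch can be sketched as a single comparison, evaluated each time a cached record is served. The function name and signature are hypothetical; the 10% threshold follows the Unbound behavior described above.

```python
def should_prefetch(original_ttl: int, remaining_ttl: int,
                    threshold: float = 0.10) -> bool:
    """True if a cache hit should also trigger a background refresh.

    The record must still be valid (remaining_ttl > 0) and have no more
    than `threshold` of its original TTL left.
    """
    return 0 < remaining_ttl <= original_ttl * threshold
```

Because the check runs only when a record is actually queried, names that nobody asks for near expiry are never refreshed, which keeps the extra upstream traffic limited to records that are in active use.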
Anycast: the infrastructure optimization
All techniques above optimize the software. Anycast optimizes the network.
By advertising the same IP address from hundreds of locations worldwide, anycast lets BGP routing deliver each DNS query to the topologically nearest server, which is usually also the geographically closest one. For a resolver like Cloudflare 1.1.1.1, present in 330+ cities, this means:
- Client-to-resolver RTT is minimized (often 1–5 ms)
- DDoS traffic is distributed across all instances
- If an instance fails, BGP routing shifts traffic to the next nearest instance within seconds
Anycast is the reason root servers handle 130+ billion queries per day across ~1,900 instances without breaking a sweat. It is the foundational technique that makes global DNS performance possible.
The optimization stack
These techniques are not alternatives — they are layers that compound:
| Layer | Technique | What it saves |
|---|---|---|
| Network | Anycast | Client-to-resolver RTT |
| Cache | TTL tuning, prefetching | Upstream queries for popular domains |
| Cache | Serve-stale | Availability during outages |
| Cache | Aggressive NSEC, NXDOMAIN cut | Upstream queries for non-existent domains |
| Resolution | CNAME flattening | Multi-hop resolution chains |
| Resolution | QNAME minimization | Privacy leak and unnecessary queries |
| Client | dns-prefetch, preconnect | Visible DNS latency in browsers |
A well-optimized DNS deployment uses all of these simultaneously. The result is a system where the vast majority of queries are answered in under 5 ms from cache, failures are handled gracefully through stale data, and the remaining cold-cache queries traverse the shortest possible network path through an anycast-optimized resolver.