DNS Optimization Techniques

TTL tuning, serve-stale, aggressive NSEC caching, CNAME flattening, and other techniques that make DNS faster and more resilient

Making DNS faster without changing the protocol

Most DNS optimization comes down to two strategies: avoid querying upstream (serve from cache) and reduce the cost when you must query upstream (fewer round trips). The techniques below are used at scale by resolvers, authoritative servers, and network operators.

TTL tuning: the fundamental tradeoff

TTL (Time to Live) determines how long a DNS record can be served from cache. Choosing TTL values is a tradeoff between freshness and performance:

| TTL value | Cache impact | Use case |
| --- | --- | --- |
| 30–60 seconds | Near-zero caching benefit | Active failover, DDoS mitigation |
| 300 seconds (5 min) | Modest caching | CDN load balancing, dynamic infrastructure |
| 3,600 seconds (1 hour) | Good caching; standard for most services | General web services, email MX records |
| 86,400 seconds (24 hours) | Excellent caching | Stable infrastructure, NS delegations |
| 604,800 seconds (7 days) | Maximum caching; slow propagation | Root zone hints, very stable records |
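
To make the tradeoff in the table concrete, the sketch below computes an upper bound on how often one resolver cache has to refetch one record per day. It assumes a deliberately simple model that is not from the original text: the record is popular enough to be queried continuously, so the cache refetches exactly once per TTL period. The function name `max_upstream_queries_per_day` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def max_upstream_queries_per_day(ttl_seconds: int) -> float:
    """Upper bound on refetches per day for one record in one resolver cache.

    Assumes the record is always in demand, so it is refetched once per TTL.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return SECONDS_PER_DAY / ttl_seconds


if __name__ == "__main__":
    for ttl in (30, 300, 3_600, 86_400, 604_800):
        bound = max_upstream_queries_per_day(ttl)
        print(f"TTL {ttl:>7} s: at most {bound:8.2f} upstream queries per day")
```

Under this model, moving from a 30-second TTL to a 1-hour TTL cuts the worst-case refetch rate for a single cache by a factor of 120.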

Real-world TTL distribution

A study of the Alexa Top 1 Million websites found:

  • Mean TTL across all records: 9,780 seconds (~2.7 hours)
  • Median TTL: 255 seconds (~4 minutes)
  • For close to half of the top 1M sites, at least one domain has a TTL at or below 60 seconds

The gap between mean and median reveals a heavily skewed distribution: most records use short TTLs, but some use very long ones.
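
A small illustration of how a handful of long TTLs pulls the mean far above the median. The ten TTL values below are invented for this example and are not taken from the study.

```python
from statistics import mean, median

# Invented TTLs (seconds) for ten hypothetical records:
# mostly short values plus one 24-hour record.
ttls = [20, 30, 60, 60, 300, 300, 300, 300, 3_600, 86_400]

mean_ttl = mean(ttls)      # dominated by the single 86,400 s record
median_ttl = median(ttls)  # reflects the typical (short) record

if __name__ == "__main__":
    print(f"mean   = {mean_ttl:.0f} s")
    print(f"median = {median_ttl:.0f} s")
```

With these values the mean is 9,137 seconds while the median is 300 seconds, the same shape of gap the study reports.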

CDNs are the primary driver of short TTLs:

CDN/ServiceTypical A record TTLReason
Akamai20 secondsRapid failover and load balancing
Cloudflare300 seconds (Auto TTL)Balance between performance and flexibility
Amazon CloudFront60 secondsFast failover
Fastly30 secondsDynamic traffic steering

CDNs deliberately accept the cache penalty in exchange for operational flexibility — the ability to shift traffic instantly when a data center fails or load spikes.

Serve-stale: resilience through expired records

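RFC 8767 (“Serving Stale Data to Improve DNS Resiliency”) defines a mechanism for resolvers to keep answering from expired DNS records when fresh data cannot be obtained. The RFC recommends attempting a refresh first and falling back to stale data only if that attempt fails or exceeds a client response timer (1.8 seconds is suggested). Some resolver configurations instead answer from the stale record immediately and refresh in the background: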

  1. A cached record’s TTL expires
  2. Instead of returning SERVFAIL or blocking, the resolver immediately returns the stale record
  3. In the background, the resolver queries the authoritative server for a fresh answer
  4. The stale response is served with a TTL of 30 seconds (recommended)
  5. Records are retained in cache for 1–3 days beyond TTL expiry
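
The steps above can be sketched as a toy in-memory cache. This is a minimal illustration under stated assumptions, not how any particular resolver implements RFC 8767: the class `StaleCache`, its `refresh_queue` list (standing in for a background refresh worker), and the 1-day retention constant are all hypothetical choices for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

STALE_ANSWER_TTL = 30        # TTL sent with stale answers (seconds)
MAX_STALE_SECONDS = 86_400   # retain expired data for 1 day (low end of 1-3 days)


@dataclass
class Entry:
    value: str
    expires_at: float        # clock reading at which the original TTL runs out


class StaleCache:
    """Toy cache: fresh hit, stale hit (queues a refresh), or miss."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Entry] = {}
        self.refresh_queue: List[str] = []  # names a background worker should refetch

    def put(self, name: str, value: str, ttl: int) -> None:
        """Store a fresh answer and clear any pending refresh for it."""
        self._entries[name] = Entry(value, self._clock() + ttl)
        if name in self.refresh_queue:
            self.refresh_queue.remove(name)

    def get(self, name: str) -> Optional[Tuple[str, int, bool]]:
        """Return (value, ttl_to_send, is_stale), or None on a miss."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        remaining = entry.expires_at - self._clock()
        if remaining > 0:
            return entry.value, int(remaining), False
        if -remaining <= MAX_STALE_SECONDS:
            # Expired but still retained: answer now, refresh later.
            if name not in self.refresh_queue:
                self.refresh_queue.append(name)
            return entry.value, STALE_ANSWER_TTL, True
        del self._entries[name]  # too old even to serve stale
        return None
```

Passing the clock in as a parameter keeps the sketch testable: a caller can substitute a fake clock to step through the fresh, stale, and evicted states without waiting.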

Why serve-stale matters

| Benefit | Description |
| --- | --- |
| Resilience to outages | If authoritative servers are unreachable (DDoS, misconfiguration), users still get answers |
| Reduced perceived latency | Users do not wait for an expired record to be re-resolved; names that were cached before are answered immediately |
| DDoS mitigation | Makes DNS-targeted DDoS less effective, reducing attacker motivation |

Implementation

| Software | Support | Configuration |
| --- | --- | --- |
| BIND 9 | Since 9.12 | stale-answer-enable yes; stale-answer-ttl 30; |
| Unbound | Since 1.12 | serve-expired: yes |
| Knot Resolver | Yes | Built-in |
| Google Public DNS | Yes | Enabled by default |
| Cloudflare 1.1.1.1 | Yes | Enabled by default |

Serve-stale is arguably the single most impactful resilience feature in modern DNS. It converts what would be a hard outage (SERVFAIL) into a graceful degradation (slightly stale but functional response).

Aggressive NSEC caching (RFC 8198)

RFC 8198 enables validating resolvers to use cached NSEC/NSEC3 records to synthesize NXDOMAIN responses without querying the authoritative server.

How it works: When a resolver validates and caches NSEC records that prove a range of names does not exist, it can immediately answer NXDOMAIN for any name within that proven range — no upstream query needed.
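
The range check at the heart of this can be sketched as follows. It is a simplified illustration: it assumes the NSEC record has already been DNSSEC-validated and is still within its TTL, and it ignores wildcards, closest-encloser checks, and NSEC3 hashing. The helper names `canonical_key` and `covered_by_nsec` are invented for the example; the ordering approximates the canonical name order from RFC 4034 (compare labels from the rightmost one, case-insensitively).

```python
from typing import Tuple


def canonical_key(name: str) -> Tuple[bytes, ...]:
    """Sort key approximating canonical DNS order: labels reversed, lowercased."""
    labels = name.rstrip(".").lower().split(".")
    return tuple(label.encode("ascii") for label in reversed(labels))


def covered_by_nsec(qname: str, owner: str, next_name: str) -> bool:
    """True if qname falls strictly between an NSEC owner and its next name.

    A covered name provably does not exist, so NXDOMAIN can be synthesized.
    """
    q = canonical_key(qname)
    lo = canonical_key(owner)
    hi = canonical_key(next_name)
    if lo < hi:
        return lo < q < hi
    # The last NSEC in a zone points back to the apex, so it covers
    # every in-zone name that sorts after its owner.
    return q > lo


if __name__ == "__main__":
    # Hypothetical cached NSEC: abc.example.com -> mail.example.com
    print(covered_by_nsec("abc123.example.com", "abc.example.com", "mail.example.com"))
    print(covered_by_nsec("www.example.com", "abc.example.com", "mail.example.com"))
```

In the usage lines, `abc123.example.com` sorts between the two existing names and is therefore covered, while `www.example.com` sorts after `mail.example.com` and needs a different NSEC record or an upstream query.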

Benefits:

  • Reduces authoritative server load — fewer queries for non-existent domains
  • Mitigates random subdomain attacks (PRSD) — attackers send queries like abc123.example.com; NSEC proves they do not exist without upstream queries
  • Lowers latency for NXDOMAIN — instant response from cache versus full resolution chain

Unbound implements this as aggressive-nsec: yes, enabled by default in recent versions.

NXDOMAIN cut (RFC 8020)

RFC 8020 states that when a resolver receives NXDOMAIN for a domain, all names at or below that domain should be treated as non-existent.

If foo.bar.example.com returns NXDOMAIN, the resolver can also answer NXDOMAIN for baz.foo.bar.example.com without querying upstream. The entire subtree is pruned.
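
A minimal sketch of the subtree check, assuming the resolver keeps cached NXDOMAIN names in a plain set. The function name `is_pruned` and the set-based cache are illustrative assumptions; a real resolver would also honor negative TTLs and DNSSEC status.

```python
from typing import Set


def is_pruned(qname: str, nxdomain_cache: Set[str]) -> bool:
    """True if qname equals, or sits below, any name cached as NXDOMAIN."""
    labels = qname.rstrip(".").lower().split(".")
    # Check the name itself, then each ancestor: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in nxdomain_cache for i in range(len(labels)))


if __name__ == "__main__":
    cache = {"foo.bar.example.com"}
    print(is_pruned("baz.foo.bar.example.com", cache))  # below the cut
    print(is_pruned("bar.example.com", cache))          # above the cut
```

Matching on whole labels matters: `xfoo.bar.example.com` shares a string suffix with the cached name but is a sibling, not a descendant, so it is not pruned.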

Combined with RFC 8198 (aggressive NSEC), NXDOMAIN cut provides maximum cache leverage for negative responses — particularly effective against random subdomain attacks.

QNAME minimization (RFC 9156)

Traditional DNS resolution sends the full query name to every server in the delegation chain. When resolving www.secret-project.example.com:

  • The root server sees the complete name
  • The .com TLD sees the complete name
  • Only example.com authoritative needs to see it

QNAME minimization changes this behavior — the resolver sends only the minimum labels needed at each step:

  • Root server sees query for .com
  • .com TLD sees query for example.com
  • Only example.com authoritative sees www.secret-project.example.com
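
The progression of query names can be sketched by adding one label at a time, starting from the TLD. This is a simplification: the helper `minimized_qnames` is hypothetical, and a real RFC 9156 resolver sends each name to whichever server is authoritative at that point, may add several labels at once, and limits the total number of steps.

```python
from typing import List


def minimized_qnames(qname: str) -> List[str]:
    """List the successively longer names a minimizing resolver could ask for."""
    labels = qname.rstrip(".").split(".")
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]


if __name__ == "__main__":
    for name in minimized_qnames("www.secret-project.example.com"):
        print(name)
```

For the example name this yields `com`, `example.com`, `secret-project.example.com`, and finally the full name; only the last two ever reach the `example.com` authoritative server.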

The primary benefit is privacy — upstream servers see less query data. The performance impact is typically negligible, and it integrates well with NXDOMAIN cut: if example.com returns NXDOMAIN at the TLD level, no further queries are needed for anything below it.

CNAME flattening

CNAME chains multiply resolution latency — each hop adds a round trip. CNAME flattening eliminates this by having the authoritative server resolve the chain itself and return the final IP address directly.

This is particularly important at the zone apex (e.g., example.com without www), where RFC-compliant DNS does not allow CNAME records alongside NS and SOA records. Providers like Cloudflare, Route 53 (ALIAS records), and DNSimple implement flattening as a proprietary extension that presents an A record to the client while maintaining a CNAME internally.

The result: a multi-hop CNAME chain that would add 60–200 ms of resolution latency is resolved in a single response.
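
The idea can be sketched with an in-memory record table standing in for the lookups a provider would perform on the server side. Everything here is illustrative: the `flatten` function, the record layout, the hop limit, and the example names and addresses are assumptions, not any provider's actual implementation.

```python
from typing import Dict, Tuple

# Hypothetical record store: name -> (record type, CNAME target or IPv4 address)
Record = Tuple[str, str]


def flatten(name: str, records: Dict[str, Record], max_hops: int = 8) -> str:
    """Follow CNAMEs server-side until an A record is found; return its address."""
    current = name
    for _ in range(max_hops + 1):
        rtype, value = records[current]
        if rtype == "A":
            return value
        current = value  # CNAME: continue at the target name
    raise RuntimeError("CNAME chain too long or looping")


if __name__ == "__main__":
    zone = {
        "example.com": ("CNAME", "lb.cdn.example.net"),
        "lb.cdn.example.net": ("CNAME", "edge.cdn.example.net"),
        "edge.cdn.example.net": ("A", "192.0.2.10"),
    }
    # The client asks once for example.com and receives an address directly.
    print(flatten("example.com", zone))
```

The hop limit guards against CNAME loops, which would otherwise keep the authoritative server chasing the chain indefinitely.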

Negative caching

Negative caching (RFC 2308) stores NXDOMAIN and NODATA responses so repeated queries for non-existent domains are answered from cache. The TTL for negative responses is the minimum of the SOA MINIMUM field and the SOA record’s own TTL.
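
The TTL rule is a one-line computation. In the sketch below, the optional `resolver_cap` parameter is an assumption added for illustration, reflecting that resolvers commonly impose their own upper limit on negative TTLs; it is not part of the rule stated above.

```python
from typing import Optional


def negative_ttl(soa_ttl: int, soa_minimum: int, resolver_cap: Optional[int] = None) -> int:
    """Negative-cache TTL: the smaller of the SOA record's TTL and its MINIMUM field."""
    ttl = min(soa_ttl, soa_minimum)
    if resolver_cap is not None:
        ttl = min(ttl, resolver_cap)  # hypothetical local policy limit
    return ttl


if __name__ == "__main__":
    # SOA returned with TTL 3600 and MINIMUM 900: cache the NXDOMAIN for 900 s.
    print(negative_ttl(soa_ttl=3_600, soa_minimum=900))
```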

RFC 9520 (2023) updated this to require caching of resolution failures (SERVFAIL, timeouts) as well: a resolver must cache a failure for at least 1 second and may hold it for up to 5 minutes, which prevents thundering herd problems during upstream failures. Without negative failure caching, every client retrying simultaneously can overwhelm an already struggling authoritative server.
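
A minimal sketch of failure caching, assuming a fixed hold time per failed name. The class `FailureCache`, its method names, and the 5-second default are illustrative choices; a production resolver would typically key on more than the name and lengthen the hold time on repeated failures.

```python
import time
from typing import Callable, Dict


class FailureCache:
    """Remembers recent resolution failures so retries do not hit upstream."""

    def __init__(self, hold_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._hold = hold_seconds
        self._clock = clock
        self._until: Dict[str, float] = {}  # name -> time the hold expires

    def record_failure(self, name: str) -> None:
        """Call after a SERVFAIL or timeout for this name."""
        self._until[name] = self._clock() + self._hold

    def should_skip_upstream(self, name: str) -> bool:
        """True while a recent failure is cached; answer SERVFAIL locally."""
        until = self._until.get(name)
        if until is None:
            return False
        if self._clock() < until:
            return True
        del self._until[name]  # hold expired: allow one new attempt
        return False
```

While a failure is held, thousands of client retries cost one dictionary lookup each instead of one upstream query each.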

Prefetching

Major public resolvers proactively refresh popular records before they expire:

  • Google Public DNS: Prefetches high-traffic domains, ensuring the cache is always warm
  • Cloudflare: Similar prefetching for records approaching TTL expiry
  • Unbound: Supports prefetch: yes — when a cached record is queried and its TTL is within 10% of expiry, Unbound resolves it in the background

Prefetching eliminates the cold-cache penalty for popular domains entirely. The tradeoff is additional upstream query volume, but for records that would be queried again within seconds anyway, the cost is negligible.
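
The 10% rule described for Unbound can be written as a simple predicate evaluated on each cache hit. This is a sketch of the decision only; the function name `should_prefetch` and the `threshold` parameter are assumptions, and the actual background resolution is left out.

```python
def should_prefetch(remaining_ttl: float, original_ttl: float,
                    threshold: float = 0.10) -> bool:
    """True if a queried record is still valid but within the last 10% of its TTL."""
    return 0 < remaining_ttl <= original_ttl * threshold


if __name__ == "__main__":
    # A 300 s record with 25 s left is refreshed early; with 200 s left it is not.
    print(should_prefetch(remaining_ttl=25, original_ttl=300))
    print(should_prefetch(remaining_ttl=200, original_ttl=300))
```

Because the check runs only when a record is actually queried, unpopular records are never prefetched and simply expire.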

Anycast: the infrastructure optimization

All techniques above optimize the software. Anycast optimizes the network.

By advertising the same IP address from hundreds of locations worldwide, anycast lets BGP route each DNS query to the topologically nearest server, which is usually also geographically close. For a resolver like Cloudflare 1.1.1.1 with 330+ cities, this means:

  • Client-to-resolver RTT is minimized (often 1–5 ms)
  • DDoS traffic is distributed across all instances
  • If an instance fails, BGP routing shifts traffic to the next nearest instance within seconds

Anycast is the reason root servers handle 130+ billion queries per day across ~1,900 instances without breaking a sweat. It is the foundational technique that makes global DNS performance possible.

The optimization stack

These techniques are not alternatives — they are layers that compound:

| Layer | Technique | What it saves |
| --- | --- | --- |
| Network | Anycast | Client-to-resolver RTT |
| Cache | TTL tuning, prefetching | Upstream queries for popular domains |
| Cache | Serve-stale | Availability during outages |
| Cache | Aggressive NSEC, NXDOMAIN cut | Upstream queries for non-existent domains |
| Resolution | CNAME flattening | Multi-hop resolution chains |
| Resolution | QNAME minimization | Privacy leak and unnecessary queries |
| Client | dns-prefetch, preconnect | Visible DNS latency in browsers |

A well-optimized DNS deployment uses all of these simultaneously. The result is a system where the vast majority of queries are answered in under 5 ms from cache, failures are handled gracefully through stale data, and the remaining cold-cache queries traverse the shortest possible network path through an anycast-optimized resolver.