
DNS for CDNs and Load Balancing

How CDNs use DNS to steer traffic to the nearest edge server — GeoDNS, anycast, health checks, and multi-CDN orchestration

DNS is the CDN’s steering wheel

Before a CDN delivers a single byte of content, DNS decides which edge server delivers it. Every CDN-accelerated domain delegates its DNS to the CDN’s authoritative servers, which return different IP addresses depending on where the user is, how healthy each edge node is, and what routing policy the operator has configured.

This makes DNS the primary decision engine for global content delivery. The CDN’s authoritative DNS server does not just translate names to addresses — it selects the optimal address from hundreds or thousands of possibilities.

Traffic steering methods

CDNs combine several DNS-based techniques to route users to the best available server:

| Method | How it works | Best for |
| --- | --- | --- |
| GeoDNS | Maps the resolver’s IP (or client subnet via ECS) to a geographic location, returns the nearest PoP | Region-specific content, data sovereignty |
| Latency-based routing | Measures real-time latency between users and PoPs, returns the lowest-latency endpoint | Performance-critical applications |
| Weighted routing | Distributes a defined percentage of traffic across multiple endpoints | Gradual migrations, cost optimization |
| Health-aware routing | Continuous probes remove unhealthy endpoints from DNS responses | High availability, automatic failover |

Most large CDNs use a hybrid approach: anycast for baseline routing efficiency, with DNS-based steering as an additional intelligent control layer. Anycast handles the network-level routing while DNS provides application-aware, policy-driven decisions.
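The GeoDNS logic above can be sketched as a lookup from client region to an ordered list of PoPs, skipping unhealthy ones. The PoP table, IPs, and region preferences below are entirely hypothetical, not any real CDN's routing data:

```python
# GeoDNS-style selection sketch: map a client's region to the nearest
# healthy PoP. All PoP names, IPs, and preferences are illustrative.

POPS = {
    "eu-west":  {"ip": "203.0.113.10", "healthy": True},
    "us-east":  {"ip": "203.0.113.20", "healthy": True},
    "ap-south": {"ip": "203.0.113.30", "healthy": False},
}

# Static region -> preferred-PoP ordering (a real CDN derives this from
# latency measurements and network topology, not a hand-written table).
PREFERENCE = {
    "DE": ["eu-west", "us-east", "ap-south"],
    "US": ["us-east", "eu-west", "ap-south"],
    "IN": ["ap-south", "eu-west", "us-east"],
}

def resolve(country_code: str) -> str:
    """Return the IP of the nearest healthy PoP for a client region."""
    for pop in PREFERENCE.get(country_code, ["us-east"]):
        if POPS[pop]["healthy"]:
            return POPS[pop]["ip"]
    raise RuntimeError("no healthy PoP available")
```

Note how health awareness composes with geography: a client in India is steered to `eu-west` while `ap-south` is marked unhealthy.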

Anycast DNS

Anycast is a routing methodology where a single IP address is advertised from multiple locations simultaneously via BGP. When a client sends a packet to an anycast address, the routing system delivers it to the topologically nearest instance.

For DNS, this means:

  • A CDN announces the same IP prefix from hundreds of global PoPs
  • BGP routing directs each query to the closest available server
  • If a PoP goes down, BGP withdraws the route and traffic shifts to the next-nearest PoP within seconds
  • Clients use a single IP address globally — Cloudflare’s 1.1.1.1, Google’s 8.8.8.8

Anycast provides natural DDoS resilience because attack traffic is distributed across all PoPs rather than concentrated at a single location. It also provides sub-minute failover through BGP route withdrawal, without requiring any DNS TTL expiration.

How major CDNs use DNS

Cloudflare

Cloudflare operates a fully flat, anycast-based network across 330+ cities in 120+ countries. Every edge location runs the full stack — no mid-tier caches. The same IP addresses are announced from all PoPs. 95% of the world’s internet-connected population is within 50ms of a Cloudflare server.

Akamai

The largest traditional CDN, with 365,000+ servers across 135+ countries and 1,000+ PoPs. Akamai’s Edge DNS offers a 100% uptime SLA and uses both GeoDNS-based intelligent routing and anycast. Points of presence are distributed across multiple networks within each geography for both geographic and network-level redundancy.

AWS CloudFront

Amazon’s CDN operates 750+ PoPs in 100+ cities, plus 1,140+ embedded PoPs and 15 Regional Edge Caches. DNS is the primary steering mechanism — Route 53 routes queries to the CloudFront PoP that can best serve each request, typically based on latency.

Fastly

Fastly runs fewer but more powerful PoPs (100+ globally) with 462 Tbps of total network capacity. The design philosophy is larger caches and faster response times per location, with 150ms global purge capability for real-time content delivery.

DNS-based load balancing

DNS load balancing distributes traffic by returning different IP addresses in responses:

Round-robin DNS rotates through a list of addresses for each query. Simple to configure but lacks health awareness — if a server goes down, DNS continues returning its address until the record is manually updated.

Weighted round-robin assigns weights to endpoints based on capacity. A server with weight 3 receives three times the traffic of a server with weight 1.

Latency-aware routing returns the endpoint with the lowest measured latency, using synthetic probes or real user monitoring data to make decisions.
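The weighted round-robin scheme described above can be sketched by expanding each endpoint into the rotation as many times as its weight. The endpoint IPs and weights here are made up for illustration:

```python
import itertools
import random

# Weighted round-robin sketch: a server with weight 3 appears three times
# in the rotation, so it receives 3x the traffic of a weight-1 server.
# Endpoints and weights are illustrative.
ENDPOINTS = [("10.0.0.1", 3), ("10.0.0.2", 1)]

def build_rotation(endpoints):
    """Expand (ip, weight) pairs into a cyclic rotation of answers."""
    rotation = [ip for ip, weight in endpoints for _ in range(weight)]
    random.shuffle(rotation)  # avoid sending bursts to the same server
    return itertools.cycle(rotation)

rotation = build_rotation(ENDPOINTS)
answers = [next(rotation) for _ in range(4000)]
```

Over 4,000 queries, the weight-3 server receives exactly 3,000 answers and the weight-1 server 1,000 — the 3:1 split the weights specify.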

Limitations

DNS load balancing has inherent constraints. TTLs mean changes propagate slowly — typically 30-300 seconds. Client-side and intermediate DNS caching can delay failover. And the resolver’s IP is used for geolocation, not the actual end user’s IP. EDNS Client Subnet (RFC 7871) partially mitigates this by forwarding a truncated client IP to the authoritative server.
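The ECS truncation mentioned above can be sketched with the standard library: the resolver forwards only a network prefix of the client address, commonly /24 for IPv4 and /56 for IPv6 (prefix lengths vary by resolver; these are common defaults, not a requirement of RFC 7871):

```python
import ipaddress

# EDNS Client Subnet sketch: derive the truncated prefix a resolver would
# forward to the authoritative server. /24 and /56 are common choices,
# used here for illustration.
def ecs_prefix(client_ip: str) -> str:
    addr = ipaddress.ip_address(client_ip)
    source_prefix = 24 if addr.version == 4 else 56
    network = ipaddress.ip_network(f"{client_ip}/{source_prefix}", strict=False)
    return str(network)
```

The authoritative server geolocates the prefix (e.g. `198.51.100.0/24`) rather than the full client IP, which improves steering accuracy while limiting how much of the user's address is exposed.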

Global Server Load Balancing (GSLB)

GSLB extends DNS load balancing to multi-data-center and multi-cloud environments. It operates at the DNS layer, using dynamic algorithms and health checks to steer traffic across geographically distributed pools.

Active-Active: All sites serve traffic simultaneously. GSLB distributes based on proximity, load, or policy.

Active-Passive: Standby sites activate only when primary sites fail health checks.

Failover with failback: Traffic automatically returns to recovered sites once they pass health checks, preventing permanent traffic shifts after transient failures.
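The active-passive-with-failback behavior above reduces to a simple preference rule: always return to the primary as soon as it is healthy again. The site names and IPs below are hypothetical:

```python
# Active-passive GSLB sketch with failback: the primary is preferred
# whenever it passes health checks, so a transient failure never becomes
# a permanent traffic shift. Sites and IPs are illustrative.
PRIMARY = {"name": "dc-primary", "ip": "192.0.2.10"}
STANDBY = {"name": "dc-standby", "ip": "192.0.2.20"}

def pick_site(primary_healthy: bool, standby_healthy: bool) -> str:
    """Return the IP to publish in DNS given current health-check state."""
    if primary_healthy:          # failback: recovered primary wins
        return PRIMARY["ip"]
    if standby_healthy:
        return STANDBY["ip"]
    raise RuntimeError("all sites failing health checks")
```

In production, this decision would be re-evaluated on every health-check cycle and combined with the TTL considerations discussed below.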

Health checking

DNS-based failover requires continuous endpoint monitoring:

| Protocol | Method | Typical interval |
| --- | --- | --- |
| HTTP/HTTPS | GET/HEAD request, validate status code and body | 10-30 seconds |
| TCP | Connection attempt to specified port | 10-30 seconds |
| ICMP | Ping check | 10-30 seconds |
| Custom | Application-specific checks (database connectivity, queue depth) | Varies |

Lower TTLs (30-60 seconds) enable faster failover but increase DNS query volume and cost. Most providers recommend TTLs of 60-300 seconds for a balance of responsiveness and efficiency.
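One detail most providers add on top of the probe table above is hysteresis: an endpoint changes state only after several consecutive probe results, so a single dropped packet does not pull it from rotation. A minimal sketch, with illustrative thresholds of three consecutive probes:

```python
# Health-check state machine sketch: flip an endpoint's health only after
# N consecutive probes disagree with its current state. The threshold of
# 3 is illustrative; real providers make this configurable.
FLIP_THRESHOLD = 3

class EndpointHealth:
    def __init__(self):
        self.healthy = True
        self.streak = 0   # consecutive probes disagreeing with current state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the (possibly new) state."""
        if probe_ok == self.healthy:
            self.streak = 0          # agreement resets the counter
            return self.healthy
        self.streak += 1
        if self.streak >= FLIP_THRESHOLD:
            self.healthy = probe_ok  # enough evidence: flip state
            self.streak = 0
        return self.healthy
```

With 10-30 second probe intervals and a threshold of 3, detection takes roughly 30-90 seconds, which is why failover time is dominated by probe cadence plus TTL, not by DNS itself.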

Multi-CDN DNS strategies

Organizations increasingly use multiple CDN providers for resilience, performance, and cost optimization. DNS is the control plane for multi-CDN architectures.

| Strategy | Description |
| --- | --- |
| Performance-based | Route to the fastest CDN based on real user monitoring or synthetic probes |
| Geographic split | Different CDNs for different regions (Akamai for APAC, CloudFront for Americas) |
| Cost optimization | Route to cheaper CDN during low-traffic periods, premium CDN during peaks |
| Availability failover | Primary CDN with automatic failover to backup on health check failure |

Real User Monitoring feeding DNS

The most sophisticated multi-CDN deployments use Real User Monitoring (RUM) to continuously feed performance data back into DNS decisions. A JavaScript tag on the website collects latency, throughput, and error rates from real users. This data feeds the DNS traffic management platform, which dynamically adjusts responses — if CDN-A degrades in Asia while CDN-B performs well, DNS shifts Asian traffic to CDN-B in real time.
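The RUM feedback loop above amounts to: aggregate recent per-region latency samples per CDN, then answer DNS queries with the current winner. The CDN names, regions, and latency figures below are invented for illustration:

```python
import statistics

# RUM-fed multi-CDN steering sketch: pick the CDN with the lowest median
# recent latency per region. Samples (ms) are hypothetical, keyed by
# (region, cdn).
RUM = {
    ("asia", "cdn-a"): [180, 210, 195],
    ("asia", "cdn-b"): [90, 95, 88],
    ("americas", "cdn-a"): [40, 42, 38],
    ("americas", "cdn-b"): [70, 65, 72],
}

def steer(region: str) -> str:
    """Return the CDN with the lowest median observed latency in a region."""
    candidates = {cdn: statistics.median(samples)
                  for (r, cdn), samples in RUM.items() if r == region}
    return min(candidates, key=candidates.get)
```

In this made-up data, CDN-A has degraded in Asia, so Asian queries are steered to CDN-B while traffic in the Americas stays on CDN-A — the real-time shift the paragraph above describes.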

Leading multi-CDN orchestration platforms include NS1 (IBM) with its Filter Chain technology, Citrix Intelligent Traffic Management, and Constellix (DigiCert).

DNS-based deployments

DNS can route traffic between different deployments for blue/green releases and canary testing.

Weighted routing configures DNS to return endpoint A for 90% of queries and endpoint B for 10%, enabling gradual rollouts. AWS Route 53 natively supports percentage-based traffic distribution.

Blue/green deployments cut over between two complete environments by changing A or CNAME records. The TTL determines how quickly traffic shifts — lower TTLs mean faster cutover but more DNS queries.

DNS-based traffic splitting is imprecise compared to application-layer methods. DNS caching means users may not see the new resolution for the TTL duration, and there is no per-user stickiness. For precise A/B testing, application-layer splitting (service mesh, load balancer rules) is preferred. DNS is better suited for infrastructure-level routing decisions.

The CDN-DNS dependency

The October 2016 Dyn attack demonstrated the fragility of this architecture. When Dyn’s DNS infrastructure went down under a Mirai botnet DDoS attack, every service that depended on Dyn for DNS-based CDN steering went offline — Twitter, Netflix, Spotify, Reddit, and dozens of others. The attack did not touch any CDN edge server or origin server. It only disrupted the DNS layer that directed traffic to those servers.

This single incident drove widespread adoption of multi-provider DNS strategies. If DNS is the CDN’s steering wheel, a DNS outage means no one can find the road — even if every destination server is running perfectly.