Azure Expands AI Superfactory with Atlanta Datacenter

  • Fairwater design pushes boundaries of compute and networking

Microsoft has unveiled its latest Azure AI datacenter in Atlanta, Georgia, marking the next step in its Fairwater initiative. Connected to the first Fairwater site in Wisconsin and integrated into Azure’s global footprint, the new facility is designed to meet unprecedented demand for AI computing. By packing GPUs more densely than ever before, the datacenter aims to support frontier model training and diverse workloads. The project reflects decades of experience in datacenter design and lessons learned from some of the largest AI training jobs worldwide.

Maximizing Compute Density

Modern AI workloads are increasingly constrained by physical limits such as latency and energy efficiency. Fairwater addresses these challenges with advanced cooling systems, including a closed-loop liquid cooling approach that continuously recirculates the same water, designed to run for more than six years without replacement. This system enables racks to reach power levels of 140kW and rows up to 1,360kW, maximizing compute density. A two-story building design further shortens cable runs between interconnected GPUs, improving latency, bandwidth, and reliability.
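As a sanity check on the power figures above, the stated rack and row envelopes imply a rough rack count per row. The ~10-rack result is an inference from those two numbers, not a figure from the announcement:

```python
# Back-of-the-envelope check of the Fairwater density figures.
# Rack and row power envelopes are as stated in the article;
# the racks-per-row count is derived, not quoted.
RACK_POWER_KW = 140      # per-rack power envelope
ROW_POWER_KW = 1_360     # per-row power envelope

racks_per_row = ROW_POWER_KW / RACK_POWER_KW
print(f"Implied racks per row: {racks_per_row:.1f}")  # ~9.7, i.e. roughly 10 racks
```

At roughly ten 140kW racks per row, shortening the distance between racks (the motivation for the two-story layout) matters more than in conventional, lower-density halls.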

Cooling innovations allow large-scale training jobs to run efficiently at high utilization. Heat is rejected through one of the largest chiller plants in operation, keeping water consumption low. Within the facility, every GPU can communicate with every other GPU, creating a tightly integrated environment for demanding workloads. These design choices highlight Microsoft’s focus on balancing performance with sustainability.

Reliable Power and Advanced Networking

The Atlanta site was chosen for its resilient utility power, which can deliver high availability at lower cost. By leaning on grid stability, Microsoft avoids traditional redundancy measures such as on-site generation and dual-corded distribution. Partnerships with industry players have produced power-management solutions that mitigate the grid oscillations caused when large AI training jobs ramp power up and down in near-unison. Supplementary workloads, GPU-enforced power thresholds, and on-site energy storage help keep demand on the grid steady.

Networking innovations are central to Fairwater’s design. Each rack houses up to 72 NVIDIA Blackwell GPUs connected via NVLink, offering ultra-low latency and pooled memory access. Scale-out networking then links racks into clusters with 800 Gbps GPU-to-GPU connectivity, stitching hundreds of thousands of GPUs into what behaves like a single supercomputer. Microsoft’s SONiC network operating system runs on commodity switch hardware, reducing costs and avoiding vendor lock-in, while optimizations in packet handling and telemetry keep performance reliable and low-latency.
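The per-rack GPU count and the 800 Gbps figure together imply a rough aggregate scale-out bandwidth per rack. This sketch assumes the 800 Gbps applies to each GPU individually, which the article does not make explicit:

```python
# Rough aggregate scale-out bandwidth per rack.
# Assumption: the stated 800 Gbps GPU-to-GPU figure is per GPU,
# an interpretation rather than a confirmed detail.
GPUS_PER_RACK = 72            # NVIDIA Blackwell GPUs per rack (stated)
SCALE_OUT_GBPS_PER_GPU = 800  # stated GPU-to-GPU connectivity

rack_scale_out_tbps = GPUS_PER_RACK * SCALE_OUT_GBPS_PER_GPU / 1_000
print(f"Aggregate scale-out bandwidth per rack: {rack_scale_out_tbps:.1f} Tbps")
# -> 57.6 Tbps under the per-GPU assumption
```

Under that assumption, each rack would present tens of terabits per second to the scale-out fabric, which is why commodity switching and careful packet-handling optimizations matter at this scale.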

Planet-Scale Integration

Even with dense compute and advanced networking, single facilities cannot meet the demands of trillion-parameter models. To address this, Microsoft has built a dedicated AI WAN optical network, adding over 120,000 miles of fiber across the U.S. This backbone connects multiple generations of supercomputers into a unified AI superfactory. Developers can allocate workloads dynamically across sites, maximizing flexibility and utilization.

The AI WAN represents a departure from traditional approaches where all traffic relied on scale-out networks. By segmenting workloads more granularly, Microsoft enables fit-for-purpose networking tailored to specific requirements. Integration across geographically diverse sites expands reach and resilience. Together, these innovations form the foundation of Azure’s planet-scale AI infrastructure.

Microsoft’s SONiC networking software, originally developed for Azure datacenters, has become an open-source project adopted by major cloud providers and enterprises worldwide. Its role in Fairwater highlights how open-source innovation can underpin some of the largest AI systems ever built.


 
