Google Cloud expands AI infrastructure with sixth-generation TPU
Google Cloud will enhance its AI cloud infrastructure with new TPUs and NVIDIA GPUs, the tech company announced at its App Day & Infrastructure Summit on October 30.
Now in preview for cloud customers, Trillium, the sixth-generation TPU, powers many of Google's most popular services, including Search and Maps.
“Through these advancements in AI infrastructure, Google Cloud empowers businesses and researchers to redefine the boundaries of AI innovation,” Mark Lohmeyer, VP and GM of Compute and AI Infrastructure at Google Cloud, wrote in a press release. “We look forward to the transformative new AI applications that will emerge from this powerful foundation.”
Trillium TPU accelerates generative AI processes
As large language models grow, so does the need for silicon to support them.
The sixth-generation Trillium TPU delivers training, inference, and serving of large language model applications at up to 91 exaflops in a single TPU cluster. Google Cloud reports a 4.7x increase in peak compute performance per chip compared to the fifth generation. Trillium also doubles both the high-bandwidth memory capacity and the interchip interconnect bandwidth.
Trillium meets the high computation demands of large-scale diffusion models such as Stable Diffusion XL. At its peak, the Trillium infrastructure can connect thousands of chips, creating what Google Cloud describes as a “building-scale supercomputer.”
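For a concrete picture of what running on TPUs looks like in practice, here is a minimal, hypothetical JAX sketch (not from Google's announcement) of a single compiled step; it assumes a runtime with TPU devices attached and uses only standard JAX APIs:

```python
import jax
import jax.numpy as jnp

# List the accelerators JAX can see. On a TPU-backed runtime these report
# as TPU devices; on a plain laptop this falls back to CPU.
print(jax.devices())

@jax.jit  # compile with XLA for whatever accelerator is attached
def dense_layer(x, w):
    # Stand-in for one dense layer of an LLM forward pass.
    return jnp.dot(x, w)

# bfloat16 is the dtype TPUs are typically fed for training and inference.
x = jnp.ones((1024, 8192), dtype=jnp.bfloat16)
w = jnp.ones((8192, 8192), dtype=jnp.bfloat16)
print(dense_layer(x, w).shape)  # (1024, 8192)
```

The same program runs unchanged on CPU, GPU, or TPU because XLA handles the hardware-specific compilation, which is why frameworks like JAX are a common on-ramp to TPU hardware.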
Enterprise customers are demanding more cost-effective AI acceleration and increased inference performance, Mohan Pichika, group product manager of AI infrastructure at Google Cloud, said in an email to TechRepublic.
In the press release, Google Cloud customer Deniz Tuna, development lead at mobile app development company HubX, said: “We used Trillium TPU for text-to-image creation with MaxDiffusion and Flux.1 and the results are amazing! We were able to generate four images in 7 seconds – that’s a 35% improvement in response latency and a ~45% reduction in cost/image compared to our current system!”
New virtual machines anticipate NVIDIA Blackwell chip delivery
In November, Google will add A3 Ultra VMs powered by NVIDIA H200 Tensor Core GPUs to its cloud services. A3 Ultra VMs run AI or high-performance computing (HPC) workloads on Google Cloud’s data center-wide network at up to 3.2 Tbps of GPU-to-GPU traffic. They also offer customers:
- Integration with NVIDIA ConnectX-7 hardware.
- 2x the GPU-to-GPU networking bandwidth of the previous benchmark, A3 Mega.
- Up to 2x higher LLM inference performance.
- Almost double the memory capacity.
- 1.4x more memory bandwidth.
The new VMs will be available through Google Cloud or Google Kubernetes Engine.
WATCH: Blackwell GPUs are sold out for the next year, NVIDIA CEO Jensen Huang said at an investors meeting in October.
Additional Google Cloud infrastructure updates support the growing enterprise LLM industry
Naturally, Google Cloud’s infrastructure offerings are interconnected. For example, the A3 Mega is backed by the Jupiter data center network, which will soon see its own AI-workload-focused enhancements.
With its new network adapter, Titanium’s host offload capability can now more effectively adapt to the diverse demands of AI workloads. The Titanium ML Network Adapter uses NVIDIA ConnectX-7 hardware and Google Cloud’s data-center-wide 4-way rail-aligned network to deliver up to 3.2 Tbps GPU-to-GPU traffic. The benefits of this combination flow to Jupiter, Google Cloud’s optical circuit switching network fabric.
Another key element of Google Cloud’s AI infrastructure is the massive processing power required for AI training and inference. A hypercompute cluster brings together a large number of AI accelerators, including A3 Ultra VMs. Hypercompute clusters can be configured via an API call, leverage reference libraries like JAX and PyTorch, and support open AI models like Gemma2 and Llama3 for benchmarking.
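Google has not published the hypercompute cluster API in this announcement, but the reference-library side of the workflow is familiar. As a rough, hypothetical sketch, JAX’s pmap spreads one data-parallel step across however many accelerators the cluster exposes to a process:

```python
import jax
import jax.numpy as jnp

n = jax.device_count()  # accelerators visible to this process

def step(shard):
    # Each device reduces its own shard, then psum all-reduces the result
    # across devices -- the collective behind data-parallel training.
    return jax.lax.psum(jnp.sum(shard * shard), axis_name="devices")

pstep = jax.pmap(step, axis_name="devices")

# One leading-axis entry per device; each device receives a (512, 512) shard.
shards = jnp.ones((n, 512, 512))
totals = pstep(shards)  # shape (n,): every device holds the same global sum
print(n, totals[0])
```

Scaling from a handful of chips to a full cluster changes the device count, not the program, which is the property this style of reference-library workflow leans on.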
Google Cloud customers can access hypercompute clusters with A3 Ultra VMs and Titanium ML network adapters in November.
These products address enterprise customers’ requests for optimized GPU utilization and simplified access to high-performance AI infrastructure, Pichika said.
“Hypercompute clusters provide an easy-to-use solution for enterprises to leverage the power of AI hypercomputers for large-scale AI training and inference,” he said by email.
Google Cloud is also preparing racks for NVIDIA’s upcoming Blackwell GB200 NVL72 GPUs, which hyperscalers are expected to adopt in early 2025. Once available, these GPUs will join the Axion-processor-based VM series, which leverages Google’s custom Arm processors.
Pichika declined to comment directly on whether the timing of hypercompute clusters or Titanium ML was linked to delays in Blackwell GPU deliveries: “We’re excited to continue working together to provide customers with the best of both technologies.”
Two more services, Hyperdisk ML, an AI/ML-focused block storage service, and Parallelstore, an AI/HPC-focused parallel file system, are now generally available.
Google Cloud’s services can be accessed through numerous international regions.
Google Cloud’s competitors for AI hosting
Google Cloud competes primarily with Amazon Web Services and Microsoft Azure in cloud hosting of large language models. Alibaba, IBM, Oracle, VMware, and others offer similar stables of large language model resources, though not always at the same scale.
According to Statista, Google Cloud held a 10% share of the worldwide cloud infrastructure services market in the first quarter of 2024. Amazon AWS held 34% and Microsoft Azure 25%.