Kubernetes is a popular platform for running AI workloads like training and large language model serving, including the new Gemma open models. Google Kubernetes Engine (GKE) in Autopilot mode offers a fully managed Kubernetes platform that removes the need to worry about compute nodes, allowing users to focus on delivering business value through AI. The new Accelerator compute class in Autopilot improves GPU support with resource reservation capabilities and lower prices for most GPU workloads, while the new Performance compute class lets high-performance workloads run on Autopilot at scale, with more ephemeral storage available on the boot disk.
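As an illustrative sketch rather than an official example, a Pod requesting the Performance compute class might combine the compute-class and machine-family node selectors as shown below; the c3 machine family, Pod name, container image, and resource sizes are assumptions chosen for demonstration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: high-perf-batch                 # hypothetical workload name
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance   # request the Performance compute class
    cloud.google.com/machine-family: c3           # assumed machine family; choose one suited to the workload
  containers:
  - name: worker
    image: us-docker.pkg.dev/my-project/repo/batch-worker:latest  # illustrative image
    resources:
      requests:
        cpu: "16"
        memory: 64Gi
        ephemeral-storage: 100Gi        # Performance class makes more boot-disk ephemeral storage available
```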
Because Autopilot does not require specifying and provisioning nodes upfront, users can focus on building workloads and creating business value. Today’s announcement includes lower prices for the majority of GPU workloads in Autopilot mode, as well as a new billing model that improves compatibility with other Google Cloud products. Users can move workloads between GKE's Standard and Autopilot modes, and between GKE and Compute Engine VMs, while retaining existing Reservations and committed use discounts.
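For example, an Autopilot GPU workload can keep consuming an existing Reservation by pointing at it from the Pod spec. The sketch below assumes the reservation-name and reservation-affinity node selectors documented for Autopilot; the reservation name, accelerator type, and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gemma-finetune                  # hypothetical training Pod
spec:
  nodeSelector:
    cloud.google.com/compute-class: Accelerator
    cloud.google.com/gke-accelerator: nvidia-tesla-a100    # assumed GPU type
    cloud.google.com/reservation-name: my-gpu-reservation  # hypothetical existing Reservation
    cloud.google.com/reservation-affinity: specific        # consume that specific Reservation
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/repo/trainer:latest  # illustrative image
    resources:
      limits:
        nvidia.com/gpu: "1"
```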
The new pricing model for GPU workloads in GKE Autopilot mode is opt-in, with automatic migration of workloads to the new model planned for the future. Prices for most workloads are expected to decrease, with slight increases for workloads on NVIDIA T4 GPUs that request fewer than 2 vCPUs per GPU. Users can opt in by upgrading their cluster to GKE version 1.28.6-gke.1095000 or later and adding the compute-class selector to existing GPU workloads, as in the sketch below.
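A minimal sketch of that opt-in step, assuming an existing T4 serving Deployment: the Deployment name, image, and resource amounts are illustrative, and the cloud.google.com/compute-class: Accelerator node selector is the line being added to the existing GPU workload.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gemma-server                    # hypothetical existing GPU workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
    spec:
      nodeSelector:
        cloud.google.com/compute-class: Accelerator       # opts this workload into the new Accelerator class
        cloud.google.com/gke-accelerator: nvidia-tesla-t4  # existing GPU selector stays as-is
      containers:
      - name: server
        image: us-docker.pkg.dev/my-project/repo/gemma-server:latest  # illustrative image
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
          limits:
            nvidia.com/gpu: "1"
```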