1. Customers like Osmos are using JetStream for LLM inference workloads, harnessing Google Cloud AI infrastructure for data transformation and automation.
2. JetStream offers a powerful, cost-efficient, open-source foundation for LLM inference, accelerating the development of AI applications in natural language processing.
3. MaxDiffusion provides high-performance diffusion model inference for computer vision, with reference implementations in JAX for scalability and customization, reaching up to 12 images/s on Cloud TPU v5e-8.
Osmos, a company specializing in AI-powered data transformation, has used JetStream to accelerate its LLM inference workloads. Running Cloud TPU v5e with MaxText, JAX, and JetStream, Osmos efficiently processes and transforms messy incoming data from customers and business partners, delivering results within hours rather than days. CEO Kirat Pandya highlights the importance of high-performance, scalable AI infrastructure for their end-to-end AI workflows.
JetStream provides researchers and developers with a cost-efficient, open-source foundation for LLM inference, unlocking new possibilities in natural language processing. It is intended both to speed up experienced AI practitioners and to lower the barrier for newcomers exploring LLMs. Interested readers can visit the GitHub repository to learn more about JetStream and start their own LLM projects; ongoing support and development are provided by Google Cloud Customer Care.
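To make the workload concrete: the core of LLM inference is an autoregressive decode loop that generates one token per model call, and it is this loop that a serving engine like JetStream batches and optimizes on TPUs. The sketch below is a minimal, generic greedy-decoding loop in JAX; it does not use JetStream's actual API, and `logits_fn` and the toy model are illustrative stand-ins.

```python
import jax
import jax.numpy as jnp

def greedy_decode(logits_fn, prompt_ids, max_new_tokens):
    """Generic autoregressive greedy decoding: one forward pass per
    generated token. `logits_fn` maps the token sequence so far to
    next-token logits. (Illustrative sketch, not JetStream's API.)"""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(jnp.array(ids))
        ids.append(int(jnp.argmax(logits)))  # pick the highest-scoring token
    return ids

# Toy "model": the next token is (last_token + 1) mod 10.
def toy_logits_fn(ids):
    return jax.nn.one_hot((ids[-1] + 1) % 10, 10)

print(greedy_decode(toy_logits_fn, [3], 4))  # [3, 4, 5, 6, 7]
```

Because each generated token requires a full model call, throughput hinges on how efficiently this loop is batched and executed on the accelerator, which is the problem JetStream targets.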
In addition to LLMs, MaxDiffusion offers high-performance diffusion model inference for computer vision applications. This collection of open-source diffusion-model reference implementations, written in JAX, provides core components such as cross attention, convolutions, and image data loading. MaxDiffusion is adaptable and customizable, catering to both researchers and developers looking to integrate cutting-edge AI capabilities into their applications. Its implementation of the new SDXL-Lightning model achieves 6 images/s on Cloud TPU v5e-4, scaling near-linearly to 12 images/s on Cloud TPU v5e-8.
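Cross attention, one of the core components mentioned above, is what lets a diffusion model condition image generation on a text prompt: image-patch queries attend over text-token keys and values. The following is a minimal scaled dot-product cross-attention sketch in JAX, not MaxDiffusion's actual implementation; the shapes and names are illustrative.

```python
import jax
import jax.numpy as jnp

def cross_attention(q, k, v):
    """Scaled dot-product cross attention: queries (e.g. image patches)
    attend over keys/values from another modality (e.g. text tokens).
    Illustrative sketch, not MaxDiffusion's implementation."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / jnp.sqrt(d)   # (num_q, num_kv)
    weights = jax.nn.softmax(scores, axis=-1)       # rows sum to 1
    return weights @ v                               # (num_q, d_v)

# Toy shapes: 4 image-patch queries attending to 7 text-token keys/values.
q = jax.random.normal(jax.random.PRNGKey(0), (4, 16))
k = jax.random.normal(jax.random.PRNGKey(1), (7, 16))
v = jax.random.normal(jax.random.PRNGKey(2), (7, 16))
out = cross_attention(q, k, v)
print(out.shape)  # (4, 16)
```

In a real diffusion pipeline this block sits inside the U-Net, where it is fused and sharded for TPU efficiency; the sketch only shows the attention math itself.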