AI21 Labs’ latest AI model exhibits exceptional contextual comprehension

– The AI industry is moving toward generative AI models with longer context windows
– AI21 Labs is releasing Jamba, a generative model with a larger context window
– Jamba combines transformers and state space models to improve efficiency and throughput

The AI industry is shifting toward generative AI models with larger context windows, but such models are typically computationally intensive. Or Dagan, a product lead at AI startup AI21 Labs, believes this doesn't have to be the case, and the company is releasing a generative model called Jamba to prove it. A context window is the amount of input data a model considers before generating output; a larger window lets the model take in, and keep track of, more of a document or conversation at once.
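
The practical effect of a context window is easiest to see as a token budget: a prompt either fits within the model's limit or has to be chunked. The sketch below is illustrative only; the file name is a placeholder, the GPT-2 tokenizer is just a convenient stand-in, and the 140,000-token limit is the single-GPU figure cited in the article.

```python
# Minimal sketch: check whether a document fits a model's context window.
# The tokenizer choice and "report.txt" are placeholders for illustration.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 140_000  # tokens Jamba handles on a single 80GB GPU, per the article

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works for the demo
document = open("report.txt").read()
n_tokens = len(tokenizer.encode(document))

if n_tokens <= CONTEXT_WINDOW:
    print(f"{n_tokens:,} tokens: fits in a single prompt")
else:
    print(f"{n_tokens:,} tokens: must be chunked or truncated")
```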

Jamba, trained on a mix of public and proprietary data, can generate text in multiple languages and handle up to 140,000 tokens while running on a single GPU with at least 80GB of memory. That is roughly 105,000 words, or about 210 pages of text, making Jamba suitable for tasks such as writing and analyzing long documents. Meta's Llama 2, by contrast, has a far smaller 4,096-token context window but requires much less GPU memory to run.
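
The word and page figures follow from common rules of thumb rather than exact conversions; the back-of-the-envelope check below assumes roughly 0.75 words per token and 500 words per page.

```python
# Back-of-the-envelope check of the article's figures, using rough heuristics.
tokens = 140_000
words_per_token = 0.75   # rule of thumb, not an exact conversion
words_per_page = 500     # rough figure for a typical printed page

words = tokens * words_per_token   # 105,000 words
pages = words / words_per_page     # 210 pages
print(f"{words:,.0f} words, {pages:,.0f} pages")
```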

What sets Jamba apart is its combination of two model architectures: transformers and state space models (SSMs). Transformers excel at complex reasoning thanks to their attention mechanism, which weighs the relevance of every input token when producing output, but that mechanism becomes expensive as sequences grow; SSMs offer a more computationally efficient way to handle long sequences of data. By incorporating Mamba, an open source SSM, Jamba delivers three times the throughput on long contexts compared to transformer-based models of similar size.
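
To make the interleaving idea concrete, the toy stack below alternates cheap sequence-mixing layers with an occasional attention block. This is not AI21's architecture or layer ratio; the GRU is only a self-contained stand-in for a real SSM layer such as Mamba, and all sizes are illustrative.

```python
# Toy sketch of a hybrid stack: mostly efficient sequence layers, with an
# occasional attention block for global reasoning. Illustrative only.
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)          # full self-attention over the sequence
        return self.norm(x + out)

class SequenceBlock(nn.Module):
    """Stand-in for an SSM layer (e.g. Mamba); a GRU keeps the example self-contained."""
    def __init__(self, dim: int):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)                 # linear-time scan over the sequence
        return self.norm(x + out)

class HybridStack(nn.Module):
    def __init__(self, dim: int = 256, layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList([
            AttentionBlock(dim) if (i + 1) % attn_every == 0 else SequenceBlock(dim)
            for i in range(layers)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x

# Example: a batch of 2 sequences, 1,024 tokens each, with 256-dim embeddings.
x = torch.randn(2, 1024, 256)
print(HybridStack()(x).shape)  # torch.Size([2, 1024, 256])
```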

Although Jamba has been released under an open source license, it is a research release not intended for commercial use, as it lacks safeguards against generating toxic text or mitigating potential bias. Still, Dagan believes Jamba demonstrates the promise of the SSM architecture and expects further performance gains as the model is refined.
