1. Microsoft announced Phi-3-vision, a new version of its small Phi-3 AI model that can analyze images and identify what's in them.
2. Phi-3-vision is a 4.2 billion-parameter multimodal model that can read both text and images and is designed to run on mobile devices.
3. Phi-3-vision can perform general visual reasoning tasks like analyzing charts and images; it is available in preview, while the other Phi-3 models can be accessed through Azure Machine Learning with a paid Azure account.
At Build 2024, Microsoft announced Phi-3-vision, a new version of its small Phi-3 AI model. This multimodal model is designed to analyze images and tell users what’s in them. Unlike better-known models such as OpenAI’s DALL-E, Phi-3-vision can only “read” an image; it cannot generate one. The model has 4.2 billion parameters and is designed for use on mobile devices.
Phi-3-vision is part of a series of small AI models released by Microsoft, which are meant to run locally on a wider range of devices without an internet connection. These models also reduce the computing power needed for certain tasks, such as solving math problems with Microsoft’s Orca-Math model. Phi-3-vision can perform general visual reasoning tasks like analyzing charts and images.
The first iteration of Phi-3, Phi-3-mini, was released in April and performed well in benchmark tests against larger models like Meta’s Llama 2. There are also Phi-3-small and Phi-3-medium models, with 7 billion and 14 billion parameters, respectively. Phi-3-vision is currently available in preview, while the other Phi-3 models can be accessed through the Azure Machine Learning model catalog with a paid Azure account and an Azure AI Studio hub.
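For developers curious how a small vision model like this is typically called, the sketch below shows one common pattern: load the model and its processor, pass in an image plus a text question, and decode the generated answer. It uses the Hugging Face transformers library and assumes the checkpoint is published as microsoft/Phi-3-vision-128k-instruct with the usual chat-template and image-placeholder conventions; the model ID, prompt format, and file name are assumptions, so check the official model card or the Azure catalog entry before relying on them.

```python
# Minimal sketch of asking Phi-3-vision about an image.
# Assumes the model is available on Hugging Face as
# "microsoft/Phi-3-vision-128k-instruct" and follows the standard
# transformers processor/generate pattern -- verify against the
# official model card before use.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed model ID

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Pair a local chart or photo (hypothetical file name) with a question.
image = Image.open("sales_chart.png")
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat trend does this chart show?"}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Drop the prompt tokens and decode only the model's answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(answer)
```

The same kind of request can also be sent to a hosted endpoint once the model is deployed from the Azure Machine Learning model catalog, which is the route Microsoft describes for the preview.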