Overview of Llama 3.2
Just two months after its last release, Meta has introduced Llama 3.2, the company's first openly available multimodal model. It can process both text and images, including documents, tables, and charts, and can generate image captions.
Advanced AI Applications
Llama 3.2 allows developers to create advanced AI applications, such as augmented reality apps, visual search engines, and document analysis tools. Because it handles text and image data simultaneously, applications built on it can reason over visual content as easily as over plain text.
Keeping Pace with Competitors
With Llama 3.2, Meta aims to keep pace with multimodal models from competitors such as OpenAI and Google. Image processing is also crucial for its future plans, especially for hardware like the Meta Ray-Ban smart glasses.
Model Variants
Llama 3.2 comes in two vision models (11 billion and 90 billion parameters) and two lightweight text-only models (1 billion and 3 billion parameters). The smaller models are optimized for Qualcomm, MediaTek, and other Arm-based hardware, which means they could eventually run directly on smartphones.
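As a rough illustration of how the lightweight text models can be used once downloaded, the sketch below loads the 3-billion-parameter instruct variant through the Hugging Face transformers library. It assumes the model is published under the ID meta-llama/Llama-3.2-3B-Instruct, that access to the gated repository has been approved under Meta's license, and that a transformers release with Llama 3.2 support is installed.

```python
# Minimal sketch: running a lightweight Llama 3.2 text model locally.
# Assumes: `pip install transformers torch`, a transformers version with
# Llama 3.2 support, and approved access to the gated model repository.
from transformers import pipeline

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed Hugging Face model ID

# The text-generation pipeline handles tokenization, chat templating,
# and decoding in a single call.
generator = pipeline("text-generation", model=model_id)

messages = [
    {"role": "user",
     "content": "Summarize in one sentence why small on-device language models are useful."},
]

result = generator(messages, max_new_tokens=100)
# For chat-style input, generated_text holds the full conversation;
# the last entry is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```

Passing a list of chat messages lets the pipeline apply the model's own chat template, which is generally preferable to hand-building prompts for instruct-tuned variants.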
Competitive Performance
Meta asserts that the Llama 3.2 vision models compete strongly in image recognition, performing well against models like Anthropic's Claude 3 Haiku and OpenAI's GPT-4o mini. The lightweight text models, meanwhile, are said to outperform models like Gemma and Phi 3.5-mini at following instructions, summarizing content, and rewriting prompts.
Availability
Currently, these models are available on the Llama.com website and through Meta’s partner platforms like Hugging Face.
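For the vision models obtained through Hugging Face, usage looks roughly like the sketch below: an image and a text question are combined in a single request. This is a sketch under assumptions, not an official recipe: it assumes the 11B vision-instruct variant is published as meta-llama/Llama-3.2-11B-Vision-Instruct, that the installed transformers release ships the Mllama classes added for Llama 3.2, and that repository access has been approved; the image URL is a placeholder.

```python
# Minimal sketch: asking a Llama 3.2 vision model about an image.
# Assumes: transformers with Llama 3.2 (Mllama) support, torch, Pillow,
# requests, and approved access to the gated model repository.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model ID

model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image URL; replace with a chart, document scan, or photo.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

# One user turn containing both the image and a text question.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What does this chart show? Answer briefly."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(output[0], skip_special_tokens=True))
```

The chat template inserts the image placeholder token into the prompt, so the processor can align the pixel data with the text before generation.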