Elon Musk’s Grok 1.5 Vision Multimodal Model Is Here

Shakthi Warnakualsuriya
4 min read · Apr 21, 2024


Get ready for a game-changer in the world of Large Language Models (LLMs)! xAI, the team that recently open-sourced its Grok-1 base model, has unveiled Grok-1.5V, a powerful first-generation multimodal LLM.

What is Grok 1.5?

As xAI describes it: “Introducing Grok-1.5V, our first-generation multimodal model. In addition to its strong text capabilities, Grok can now process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs. Grok-1.5V will be available soon to our early testers and existing Grok users.”

Think of Grok-1.5V as a super-powered language model that can not only understand and generate text like its predecessors but can also handle visual information! Images, charts, diagrams — you name it, Grok-1.5V can process it. This multimodal capability lets it tackle tasks that were previously limited to closed-source LLMs.
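In practice, “handling visual information” usually means pairing an image with a text question in a single request. xAI had not published a public API for Grok-1.5V at the time of writing, so the sketch below is purely illustrative: it assumes an OpenAI-style chat payload, and the `grok-1.5v` model name and the schema are assumptions, not documented xAI endpoints.

```python
import base64
import json

def build_vision_request(image_bytes: bytes, question: str,
                         model: str = "grok-1.5v") -> str:
    """Build a hypothetical chat-style payload that pairs an image
    with a text question. The model name and message schema are
    assumptions modelled on other multimodal APIs, not xAI docs."""
    # Images are commonly shipped inline as a base64 data URL.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
    return json.dumps(payload)

# Example: ask about a (placeholder) chart image.
request_body = build_vision_request(b"fake-png-bytes",
                                    "What trend does this chart show?")
```

The key idea is that the user turn carries a *list* of content parts — text and image side by side — rather than a single string.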

Open Source Advantage: Why it Matters

Here’s where the Grok family stands out: in March 2024, xAI released the weights and architecture of its base model, Grok-1, under the Apache 2.0 license, so anyone can access and tinker with it. Grok-1.5V itself has not been open-sourced — for now it is rolling out to early testers and existing Grok users — but xAI’s open approach is still a major departure from the likes of OpenAI’s GPT-4 and Google’s Gemini Pro, whose weights remain closed.

Open-source LLMs like Grok-1 are crucial for the future for several reasons:

  • Faster Innovation: Researchers, developers, and even hobbyists can now experiment with and contribute to the development of cutting-edge AI. This fosters collaboration, accelerates innovation, and pushes the boundaries of what LLMs can achieve.
  • Transparency and Reproducibility: Open-source models allow researchers to understand how the model works and replicate its results, fostering trust and collaboration.
  • Accessibility: Open-source models democratize AI by making these powerful tools accessible to anyone with technical expertise, not just large corporations.

Grok 1.5 vs. the Competition

Early benchmarks suggest that Grok-1.5V can compete toe-to-toe with other advanced LLMs such as GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro 1.5. In fact, Grok outperforms these models on some benchmarks, such as RealWorldQA, which tests understanding of the physical world. With its multimodal capabilities and xAI’s open approach, Grok-1.5V has the potential to disrupt the LLM landscape and usher in a new era of accessible and powerful AI.

“Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs. We are particularly excited about Grok’s capabilities in understanding our physical world. Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding. For all datasets below, we evaluate Grok in a zero-shot setting without chain-of-thought prompting.”
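“Zero-shot without chain-of-thought” means each benchmark question is asked directly — no worked examples in the prompt and no “think step by step” scaffolding — and the raw answer is scored against the reference. A toy sketch of that scoring loop (a hypothetical helper, not xAI’s actual evaluation harness):

```python
def zero_shot_accuracy(model_answer, dataset):
    """Score a model on a QA benchmark in a zero-shot setting:
    each question is posed on its own, with no few-shot examples
    and no chain-of-thought prompt, and the answer is compared
    to the reference after simple normalisation."""
    correct = sum(
        model_answer(item["question"]).strip().lower()
        == item["answer"].strip().lower()
        for item in dataset
    )
    return correct / len(dataset)

# Toy illustration with a stand-in "model" (not a real LLM call):
toy_dataset = [
    {"question": "Is the cup left of the laptop?", "answer": "yes"},
    {"question": "How many chairs are visible?", "answer": "3"},
]
toy_model = lambda q: "yes" if "cup" in q else "3"
print(zero_shot_accuracy(toy_model, toy_dataset))  # 1.0
```

Spatial questions like these are exactly what RealWorldQA probes; the benchmark’s difficulty lies in the model answering them from the image alone.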

Examples from the xAI Website

All the screenshots have been taken from the xAI website.

Final Words

This is just the beginning for Grok-1.5V. With xAI’s open approach and its impressive capabilities, it has the potential to become a major player in the LLM race. Stay tuned to see what amazing things this new model can accomplish!

Finally, if you are a premium X user, you already have access to the Grok-1 model. If you have worked with Grok, share your experience in the comments — and if there’s anything you’d like to know about LLMs and AI, drop that in the comments too.
