Google has been developing its Gemini large language model (LLM) for the past eight months and recently gave a small group of companies access to early versions. The model is expected to compete head-to-head with rival LLMs such as Meta’s Llama 2 and OpenAI’s GPT-4.
The AI model is designed to work across multiple formats, whether text, images, or video, making it one of the most significant models in Google’s history.
In a blog post, Google CEO Sundar Pichai wrote, “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company.”
The new LLM is what is known as a multimodal model, meaning it can accept several kinds of input, such as audio, video, and images. Traditionally, building a multimodal model has meant training separate components for each modality and then stitching them together.
“These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning,” Pichai said. “We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness.”
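To make that distinction concrete, the sketch below contrasts the two approaches in PyTorch. It is a minimal illustration only: the module names, dimensions, and layer choices are assumptions made for exposition, not details of Gemini’s actual architecture.

```python
# Illustrative sketch only -- module names, dimensions, and structure are
# assumptions for exposition, not Gemini's actual architecture.
import torch
import torch.nn as nn

D = 256  # shared embedding width (arbitrary choice for this sketch)

class LateFusionModel(nn.Module):
    """Traditional approach: train discrete encoders per modality,
    then piece their outputs together with a fusion layer."""
    def __init__(self):
        super().__init__()
        self.text_encoder = nn.Embedding(10_000, D)      # stands in for a text model
        self.image_encoder = nn.Linear(3 * 32 * 32, D)   # stands in for a vision model
        self.fusion = nn.Linear(2 * D, D)                # glued on after the fact

    def forward(self, text_ids, image_pixels):
        t = self.text_encoder(text_ids).mean(dim=1)      # pool token embeddings
        v = self.image_encoder(image_pixels.flatten(1))
        return self.fusion(torch.cat([t, v], dim=-1))

class NativelyMultimodalModel(nn.Module):
    """Natively multimodal approach: map every modality into one token
    stream and pre-train a single backbone on the mixed sequence."""
    def __init__(self):
        super().__init__()
        self.text_tokens = nn.Embedding(10_000, D)
        self.image_patches = nn.Linear(3 * 16 * 16, D)   # image patches become "tokens" too
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_ids, patch_pixels):
        t = self.text_tokens(text_ids)                   # (batch, text_len, D)
        v = self.image_patches(patch_pixels)             # (batch, num_patches, D)
        seq = torch.cat([t, v], dim=1)                   # one interleaved sequence
        return self.backbone(seq)                        # one model sees everything
```

The design difference is that the second model’s backbone sees text tokens and image patches in a single sequence during pre-training, rather than learning each modality in isolation and fusing the results afterward.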
[…]