Google’s Gemini 1.5 Pro can now hear

Google’s update to Gemini 1.5 Pro gives the model ears. The model can now listen to uploaded audio files, such as earnings calls or the audio from videos, and extract information from them without needing a written transcript.

During its Google Cloud Next event, Google also announced it’ll make Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. Gemini 1.5 Pro was first announced in February.

This new version of Gemini Pro, which is supposed to be the middle-weight model of the Gemini family, already surpasses the biggest and most powerful model, Gemini Ultra, in performance. Gemini 1.5 Pro can understand complicated instructions and eliminates the need to fine-tune models, Google claims.

Gemini 1.5 Pro is not available to people without access to Vertex AI. Right now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and also able to understand long commands, it’s not as fast as Gemini 1.5 Pro.

Gemini 1.5 Pro is not the only large AI model from Google getting an update. Imagen 2, the text-to-image generation model that helps power Gemini’s image-generation capabilities, will also add inpainting and outpainting, which let users add or remove elements from images. Google also made its digital watermarking feature, SynthID, available on all pictures created through Imagen models. SynthID adds a watermark that is invisible to the viewer and marks an image’s provenance when the image is checked with a detection tool.

Many of the new features of Imagen, especially inpainting and outpainting, have been part of other text-to-image models like Stability AI’s Stable Cascade and Getty’s Generative AI by iStock, not to mention wider consumer availability on newer Samsung Galaxy phones.

Google says it’s also publicly previewing a way to ground its AI responses with Google Search so they answer with up-to-date information. That’s not always a given with the responses produced by large language models, sometimes by design; Google has intentionally kept Gemini from answering questions related to the 2024 US election.

Gemini was also recently criticized for generating historically inaccurate images of people.
