Google launches Gemini 2.0 with autonomous tool linking
Google is embracing “agentic experiences” in the rollout of Gemini 2.0, its new flagship family of generative AI expected to compete with OpenAI o1, GitHub Copilot, and Amazon Nova along with ChatGPT.
The tech giant released the first model, Gemini 2.0 Flash, on December 11 to global developers through the Gemini API in Google AI Studio and Vertex AI. With limited testing starting next week, consumers can expect Gemini 2.0 to impact Google Search and AI observations. A public rollout is scheduled for early 2025.
Through Gemini 2.0, developers can access multimodal input and text output, while Early Access partners can test text-to-speech and native image generation. The Gemini app will be updated with Gemini 2.0 flash “soon.” Google said in a press release,
General availability, and additional model sizes such as the base model Gemini 2.0, are expected in January.
What is Gemini 2.0?
Contents
Gemini 2.0 is a multimodal generative AI model that runs on Google’s Trillium hardware. It’s designed to make online tasks easier and more intuitive by helping you summarize information, conduct web searches, and even interact with tools or apps more naturally.
Google notes that Gemini 2.0 Flash is twice as fast as its predecessor 1.5 Pro, and it surpasses it in AI performance benchmarks such as MMLU-PRO and LiveCodeBench.
“If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it more useful,” Google CEO Sundar Pichai said in a statement.
What sets Gemini 2.0 apart is its agentic capabilities. Pichai described these capabilities as enabling the model to “understand more about the world around you, think several steps ahead, and take action on your behalf with its monitoring.”
Google further emphasized that Gemini 2.0 differentiates itself:
- Multimodal Processing.
- Ability to understand long books or wide areas of the web.
- Function calling.
- “Using basic equipment.”
- “Following and planning complex instructions.”
The use of native tools allows AI to incorporate tools like Google search and code execution to perform autonomous tasks. In practice, it sometimes looks like Google’s Project Astra – an Android app currently in testing that uses the phone’s camera and Gemini’s logic to answer questions about the world in real time. Project Astra can analyze up to 10 minutes of video at a time.
Google also announces additional projects, prototypes
project mariner
Another proof of concept is Project Mariner, an experimental Chrome extension that demonstrates Google’s attempt to enable Gemini to read browser screens. Users can ask it to summarize web pages or make purchases.
“It’s still early, but Project Mariner shows that navigating within a browser is becoming technically possible, even if it’s not always accurate and is slow to complete tasks today, which takes time. “, said Demis Hassabis, CEO of Google DeepMind and Koray Kavcuoglu, CTO of Google DeepMind, wrote in the press release.
WATCH: Google also revealed special image and video generation AI models in early December.
deep research
Deep Research, available with the Gemini Advanced subscription, is an experimental model connected to the web. It is designed for graduate students, scientists or entrepreneurs to create research plans and outlines. The tool searches the web for a topic of your choice, submits a research plan for approval or changes, and then analyzes the existing body of work.
Jules Developer Assistant
Google also announced a new developer tool called Jules, a coding assistant powered by Gemini 2.0 Flash. Jules sits within GitHub and can write code, fix bugs, and create and execute multi-step plans. Jules is available to a limited group of testers today. Google expects to expand availability as early as 2025.
Google is preparing to deal with cyber threats
Google also noted that it is aware that Project Mariner, in particular, can be a rich hunting ground for quick injection attacks. The company said it is working on setting up protections against phishing and fraud attempts, where attackers can hide AI instructions in emails, websites or documents.
(TagstoTranslate)Agent AI(T)Amazon(T)Artificial Intelligence(T)Coding Assistant(T)Developer Tools(T)Gemini 2.0 Flash(T)Generative AI(T)Google(T)Google Gemini(T)Microsoft(T) )openai
#Google #launches #Gemini #autonomous #tool #linking