
Google says Gemini has been designed to be multimodal from the ground up.
Google has released its Gemini foundation model, aiming to compete with OpenAI's ChatGPT and its latest underlying model, GPT-4, which have made waves over the past year. Gemini will be available in three variants:
Gemini Ultra: Which Google claims is superior to GPT-4 and will launch soon.
Gemini Pro: Which has been added to the Google Bard chatbot, offering enhanced capabilities.
Gemini Nano: Which is meant for on-device processing on Google's Pixel smartphones.
Gemini Pro has already been added to Bard, and Gemini Nano also launches today on the Pixel 8 Pro as part of Google's Android feature drop update for the device, which adds a host of new functionality.
Google says Gemini has been designed to be multimodal from the ground up, but at the time of writing, Bard, which has now been updated with the Gemini Pro model, does not have full multimodal functionality. Google says it will integrate this technology across Chrome and Ads, and has already been testing it in its generative search preview.
“This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company,” Google CEO Sundar Pichai said in a statement. “I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.”
Google says that early next year Bard will get the Gemini Ultra model as part of a new tier called Bard Advanced, which will presumably be paid. Google shared a long list of benchmarks in which Gemini Ultra slightly outscored GPT-4 in every instance save one. It also shared a demonstration of Bard’s new capabilities in a collaboration with Mark Rober, who uses the AI to help build a paper airplane accurate enough to fly through a ring of fire while dodging the low-pressure zone the flames create.
Gemini Nano will power features like summaries in the Recorder app and Smart Reply in Gboard on the Pixel 8 Pro, starting with WhatsApp; Google says this feature will be scaled to other messaging apps and other elements of the OS next year.
“With a score of over 90%, Gemini is the first AI model to outperform human experts on the industry-standard benchmark MMLU,” said Eli Collins, Vice President of Product at Google DeepMind. “It’s our largest and most capable AI model.” MMLU, short for Massive Multitask Language Understanding, measures AI capabilities using exam-style questions across a combination of 57 subjects such as math, physics, history, law, medicine, and ethics.
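For context on what that 90% figure means: an MMLU-style score is essentially multiple-choice accuracy computed per subject and then averaged. The snippet below is a minimal sketch of that idea; the subject names and accuracies are illustrative placeholders, not Gemini's reported per-subject results.

```python
# Minimal sketch of how an MMLU-style score is computed: multiple-choice
# accuracy per subject, averaged across all 57 subjects.
# The subjects and numbers below are illustrative placeholders, not real results.

per_subject_accuracy = {
    "high_school_mathematics": 0.92,   # fraction of questions answered correctly
    "college_physics": 0.88,
    "professional_law": 0.85,
    "clinical_knowledge": 0.91,
    # ... the real benchmark covers 57 subjects in total
}

def mmlu_score(accuracies: dict[str, float]) -> float:
    """Average accuracy across subjects, reported as a percentage."""
    return 100 * sum(accuracies.values()) / len(accuracies)

print(f"MMLU score: {mmlu_score(per_subject_accuracy):.1f}%")
```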
Interestingly, this announcement comes at a time when a report emerged that Google's CEO had delayed the launch of Gemini because it was not handling non-English queries properly. That model is likely Gemini Ultra, which arrives next year and which Google DeepMind's engineers are still fine-tuning.
Gemini also has some genuinely advanced capabilities. For example, in one video Google demonstrates Bard helping a student with his physics homework, starting from a photo of the assignment with handwritten questions. The model then transitions to written advice, complete with equations and step-by-step answers, making it feel almost like an AI-based tutor.
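At its core, that homework demo is a single multimodal request: an image of the assignment plus a text prompt, answered by one model. The sketch below illustrates the shape of such a request using a hypothetical `MultimodalModel` client; it is not Google's actual Gemini API, and the model name is only a placeholder.

```python
# Illustrative sketch of a multimodal "tutor" request: one image plus one text
# prompt sent to the same model. `MultimodalModel` is a hypothetical client,
# not Google's real Gemini SDK.
from dataclasses import dataclass

@dataclass
class MultimodalModel:
    name: str

    def generate(self, parts: list) -> str:
        # A real client would send `parts` (text and image bytes) to the model
        # and return its text answer; here we only describe the request shape.
        kinds = [type(p).__name__ for p in parts]
        return f"[{self.name}] would answer a request with parts: {kinds}"

model = MultimodalModel(name="gemini-pro-vision")        # placeholder model name
image_bytes = b"<jpeg bytes of the handwritten assignment>"  # placeholder photo

prompt = "Check my answers and show the correct steps for any I got wrong."
print(model.generate([prompt, image_bytes]))
```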
This capability represents some progress towards AGI. OpenAI has reportedly made similar progress, according to Reuters, with a program called Q* that can solve comparable math questions.
This is significant because LLMs aren't inherently designed to solve advanced math queries; when a model starts handling such queries consistently and correctly, that is indicative of advanced reasoning capabilities, something Google showed off proudly.
“We have made a lot of progress on multimodal reasoning as well as advanced reasoning in mathematics,” Collins added during the presentation.
Google also announced AlphaCode 2, a new code-generation tool that leverages the capabilities of the Gemini model. It can code in Python, Java, C++, and Go and, according to Google, performed better than an estimated 85% of competitors in a subset of programming competitions hosted on Codeforces. This is a massive leap from the original AlphaCode, which beat just under 50% of competitors.
“We selected 12 recent contests with more than 8,000 participants, either from division 2 or the harder division ‘1+2.’ This makes for a total of 77 problems,” a technical whitepaper on AlphaCode 2 reads. “AlphaCode 2 solves 43% of problems within 10 attempts, close to twice as many problems as the original AlphaCode (25%).”
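Those headline figures reduce to a simple "solved within 10 attempts" rate over the 77 problems. Below is a rough sketch of that metric; the solved-problem counts are inferred from the quoted percentages rather than taken from the whitepaper's raw data.

```python
# Rough sketch of the "solved within 10 attempts" metric behind the quoted
# figures. Only the 77-problem total and the ~43% / ~25% rates come from the
# article; the solved counts are back-calculated for illustration.

TOTAL_PROBLEMS = 77
MAX_ATTEMPTS = 10

def solve_rate(solved_within_k: int, total: int = TOTAL_PROBLEMS) -> float:
    """Fraction of problems with at least one accepted submission within k attempts."""
    return solved_within_k / total

alphacode2_solved = 33   # roughly 43% of 77 problems
alphacode1_solved = 19   # roughly 25% of 77 problems

print(f"AlphaCode 2: {solve_rate(alphacode2_solved):.0%} within {MAX_ATTEMPTS} attempts")
print(f"AlphaCode:   {solve_rate(alphacode1_solved):.0%} within {MAX_ATTEMPTS} attempts")
```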
AlphaCode 2 can understand complex math and theoretical computer science problems. It is also capable of dynamic programming, something Google DeepMind researcher Rémi Leblond showed off. Dynamic programming lets the model simplify a complex problem by repeatedly breaking it down into easier subproblems.
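As a concrete illustration of the dynamic-programming style Leblond describes, the textbook sketch below solves the classic coin-change problem by building the answer for a large amount out of already-solved smaller amounts; it is a standard example, not output from AlphaCode 2.

```python
# Textbook dynamic-programming example (not AlphaCode 2 output): the minimum
# number of coins needed to make `amount`, built bottom-up from subproblems.

def min_coins(coins: list[int], amount: int) -> int:
    """Return the fewest coins summing to `amount`, or -1 if impossible."""
    INF = float("inf")
    # best[a] = fewest coins needed to make amount a; amount 0 needs 0 coins.
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1  # reuse the already-solved subproblem a - c
    return best[amount] if best[amount] != INF else -1

print(min_coins([1, 2, 5], 11))  # 3  (5 + 5 + 1)
```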
“AlphaCode 2 needs to show some level of understanding, some level of reasoning and designing of code solutions before it can get to the actual implementation to solve [a] coding problem,” Leblond said. “And it does all that on problems it’s never seen before.”