Google's new Gemini AI beats OpenAI's GPT-4 and humans at 57 subjects


By Matthew Griffin | Intelligence and the Senses | 12th December 2023

WHY THIS MATTERS IN BRIEF

We are seeing rapid improvements in AI capability, and many AI systems are now starting to outperform human experts in numerous fields.


Google has unveiled its awesome next-gen Gemini Artificial Intelligence (AI), claiming it outperforms OpenAI’s GPT-4 – as well as human experts – on nearly all major tests. It understands images, video and audio as well as text and code, and will gain other senses over time.

With a score of 90.0% on the MMLU (Massive Multitask Language Understanding) test, it’s the first model to outperform human experts (89.8%), as well as GPT-4 (86.4%), on knowledge and problem-solving tasks spanning 57 subjects including math, physics, history, law, medicine and ethics. That’s experts, not the average human.
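To make a headline figure like 90.0% concrete, here is a minimal sketch of how a multiple-choice benchmark of this kind can be scored: count the share of questions answered correctly, overall and per subject. The toy records and the function names `accuracy_by_subject` and `overall_accuracy` are hypothetical illustrations, not Google's evaluation harness, and published scores may be aggregated differently.

```python
from collections import defaultdict


def accuracy_by_subject(records):
    """Return {subject: fraction correct} for (subject, predicted, correct) records."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for subject, predicted, correct in records:
        totals[subject] += 1
        hits[subject] += int(predicted == correct)
    return {subject: hits[subject] / totals[subject] for subject in totals}


def overall_accuracy(records):
    """Return the fraction of all questions answered correctly."""
    records = list(records)
    return sum(predicted == correct for _, predicted, correct in records) / len(records)


# Hypothetical toy data: (subject, model's answer, reference answer).
toy_records = [
    ("physics", "A", "A"),
    ("physics", "B", "C"),
    ("law", "D", "D"),
    ("law", "A", "A"),
    ("ethics", "C", "C"),
]

per_subject = accuracy_by_subject(toy_records)
overall = overall_accuracy(toy_records)
print(per_subject)
print(f"overall: {overall:.1%}")
```

The point of splitting the result by subject is that a single overall percentage can hide weak areas; a model can score well in aggregate while lagging in one field.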


Gemini is multimodal from the ground up – meaning that its original training data set contained a ton of other media in addition to text. Thus, you could say it’s as fluent in visual and auditory “understanding” as it is with text. Where other language models have tended to “think” in textual terms when looking at video and images, Gemini retains all the tone and nuance of the original video, audio and image sources.

While the video below is a slick product demo, and thus should be taken with a large grain of salt, it’s worth watching to give you a sense of what this multimodality really means.

What’s the upshot here? Well, AIs are being trained with wider and wider sensory datasets, to mimic the processes by which humans learn to interact with the world. With next-level visual and auditory understanding, Gemini’s perception and reasoning take a step forward. Once this thing lands in Google devices – beginning with the next Pixel phones – it’ll be able to help with all sorts of daily tasks.

And as Google DeepMind CEO Demis Hassabis told Wired, this will soon extend into the next logical sensory realm: touch and tactile feedback. Google is already a major player in AI robotics with its Everyday Robots project, but giving a super-knowledgeable model like Gemini the ability to understand the world through touch will take robotics – humanoid and otherwise – into uncharted territory.

Multimodality is far from the only banner feature here, but as with GPT-4, Gemini is such an anything machine that it’s hard to know where to start. Perhaps with the contributions it could make to science? In the video below, DeepMind scientists demonstrate how Gemini is able to generate its own code to read and interpret 200,000 scientific studies, filter them for relevance using its own reasoning capabilities, and then collate the data and effectively create new meta-knowledge. The team says it did all this over their lunch break, and that the approach will be relevant to other fields, like law, in which huge datasets need to be examined.

Speaking of coding, Gemini is fluent in Python, Java, C++ and Go programming. Indeed, Google is already showing off how it can create websites that dynamically code themselves as you use them, in response to what you seem to want from them. This feels like a whole new approach to the internet; you go to a single page that grows into what you need as soon as it figures out what that is.

The demo video here uses a pretty lightweight use case: planning a kid’s birthday party. But you can see the extraordinary power it encapsulates, and imagine how it might create graphical user interfaces – what I’ll call here a Generative User Interface – for nearly any task you could imagine. This is the sort of thing only AI can do; it’s like having a web app programmer sitting right next to you, but capable of working hundreds of times faster to create and adapt the UIs you’re using in real time according to your actions and needs.

And as with any AI tool, it’s super interactive; if it’s not giving you exactly what you want, you can just tell it, and it’ll adjust itself to fit your desires, or engage in a conversation about the best way to proceed. Stunning stuff, and a glimpse into how our interactions with technology are fundamentally shifting.

On the topic of coding, DeepMind has done some other interesting work with Gemini in a project called AlphaCode 2, which takes several different Gemini models and trains them specifically in different parts of the programming process.

In essence, AlphaCode 2 creates a swarm of programming agents, and gets them to generate up to a million different chunks of code to solve a problem. It then uses a separate Gemini model to examine these code samples, check if they compile, and rank them on how well they do their portion of the overall coding work, discarding around 95% of the samples created.

Then, another Gemini model develops a code-testing regime and sample test data, and runs a thorough testing process on all the remaining code samples, ranking them on “correctness,” to find the top pieces of code. Effectively, DeepMind has split Gemini into a multifunctional software team, with specialist AIs working on requirements analysis, system design, testing, deployment and maintenance as well as a giant army of coders.
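The generate, filter and rank loop described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not AlphaCode 2's actual implementation: the candidates are hard-coded strings rather than model output, every candidate is assumed to define a function named `solve`, and the helper names `compiles`, `score` and `rank_candidates` are hypothetical.

```python
def compiles(source):
    """Filter step: keep only candidates that are syntactically valid Python."""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def score(source, tests):
    """Testing step: count how many (args, expected) test cases a candidate passes."""
    namespace = {}
    try:
        # Toy only: never exec untrusted code outside a sandbox.
        exec(source, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that test
    return passed


def rank_candidates(candidates, tests):
    """Discard candidates that do not compile, then rank the rest by tests passed."""
    runnable = [c for c in candidates if compiles(c)]
    scored = [(score(c, tests), c) for c in runnable]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical stand-ins for model-generated samples; the task is "add two numbers".
candidates = [
    "def solve(a, b): return a + b",
    "def solve(a, b): return a - b",
    "def solve(a, b) return a + b",  # missing colon, dropped by the filter
    "def solve(a, b): return a * b",
]
tests = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]

ranked = rank_candidates(candidates, tests)
for passed, source in ranked:
    print(passed, source)
```

The design mirrors the idea in the text: generation is cheap and noisy, so most of the value comes from the cheap filter that throws away broken samples and the test-based ranking that surfaces the few correct ones.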

How does it perform? Well, in a coding competition against humans, it beat 87% of other entrants, ranking it “just between the ‘Expert’ and ‘Candidate Master’ categories on Codeforces.”

As DeepMind scientists explain in the video below, these kinds of contests require a ton more than just coding skills – they require extraordinary degrees of rational understanding and creative use of the available software tools.

Mind you, AlphaCode 2 isn’t going to be available to the public immediately, or indeed ever in its current form. Generating a million code snippets, as you might imagine, burns a ton of computing power and is way too expensive for general release. But what’s interesting here is that the success rate doesn’t appear to have tapered off at a million snippets – indeed, it seems that AlphaCode 2 would continue to improve its results if it went well into the billions, or trillions. That’s an incredibly inefficient way to do things, but with the blinding speed of progress in this area, a smarter way is sure to come along very soon.

DeepMind says it’s looking at how a streamlined version can be brought into the public models.

And there’s more, a ton more. But this should give you a sense of what Google is promising here. Google is planning to release it in three model sizes: Gemini Nano, built for installation right on board mobile devices; Gemini Pro, a rough equivalent of GPT-3.5, which will be the main workhorse model for most tasks; and Gemini Ultra, the largest model, which Google says beats GPT-4 handily across a broad swathe of benchmark tests, with an even wider margin on multimodal testing than on text-based challenges.

Gemini Ultra is scheduled for public launch next year, once it’s been more thoroughly vetted for safety and alignment issues. That’s when we’ll start getting a proper sense for where it outshines GPT and where it’s just not up to snuff. Gemini Nano, meanwhile, is already available on the Pixel 8 Pro smartphone, and will begin rolling out on others.

Gemini Pro, though, is available right now, for free, to anyone with a Google account through the Google Bard service. It’s a slimmed-down version, unfortunately, with only the ability to upload images rather than documents, audio or video, but Google says it’ll gain new capabilities soon. It’s already got access, with your permission, to operate on your Gmail, Google Drive and Google Docs, as well as flight and hotel bookings, Google Maps, and YouTube, where it allows you to interact and ask questions about videos.

And yep, Google is working to integrate the Gemini model into pretty much every product it makes. Buckle up, y’all, this roller coaster only knows how to accelerate.
