Recent progress in AI has been dizzying. Hardly a week goes by without a new algorithm, application, or implication making headlines. But OpenAI, the source of much of the hype, only recently finished training its flagship algorithm, GPT-4, and according to OpenAI CEO Sam Altman, its successor, GPT-5, hasn’t started training yet.
It’s possible the pace will slow in the coming months, but don’t bet on it. A new AI model as capable as GPT-4, or even more capable, could arrive sooner rather than later.
This week, in an interview with Wired’s Will Knight, Google DeepMind CEO Demis Hassabis said their next big model, Gemini, is currently in development, “a process that will take several months.” Hassabis said Gemini will be a mashup of some of the field’s biggest AI hits, most notably DeepMind’s AlphaGo, which used reinforcement learning to defeat a Go champion in 2016, years before experts expected the feat.
“At a high level, you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the big models,” Hassabis told Wired. “We also have some new innovations that will be very interesting.” All in all, the new algorithm should be better at planning and problem-solving, he said.
The era of AI fusion
Many of the recent advances in AI have come from ever larger algorithms consuming more and more data. As engineers increased the number of internal connections—or parameters—and began training them on Internet-scale datasets, model quality and capability increased like clockwork. As long as a team had the money to buy chips and access data, progress was almost automatic because the structure of the algorithms, called transformers, didn’t have to change significantly.
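To make that scaling concrete, here is a back-of-the-envelope sketch of how parameter count grows with a transformer’s depth and width. The 12 × layers × width² rule is a common approximation for decoder-style transformers that ignores embeddings and biases; the GPT-3 shape below is public, but nothing here reflects unpublished details of GPT-4 or Gemini.

```python
# Rough transformer parameter count: each block carries ~4*d^2 attention
# weights plus ~8*d^2 feed-forward weights (with the usual 4x expansion),
# so total parameters scale roughly as 12 * n_layers * d_model^2.
def approx_params(n_layers: int, d_model: int) -> int:
    return 12 * n_layers * d_model ** 2

# GPT-3-like shape: 96 layers, 12,288-wide hidden states -> ~174 billion.
print(f"{approx_params(96, 12288):,}")
```

Because the count grows with the square of the width, doubling a model’s width roughly quadruples its parameters, which is part of why budgets for chips and data ballooned so quickly.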
Then, in April, Altman said the age of big AI models was over. Training costs and computing power had skyrocketed while gains from scaling flattened out. “We’re going to improve them in other ways,” he said, but didn’t elaborate on what those other ways would look like.
GPT-4 and now Gemini provide hints.
Last month at Google’s I/O developer conference, CEO Sundar Pichai announced that work on Gemini was underway. He said the company built it “from the ground up” to be multimodal – that is, trained on and able to fuse multiple types of data, like images and text – and designed for API integrations (think plugins). Now add in reinforcement learning and perhaps, as Knight suspects, other DeepMind specialties in robotics and neuroscience, and the next step in AI starts to look less like a single bigger model and more like a patchwork of the field’s best techniques.
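As a loose illustration of what “built from the ground up to be multimodal” can mean in practice, here is a minimal sketch of one common recipe: encode each modality separately, project the features into a shared space, and let a single transformer attend over the fused sequence. The shapes and module names are hypothetical; Google has not published Gemini’s architecture.

```python
import torch
import torch.nn as nn

class ToyMultimodalFusion(nn.Module):
    """Project image and text features into one space and fuse them."""
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.image_proj = nn.Linear(512, d_model)  # e.g. vision-encoder output
        self.text_proj = nn.Linear(768, d_model)   # e.g. text-encoder output
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image_feats, text_feats):
        # Concatenate both modalities into one token sequence so the
        # transformer can attend across images and text jointly.
        fused = torch.cat(
            [self.image_proj(image_feats), self.text_proj(text_feats)], dim=1
        )
        return self.fusion(fused)

model = ToyMultimodalFusion()
out = model(torch.randn(1, 49, 512), torch.randn(1, 16, 768))
print(out.shape)  # torch.Size([1, 65, 256])
```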
But Gemini will not be the first multimodal algorithm. Nor will it be the first to use reinforcement learning or support plugins. OpenAI has already integrated all of these into GPT-4 to impressive effect.
If Gemini goes that far and no further, it may merely match GPT-4. What’s more interesting is who is working on the algorithm. Earlier this year, DeepMind merged with Google Brain. The latter invented the first transformers in 2017; the former designed AlphaGo and its successors. Folding DeepMind’s reinforcement learning expertise into large language models could yield new capabilities.
In addition, Gemini could set a new high-water mark in AI without a jump in size.
GPT-4 is believed to have around a trillion parameters, and recent rumors suggest it could be a “mixture-of-experts” model made up of eight smaller models, each a fine-tuned specialist roughly the size of GPT-3. OpenAI has confirmed neither the size nor the architecture and, for the first time, did not publish specifications for its latest model.
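For readers unfamiliar with the term, here is a minimal sketch of the idea behind a mixture-of-experts layer: a small gating network routes each token to one of several specialist feed-forward blocks, so only a fraction of the total parameters runs per token. This illustrates the rumor only; OpenAI has confirmed nothing about GPT-4’s internals.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Top-1 mixture-of-experts layer: route each token to one expert."""
    def __init__(self, d_model: int = 64, n_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (n_tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)     # routing probabilities
        top_p, top_idx = probs.max(dim=-1)       # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Only the selected expert runs for these tokens, scaled
                # by its gate weight; unselected experts cost nothing.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

The appeal of this design is that total parameter count can grow while the compute spent per token stays close to that of a single expert.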
Similarly, DeepMind has shown interest in making smaller models that punch above their weight class (Chinchilla), and Google has experimented with mixture-of-experts models (GLaM).
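The Chinchilla finding, roughly summarized, is that for a fixed compute budget you get a better model by shrinking the parameter count and training on more data, about 20 tokens per parameter. A quick back-of-the-envelope check against the published Chinchilla configuration:

```python
# Widely cited rule of thumb from the Chinchilla paper: compute-optimal
# training uses roughly 20 tokens per model parameter.
def chinchilla_optimal_tokens(n_params: float) -> float:
    return 20 * n_params

# Chinchilla itself: 70B parameters trained on ~1.4T tokens.
print(f"{chinchilla_optimal_tokens(70e9):.1e} tokens")  # 1.4e+12
```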
Gemini might be a little bigger or smaller than GPT-4, but probably not by much.
Still, we may never know exactly what Gemini is, as increasingly competitive companies keep the details of their models secret. That makes testing advanced models for capability and controllability as they’re being built all the more important. This work is also crucial for safety, according to Hassabis. He also said Google could make models like Gemini available to outside researchers for evaluation.
“I would like to see academia have early access to these frontier models,” he said.
Whether Gemini will match or surpass GPT-4 remains to be seen. As architectures grow more complicated, gains come less automatically. Still, a fusion of data and approaches – text with images and other inputs, large language models paired with reinforcement learning, smaller models stitched together into a larger whole – seems to be what Altman had in mind when he said AI would get better in ways other than sheer size.
When can we expect Gemini?
Hassabis did not give an exact timetable. If training alone takes several months, as he suggested, it could be a while before Gemini launches. And a trained model is not the endpoint: OpenAI spent months testing and fine-tuning the raw GPT-4 before its final release. Google may be even more cautious.
But Google DeepMind is under pressure to deliver a product that raises the bar in AI, so it wouldn’t be surprising to see Gemini later this year or early next. If that happens, and if Gemini lives up to its promise – both big question marks – Google could, at least for now, take back some of the limelight from OpenAI.
Image Credit: Hossein Nasr/Unsplash