WHY THIS MATTERS IN BRIEF
Many people think OpenAI’s products are a single AI, but they may be many AIs linked together to form what’s known as a “Mixture of Experts” model, and this could form the foundation of future AGI.
In March, OpenAI launched GPT-4 with much fanfare, but a dark cloud loomed on the horizon. Scientists and Artificial Intelligence (AI) enthusiasts alike panned the company for not releasing any specifics about the model, such as its parameter count or architecture. Now, months later, a top AI researcher has speculated about the inner workings of GPT-4, suggesting why OpenAI chose to hide this information, and it’s disappointing.
OpenAI CEO Sam Altman famously said of GPT-4 that “people are begging to be disappointed, and they will be,” speaking about the potential size of the model. Ahead of the model’s launch, the rumour mill suggested that it would have trillions of parameters and be the best thing the world has ever seen. However, the reality is different. In the process of making GPT-4 better than GPT-3.5, OpenAI might have bitten off more than it could chew.
George Hotz, world-renowned hacker and software engineer, recently appeared on a podcast to speculate about the architecture of GPT-4. Hotz stated that the model might be a set of eight distinct models, each featuring 220 billion parameters. This speculation was later echoed by Soumith Chintala, the co-founder of PyTorch, who said he might have heard the same.
While this puts the parameter count of GPT-4 at roughly 1.76 trillion (8 × 220 billion), which makes it one of the largest AI models out there, the notable part is that these models do not all work at the same time. Instead, they are deployed in a so-called Mixture of Experts (MoE) architecture. This architecture splits the model into separate components, known as expert models. Each of these experts is said to be fine-tuned for a specific purpose or domain, and is able to provide better responses for that field. The complete model then draws on the collective intelligence of the expert models.
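To make the idea concrete, here is a minimal Python sketch of the parameter arithmetic and of top-k gating, one common way MoE layers decide which experts run for a given input. OpenAI has not published GPT-4’s design, so everything here is an assumption for illustration: the expert count and size are Hotz’s speculated figures, the names `route` and `moe_forward` are hypothetical, and the toy “experts” are simple functions rather than neural networks.

```python
import math

NUM_EXPERTS = 8             # speculated by Hotz, not confirmed by OpenAI
PARAMS_PER_EXPERT = 220e9   # speculated by Hotz, not confirmed by OpenAI


def total_parameters(num_experts=NUM_EXPERTS, params_per_expert=PARAMS_PER_EXPERT):
    """Total parameter count if every expert is the same size."""
    return num_experts * params_per_expert


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_scores, top_k=2):
    """Pick the top_k highest-scoring experts and renormalise their weights.

    Returns a dict mapping expert index -> mixing weight.
    """
    probs = softmax(gate_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x, experts, gate_scores, top_k=2):
    """Run only the selected experts and blend their outputs."""
    weights = route(gate_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy experts (hypothetical): expert k simply multiplies its input by k + 1.
toy_experts = [lambda x, k=k: x * (k + 1) for k in range(NUM_EXPERTS)]

if __name__ == "__main__":
    print(f"Total parameters: {total_parameters():.3g}")
    scores = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    print("Selected experts:", sorted(route(scores, top_k=2)))
    print("Blended output:", moe_forward(10.0, toy_experts, scores, top_k=2))
```

With `top_k=2`, only two of the eight toy experts run for each input, which is the sense in which the experts do not all work at the same time.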
This approach has many benefits. One is more accurate responses, since the experts are fine-tuned on different subject matters. An MoE architecture also lends itself to being easily updated, as the maintainers can improve the model in a modular fashion rather than updating a monolithic model. Hotz also speculated that the model may rely on iterative inference for better outputs. In this process, the model’s output, or inference result, is refined over multiple iterations.
This method might also allow GPT-4 to get inputs from each of its expert models, which could reduce hallucinations. Hotz stated that this process might be done 16 times, which would vastly increase the operating cost of the model. The approach has been compared to the old trope of three children in a trench coat masquerading as an adult: many have likened GPT-4 to eight GPT-3s in a trench coat, trying to pull the wool over the world’s eyes.
While GPT-4 aced benchmarks that GPT-3 has had difficulties with, the MoE architecture seems to have become a pain point for OpenAI. In a now-deleted interview, Altman admitted to the scaling issues OpenAI is facing, especially in terms of GPU shortages.
Running inference 16 times on a model with an MoE architecture is sure to increase cloud costs on a similar scale. When blown up to ChatGPT’s millions of users, it’s no surprise that even Azure’s supercomputer fell short of power. This seems to be one of the biggest problems OpenAI is currently facing, with Altman stating that a cheaper and faster GPT-4 is the company’s top priority.
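As a back-of-the-envelope sketch of why this hurts, assume cost grows linearly with the number of inference passes. The function names and the example figures below are hypothetical and are not OpenAI numbers; only the 16-pass count comes from Hotz’s speculation.

```python
def cost_multiplier(passes=16, baseline_passes=1):
    """How many times more compute a query needs than a single-pass baseline."""
    if passes < 1 or baseline_passes < 1:
        raise ValueError("pass counts must be at least 1")
    return passes / baseline_passes


def daily_inference_cost(queries_per_day, cost_per_pass, passes=16):
    """Linear cost model: every query pays for `passes` forward passes.

    cost_per_pass is in arbitrary units; real per-pass costs are not public.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    return queries_per_day * cost_per_pass * passes


if __name__ == "__main__":
    # Hypothetical workload: 1,000 queries at 2 cost units per pass.
    single = daily_inference_cost(1_000, 2, passes=1)
    iterated = daily_inference_cost(1_000, 2, passes=16)
    print(f"single pass: {single} units, 16 passes: {iterated} units")
    print(f"multiplier: {cost_multiplier():.0f}x")
```

Under this simple model the bill scales directly with the pass count, so any saving has to come from fewer passes, cheaper passes, or fewer queries served at full quality.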
This has also resulted in a reported degradation in the quality of ChatGPT’s output. All over the Internet, users have reported that the quality of even ChatGPT Plus’ responses has gone down.
I found a release note for ChatGPT that seems to confirm this, which stated, “We’ve updated performance of the ChatGPT model on our free plan in order to serve more users.” In the same note, OpenAI also informed users that Plus users would be defaulted to the “Turbo” variant of the model, which has been optimised for inference speed.
API users, on the other hand, seem to have avoided this problem altogether. Reddit users have noticed that other products which use the OpenAI API provide better answers to their queries than even ChatGPT Plus. This might be because the OpenAI API handles a lower volume of requests than ChatGPT, so OpenAI may be cutting costs on ChatGPT while leaving the API untouched.
In a mad rush to get GPT-4 out to the market, it seems that OpenAI has cut corners, and just how many we’ll likely never know. The purported MoE model is a good step forward for making the GPT series more performant, but the scaling issues it is facing show that the company might just have bitten off more than it can chew.
Looking ahead to GPT-5, though, this approach might yield some interesting and problematic results for the company as it begins to connect these independent “expert” AI sub-models together. The aim would be to give GPT-5 cross-domain knowledge, something the current model lacks, and in doing so create what many would regard as the first plausible prototype of Artificial General Intelligence (AGI), which has been the company’s stated goal all along.