WHY THIS MATTERS IN BRIEF
AI is already starting to augment doctors, and eventually it could replace them, but it will be a long time before we see a pure-play AI healthcare system.
Now that the AI has taken and passed the US’s hardest medical exam, the USMLE, researchers are pushing ahead to figure out what other healthcare challenges GPT4 can help with. Dr Isaac Kohane, who is both a computer scientist at Harvard University and a doctor, teamed up with two colleagues to test GPT4 with one main goal: to see how OpenAI’s latest Artificial Intelligence (AI) model performs in a medical setting.
“I’m stunned to say it’s better than many doctors I’ve observed,” he says in the forthcoming book, “The AI Revolution in Medicine,” co-authored by independent journalist Carey Goldberg and Microsoft VP of Research Peter Lee.
In the book, Kohane says that GPT4, which was released to paying subscribers in March 2023, correctly answers licensing questions for US medical exams more than 90% of the time. It is a much better test taker than previous ChatGPT AI models, GPT3 and 3.5, and also better than some licensed doctors.
However, GPT4 is not only a good test taker and fact finder. It’s also a great translator. In the book it translates discharge information for a patient who speaks Portuguese and distils dense technical jargon into something sixth graders can easily read.
As the authors explain with vivid examples, GPT4 can also give physicians helpful suggestions about bedside manner and tips on how to talk to patients about their conditions in compassionate, clear language, and it can read and summarize lengthy reports or studies in the blink of an eye. The technology can even explain its reasoning through problems in a way that appears to require a certain level of human-style intelligence.
But if you ask GPT4 how it does all of this, it will likely tell you that all of its intelligence is still “limited to patterns in the data and does not involve real understanding or intent.” That’s what GPT4 told the book’s authors when they asked it whether it could actually engage in causal reasoning. Even with such limitations, as Kohane discovered in the book, GPT4 can mimic how doctors diagnose disease with amazing, if imperfect, success.
Kohane also conducted a clinical thought experiment with GPT4 in the book, based on a real case involving a newborn he had treated several years earlier. He gave the bot some key details about the baby that he’d gathered from a physical exam, along with some information from an ultrasound and hormone levels, and the bot was able to diagnose a 1 in 100,000 condition called Congenital Adrenal Hyperplasia “exactly like I would, with all my years of study and experience,” Kohane wrote.
The doctor was both impressed and appalled.
“On the one hand, I was conducting a sophisticated medical conversation with a computational process,” he wrote, “on the other hand, the anxious realization that millions of families would soon have access to this impressive medical expertise was just as overwhelming, and I couldn’t see how we could guarantee or certify that GPT4’s advice is safe or effective.”
GPT4 is not always reliable, and the book is full of examples of its failures. They range from simple clerical errors, such as incorrectly restating a BMI that the bot had just calculated correctly, to calculation errors such as inaccurately “solving” a Sudoku puzzle or forgetting to square a term in an equation. The mistakes are often subtle, and the system tends to insist it is right even when challenged. It’s not hard to imagine how a misplaced number or miscalculated weight could lead to serious prescribing or diagnostic errors.
Like previous GPTs, GPT4 can “hallucinate” – the technical euphemism for when AI invents answers or disregards requests.
When the book’s authors asked it about this, GPT4 said: “I do not intend to deceive or mislead anyone, but I sometimes make mistakes or assumptions based on incomplete or inaccurate data. Nor do I have the clinical judgment or ethical responsibility of a human doctor or nurse.”
One possible cross-check the authors suggest in the book is to start a new session with GPT4 and have it “read over” and “verify” its own work with a “new set of eyes.” This tactic sometimes works to uncover bugs – although GPT4 is a bit reluctant to admit when it was wrong. Another suggestion for catching bugs is to command the bot to show you its work so you can review it human-style.
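For simple arithmetic such as the BMI example, the same "second pair of eyes" idea can be applied outside the model altogether. The sketch below is not from the book: it is a minimal illustration, with hypothetical function names and an assumed tolerance of 0.1, of recomputing a BMI independently and comparing it with the figure a chatbot reported.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def matches_reported(reported: float, weight_kg: float, height_m: float,
                     tol: float = 0.1) -> bool:
    """Return True if a reported BMI agrees with an independent recalculation.

    The tolerance `tol` is an illustrative assumption, not a clinical standard.
    """
    return math.isclose(reported, bmi(weight_kg, height_m), abs_tol=tol)


if __name__ == "__main__":
    # A figure that agrees with the recalculation (70 / 1.75**2 is about 22.86).
    print(matches_reported(22.9, weight_kg=70, height_m=1.75))
    # A figure produced by forgetting to square the height (70 / 1.75 = 40.0).
    print(matches_reported(40.0, weight_kg=70, height_m=1.75))
```

A check like this only catches arithmetic slips in values that can be recomputed; it says nothing about whether the model's clinical reasoning is sound.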
It is clear that GPT4 has the potential to free up valuable time and resources in the clinic, allowing physicians to be more present with patients “instead of their computer screens,” the authors write. But they say, “We must force ourselves to envision a world of ever more intelligent machines, eventually perhaps surpassing human intelligence in almost every dimension. And then think very carefully about how this world should work.”