WHY THIS MATTERS IN BRIEF
As teachers increasingly worry about how students will use powerful new AIs to cheat, they’re finding out their concerns are justified, and worse.
ChatGPT, an Artificial Intelligence (AI) tool that’s taken the world by storm, has already aced GCSE history exams in the UK, created malware, and done all kinds of other things, including helping my ten-year-old son write his first book. Now it turns out it’s also capable of passing, or at least nearly passing, the US medical licensing exams, according to researchers in the US who put it to the test.
It’s the latest achievement for the publicly available and currently free technology, which was released at the end of last year and has been the subject of non-stop coverage since.
ChatGPT is one of a class of “Generative AI” programs that can produce images, video and audio, make arguments, summarise books, tell jokes, write code, and generally be useful to people. But for educators, the technology opens the door to widespread cheating on homework and take-home assignments, and many have been scrambling to rethink the nature of assessment or otherwise discourage students from using the tool.
This week, New York City’s public school system banned the use of ChatGPT, while Australian universities said they were reinstating “pen and paper” exams and beefing up cheating detection measures.
Now, in a pre-print study that has not yet been peer reviewed, researchers have explored the upper limits of ChatGPT’s capabilities. They say the AI tool achieved over 50 per cent in one of the most difficult standardised tests around: the US medical licensing exam (USMLE).
Just weeks after the launch of ChatGPT in December last year, researchers at a California-based healthcare provider, Ansible Health, began experimenting with the tool in their day-to-day work. They found it could help with tasks such as drafting payment notices, simplifying jargon-dense radiology reports, and even brainstorming answers for “diagnostically challenging cases”.
“Overall, our clinicians reported a 33 per cent decrease … in the time required to complete documentation and indirect patient care tasks,” the study authors wrote.
To test the program’s ability to perform clinical reasoning, they had it sit a mock, abbreviated version of the USMLE, which is required for any doctor to obtain a license to practice medicine in the US.
The USMLE consists of three exams, with the first generally taken by second-year medical students, the second by those in their fourth year, and the last by physicians after a year of postgraduate education.
For most applicants, the tests require more than a year of dedicated preparation time. The first two tests each take a day, and the last takes two days. The researchers fed questions from previous exams to ChatGPT and had the answers, ranging from open-ended written responses to multiple choice, independently scored by two physician adjudicators.
They also checked that the answers to those questions were unlikely to have been in the data the AI tool was trained on – in other words, that ChatGPT hadn’t already seen the answers. The tool scored more than 50 per cent across all examinations, and approached the USMLE pass threshold of about 60 per cent.
“Therefore, ChatGPT is now comfortably within the passing range,” the paper concludes.
Phillip Dawson, an academic integrity researcher at Deakin University, said he wasn’t able to evaluate the study itself, but that “if the authors really did what they say they’ve done, then that’s scary stuff. There’s a sense that this is going to be even bigger than the pandemic in terms of how it changes assessment.”
Kane Murdoch, the head of academic misconduct at Macquarie University, said he was “not surprised at all” that ChatGPT could pass the USMLE. “[And] those are pretty serious and complex exams — simpler assessments would be a piece of cake.”
He and others are pushing for universities to embrace ChatGPT rather than ban it outright.
“[ChatGPT] is like the advent of the calculator — a game changer,” he said. “Telling students that using it is forbidden won’t stop usage.
“I expect it to be very heavily used until such times as universities develop new strategies for assessment,” he added.
The Tertiary Education Quality and Standards Agency (TEQSA), which regulates higher education in Australia, appears to agree that ChatGPT shouldn’t be banned.
“That’s not a practical or sustainable strategy,” said Helen Gniel, who runs TEQSA’s higher education integrity unit. “Machine learning is only going to improve. It’s going to become quite standard.”
Policing use of the tool is made all the more difficult by the fact that its plausible academic writing is very hard for educators or existing academic integrity software to detect, although the style is bland and formulaic, and it has a habit of making up facts and references.
Murdoch, whose job at Macquarie University includes detecting the use of AI text-generators, said most academics “don’t know what they’re looking for” and would fail to notice when a student had used ChatGPT. “What I’m looking for is really gross errors of fact,” he said.
And, as academics around the world find ChatGPT both amazing and horrifying, the only thing that’s certain is that this and other tools like it are going to get a lot better, very fast.