WHY THIS MATTERS IN BRIEF
Humans have an innate desire to measure and categorise things, and that includes measuring an AI’s level of intelligence, but it’s proving to be a difficult problem to solve.
Human intelligence is hard enough to measure, and over the decades the now ubiquitous Intelligence Quotient, or “IQ,” test, the standard by which we are all judged, has caused ferocious controversy. But today we face a new, potentially even thornier dilemma: how to assess the intelligence of today’s and tomorrow’s increasingly capable Artificial Intelligence (AI) agents, and perhaps, one day, even the avatars and robots they’ll inhabit.
There are those who argue we shouldn’t even bother trying to measure AI’s IQ, whether because AI is seen as a “rapidly evolving alien, artificial and synthetic” form of intelligence that is “dramatically different to human intelligence,” or because there are already so many variants of AI that “one standard to rule them all” is almost impossible to define. But the urge to measure things is deeply ingrained in human behaviour, so it seems inevitable that at some point we’ll adopt a new standard, an IQ test for AI that can tell us “once and for all” whether we really are dumber than the 10,000 IQ chip in our trainers that SoftBank CEO Masayoshi Son, whose company now owns ARM, predicts will arrive by 2047.
Over the decades there have been a number of attempts to create a standards-based test, by companies such as Facebook, which recently published a white paper on how to “evaluate the intelligence of AI,” and by individuals such as Alan Turing with his Turing Test, but in the main very few of them have been hailed as credible.
Now though, bearing in mind that it’s in our nature to want to answer the fundamental question “Is this AI smarter than a human?”, a team led by Feng Liu at the Chinese Academy of Sciences has developed an intelligence test that both machines and humans can take, and they’ve used it to rank intelligent assistants such as Google Assistant and Apple’s Siri on the same scale they use to assess humans.
Their test is based on what they call the “Standard Intelligence Model.” In this model, a system must have a way of obtaining data from the outside world; it must be able to transform that data into a form it can process; it must be able to use the resulting knowledge in an innovative way; and, finally, it must feed the new knowledge back into the outside world.
Basically, the test boils down to the ability to gather data, master it, exercise creativity over it, and then produce an output, which, at a high level, sounds like a sensible enough proposition.
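The paper doesn’t describe an implementation, but as a rough illustration of those four stages, here’s a minimal toy sketch in Python. Everything in it, the class, the method names, the word-frequency “knowledge,” is hypothetical and purely for illustration; nothing below comes from Feng Liu’s paper.

```python
# A hypothetical toy sketch of the four stages of the "Standard
# Intelligence Model": obtain data, master it, innovate over it,
# and output the result. Purely illustrative, not the team's code.

class StandardIntelligenceSystem:
    def __init__(self):
        self.knowledge = {}  # internal store of "mastered" data

    def obtain(self, raw_input: str) -> str:
        # Stage 1: acquire data from the outside world.
        return raw_input

    def master(self, data: str) -> None:
        # Stage 2: transform the data into a form the system can
        # process (here, a trivial word-frequency model).
        for word in data.lower().split():
            self.knowledge[word] = self.knowledge.get(word, 0) + 1

    def innovate(self) -> str:
        # Stage 3: use the knowledge in a (trivially) novel way,
        # recombining the most frequent words into a new phrase.
        top = sorted(self.knowledge, key=self.knowledge.get, reverse=True)[:3]
        return " ".join(reversed(top))

    def output(self, result: str) -> None:
        # Stage 4: feed the result back into the outside world.
        print(result)


system = StandardIntelligenceSystem()
data = system.obtain("the quick brown fox jumps over the lazy dog the fox")
system.master(data)
system.output(system.innovate())
```

Real systems would of course replace each stage with something far richer, sensors, learned models, generative components, but the point of the model is that all four stages must be present before something counts as a “standard intelligence system.”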
“If a system has [these] characteristics, it can be defined as a standard intelligence system,” says Feng.
The team’s test measures a machine’s and a human’s ability to do all of these things, and while, unfortunately, they don’t go into the details of how it works, they say they’ve been testing humans and intelligent assistants from Apple, Baidu, Google, Microsoft and Sogou since 2014, and that their methodology allows them to rank them all on the same scale.
In their latest ranking, from 2016, this is the ranking they produced:
- Human (18 years old): 97
- Human (12 years old): 84.5
- Human (6 years old): 55.5
- Google: 47.28
- Baidu’s Duer: 37.2
- Baidu: 32.92
- Sogou: 32.25
- Bing: 31.98
- Microsoft’s Xiaobing: 24.48
- Apple’s Siri: 23.94
As you can see, on this scale even a 6 year old human outperforms the most advanced digital assistant, which in this case is Google’s. As someone who uses these assistants day in and day out, I’d say that at first glance the ranking seems roughly right, although personally I think that as soon as you add in an AI’s ability to hold a conversation, as you would with a six year old, the test falls apart. That said, perhaps they’ll address that in future versions of the test.
However, as ever the devil is in the detail, and because the team haven’t released many details of their experiment beyond a tentative arXiv paper, we have very little to go on other than their high-level methodology.
It’s also worth remembering that AI is improving rapidly. In 2014, for example, Google’s assistant scored 26.4 on this test, and only two years later it scored 47.28, not far behind that 6 year old. That’s a significant increase, roughly 79 percent in two years, so it’ll be interesting to see how these machines perform in the upcoming 2017 ranking.
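For a sense of scale, here’s a quick, purely illustrative bit of arithmetic based on those two published scores. The extrapolation is mine, not the team’s; the paper makes no claim that this growth rate is exponential or will continue:

```python
# Illustrative arithmetic only, based on the two published scores.
score_2014 = 26.4
score_2016 = 47.28
human_6yo = 55.5  # the 6 year old's score from the 2016 ranking

growth = score_2016 / score_2014  # ~1.79x over two years
print(f"Two-year growth factor: {growth:.2f}")

# Hypothetical: if the same two-year growth factor held, when would
# Google's assistant pass the 6 year old benchmark?
score, year = score_2016, 2016
while score < human_6yo:
    score *= growth
    year += 2
print(f"Would pass the 6 year old's 55.5 around {year} (score ~{score:.1f})")
```

On those (admittedly naive) assumptions, Google’s assistant would clear the 6 year old benchmark by 2018, which is exactly why the 2017 results will be worth watching.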
While the new test is interesting, and testing AIs directly against humans is a potentially credible idea, unless Feng and his team share more details of how they run the test, it’s inevitable that most of the scientific community will remain on the fence, and remain sceptical.