WHY THIS MATTERS IN BRIEF
Most AI machine vision models are still heavy and cumbersome and need huge amounts of training data, but newer, lightweight Capsule Network models promise to be leaner, faster, and able to learn from far less data.
If you want to blame someone for the hoopla around Artificial Intelligence (AI) then 69-year-old Google researcher Geoff Hinton is a good candidate, after the droll University of Toronto professor jolted the field onto a new trajectory in October 2012. With two grad students, Hinton showed that an unfashionable technology he’d championed for decades, Artificial Neural Networks (ANNs), permitted a huge leap in machines’ ability to understand images. And within six months, all three researchers were on Google’s payroll. Today neural networks are building their own computer games, designing the first generations of digital humans, and helping read people’s minds – and all that’s just for starters. In fact, they’re becoming so ubiquitous that you’re probably not far, digitally at least, from one right now. But oddly, Hinton now belittles the technology he helped bring to the world.
“I think the way we’re doing machine vision is just wrong,” he says. “[ANNs] work better than anything else at present but that doesn’t mean it’s right.”
In its place, Hinton has unveiled another “old” idea that might transform how computers see and reshape AI. That matters because machine vision is crucial to applications such as self-driving cars and software that plays doctor, and a little while ago he released two research papers that he says prove out an idea he’s been mulling for almost 40 years.
Capsule Networks 101
“It’s made a lot of intuitive sense to me for a very long time, it just hasn’t worked well,” Hinton says. “We’ve finally got something that works well.”
Geoff Hinton’s keynote and Capsule Network explainer
Hinton’s new approach, known as Capsule Networks (CapsNets), is a twist on neural networks intended to make machines better able to understand the world through images or video. In one of the papers, Hinton’s capsule networks matched the accuracy of the best previous techniques on a standard test of how well software can learn to recognize handwritten digits.
In the second, capsule networks almost halved the best previous error rate on a test that challenges software to recognize toys such as trucks and cars from different angles. Hinton has been working on his new technique with colleagues Sara Sabour and Nicholas Frosst at Google’s Toronto office.
Capsule networks aim to remedy a weakness of today’s machine learning systems that limits their effectiveness. Image-recognition software in use today by Google and others needs a large number of example photos to learn to reliably recognize objects in all kinds of situations. That’s because the software isn’t very good at generalizing what it learns to new scenarios, for example understanding that an object is the same when seen from a new viewpoint.
To teach a computer to recognize a cat from many angles, for example, could require thousands of photos covering a variety of perspectives. Human children, by comparison, don’t need such explicit and extensive training to learn to recognize a household pet.
Hinton’s idea for narrowing the gulf between the best AI systems and ordinary toddlers is to build a little more knowledge of the world into machine vision software. Capsules, which are small groups of crude virtual neurons, are designed to track different parts of an object, such as a cat’s nose and ears, and their relative positions in space. A network of many capsules can therefore use that awareness to understand when a new scene is in fact a different view of something it has seen before, as the sketch below illustrates. And that’s the breakthrough.
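To make the idea a little more concrete, here is a minimal sketch, in plain NumPy, of the two mechanisms most often associated with capsule networks: a “squash” nonlinearity that shrinks a capsule’s output vector so its length reads as a probability that a part is present, and “routing by agreement,” in which lower-level part capsules send their votes to whichever higher-level object capsules they agree with. This is an illustrative toy under my own assumptions, not the implementation from Hinton, Sabour, and Frosst’s papers; the shapes, function names, and random toy data below are hypothetical.

```python
# Illustrative sketch of capsule "routing by agreement" in plain NumPy.
# Not the authors' implementation; shapes and names are assumptions.
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Scale a vector so its length lies in (0, 1) while keeping its direction.
    The length is read as the probability that the entity the capsule represents
    (e.g. a cat's ear) is present; the direction encodes its pose."""
    sq_norm = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

def route(u_hat, n_iters=3):
    """Dynamic routing: part capsules send their pose 'votes' (u_hat) to the
    object capsules whose outputs they agree with.
    u_hat has shape (n_parts, n_objects, dim_object)."""
    n_parts, n_objects, _ = u_hat.shape
    b = np.zeros((n_parts, n_objects))                        # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over object capsules
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum of votes
        v = squash(s)                                         # object capsule outputs
        b += (u_hat * v[None, ...]).sum(axis=-1)              # agreement strengthens the route
    return v

# Toy example: 6 part capsules voting for 2 object capsules with 4-D pose vectors.
rng = np.random.default_rng(0)
votes = rng.normal(size=(6, 2, 4))
print(route(votes).shape)  # (2, 4)
```

The design intuition is that because each vote is a geometric prediction of the whole object’s pose, votes from the parts of a genuinely present object line up with one another even when the object is seen from a new angle, which is exactly the kind of viewpoint generalization ordinary neural networks struggle with.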
Hinton formed his intuition that machine vision systems “need such an inbuilt sense of geometry” in 1979, when he was trying to figure out how humans use mental imagery, and he first laid out a preliminary design for capsule networks in 2011.
“Everyone has been waiting for the next great leap from Geoff,” says Kyunghyun Cho, a professor at New York University who works on image recognition.
It’s too early to say how big a leap Hinton has made – and he knows it. The AI veteran segues from quietly celebrating that his intuition is now supported by evidence, to explaining that capsule networks still need to be proven on large image collections, and that the current implementation is slow compared to existing image-recognition software.
Hinton is optimistic he can address those shortcomings. Others in the field are also hopeful about his long-maturing idea.
Roland Memisevic, co-founder of image-recognition startup Twenty Billion Neurons, and a professor at University of Montreal, says Hinton’s basic design should be capable of extracting more understanding from a given amount of data than existing systems. If proven out at scale, that could be helpful in domains such as healthcare, where image data to train AI systems is much scarcer than the large volume of selfies available around the internet.
In some ways, capsule networks are a departure from a recent trend in AI research. One interpretation of the recent success of neural networks is that humans should encode as little knowledge as possible into AI software, and instead let it figure things out for itself from scratch.
Gary Marcus, a professor of psychology at NYU who sold an AI startup to Uber, says Hinton’s thinking represents a welcome breath of fresh air and argues that AI researchers should be doing more to mimic how the brain uses its own built-in, innate machinery for learning crucial skills like vision and language.
“It’s too early to tell how far this particular architecture will go, but it’s great to see Hinton breaking out of the rut that the field has seemed fixated on,” Marcus says.