Just last week Rich Sutton published a very short blog post titled The Bitter Lesson. I’m going to try to keep this review shorter than his post. Sutton is well known for his long and sustained contributions to reinforcement learning.
In his post he argues, using many good examples, that over the 70-year history of AI, more computation and less built-in knowledge has always won out as the best way to build Artificial Intelligence systems. This resonates with a current mode of thinking among many of the newer entrants to AI that it is better to design learning networks and put in massive amounts of computer power than to try to design a structure for computation that is specialized in any way for the task. I must say, however, that at a two-day workshop on Deep Learning last week at the National Academy of Sciences, the latter idea was much more in vogue, something of a backlash against exactly what Sutton is arguing.
I think Sutton is wrong for a number of reasons.
- One of the most celebrated successes of Deep Learning is image labeling, using CNNs, Convolutional Neural Networks, but the very essence of CNNs is that the front end of the network is designed by humans to manage translational invariance, the idea that objects can appear anywhere in the frame. To have a Deep Learning network also have to learn that seems pedantic to the extreme, and will drive up the computational costs of the learning by many orders of magnitude.
- There are other things in image labeling that suffer mightily because the current crop of CNNs does not have certain things built in that we know are important for human performance, e.g., color constancy. This is why the celebrated example of a traffic stop sign with some pieces of tape on it is seen as a 45 mph speed limit sign by a certain CNN trained for autonomous driving. No human makes that error because they know that stop signs are red, and speed limit signs are white. The CNN doesn’t know that, because the relationship between pixel color in the camera and the actual color of the object is a very complex relationship that does not get elucidated with the measly tens of millions of training images that the algorithms are trained on. Saying that in the future we will have viable training sets is shifting the human workload to creating massive training sets and encoding what we want the system to learn in the labels. This is just as much building knowledge in as it would be to directly build a color constancy stage. It is sleight of hand in moving the human intellectual work to somewhere else.
- In fact for most machine learning problems today a human is needed to design a specific network architecture for the learning to proceed well. So again, rather than have the human build in specific knowledge we now expect the human to build the particular and appropriate network, and the particular training regime that will be used. Once again it is sleight of hand to say that AI succeeds without humans getting into the loop. Rather we are asking the humans to pour their intelligence into the algorithms in a different place and form.
- Massive data sets are not at all what humans need to learn things, so something is missing. Today’s data sets can have billions of examples, where a human may only require a handful to learn the same thing. But worse, the amount of computation needed to train many of the networks we see today can only be furnished by very large companies with very large budgets, and so this push to make everything learnable is pushing the cost of AI outside that of individuals or even large university departments. That is not a sustainable model for getting further in intelligent systems. For some machine learning problems we are starting to see a significant carbon footprint due to the power consumed during the learning phase.
- Moore’s Law is slowing down, so that some computer architects are reporting the doubling time in amount of computation on a single chip is moving from one year to twenty years. Furthermore, the breakdown of Dennard scaling back in 2006 means that the power consumption of machines goes up as they perform better, and so we cannot afford to put even the results of machine learning (let alone the actual learning) on many of our small robots. Self-driving cars require about 2,500 Watts of power for computation, while a human brain only requires 20 Watts. So Sutton’s argument just makes this worse, and makes the use of AI and ML impractical.
- Computer architects are now trying to compensate for these problems by building special purpose chips for runtime use of trained networks. But they need to lock in the hardware to a particular network structure and capitalize on human analysis of what tricks can be played without changing the results of the computation, but with greatly reduced power budgets. This has two drawbacks. First, it locks down hardware specific to particular solutions, so every time we have a new ML problem we will need to design new hardware. And second, it once again is simply shifting where human intelligence needs to be applied to make ML practical, not eliminating the need for humans to be involved in the design at all.
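To make the first point above concrete, here is a minimal sketch of what "designed by humans to manage translational invariance" means in practice. It is a toy, not any real network: the one-dimensional "image", the three-tap kernel, and the helper names `conv1d` and `shift_right` are all assumptions chosen for illustration. It shows that reusing one small set of weights at every position makes the response follow the object when it moves, and that a fully connected layer over the same input would need far more weights to learn the same thing position by position.

```python
# Toy illustration only: sizes and kernel values are arbitrary assumptions.

def conv1d(signal, kernel):
    """Valid-mode sliding dot product; the same kernel weights are reused at every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def shift_right(values, n):
    """Shift a list right by n places, padding with zeros on the left."""
    return [0] * n + values[:len(values) - n]

kernel = [1, -2, 1]                       # hypothetical 3-tap feature detector
signal = [0, 0, 3, 5, 3, 0, 0, 0, 0, 0]   # an "object" near the left edge
shifted = shift_right(signal, 3)          # the same object, moved 3 places right

response = conv1d(signal, kernel)
response_shifted = conv1d(shifted, kernel)

# Equivariance: moving the object moves the response by the same amount
# (this holds here because the object stays away from the borders).
equivariant = response_shifted == shift_right(response, 3)

# A crude global max-pool then gives a position-independent answer.
pooled_same = max(response) == max(response_shifted)

# Weights to learn: one shared kernel vs. a dense layer with the same
# input and output sizes (biases ignored).
conv_weights = len(kernel)
dense_weights = len(signal) * len(response)

print("equivariant:", equivariant)
print("pooled response unchanged:", pooled_same)
print("conv weights:", conv_weights, "dense weights:", dense_weights)
```

The weight sharing in `conv1d` is the knowledge a human put in. Remove it, and the dense layer has to discover the same detector separately at every location from data.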
So my take on Rich Sutton’s piece is that the lesson we should learn from the last seventy years of AI research is not at all that we should just use more computation and that always wins. Rather I think a better lesson to be learned is that we have to take into account the total cost of any solution, and that so far they have all required substantial amounts of human ingenuity. Saying that a particular solution style minimizes a particular sort of human ingenuity that is needed while not taking into account all the other places that it forces human ingenuity (and carbon footprint) to be expended is a terribly myopic view of the world.
This review, including this comment, is seventy-eight words shorter than Sutton’s post.