
[FoR&AI] Machine Learning Explained

rodneybrooks.com/forai-machine-learning-explained/

[An essay in my series on the Future of Robotics and Artificial Intelligence.]

Much of the recent enthusiasm about Artificial Intelligence is based on the spectacular recent successes of machine learning, itself often capitalized as Machine Learning, and often referred to as ML. It has become common in the technology world that the presence of ML in a company, in a development process, or in a product is viewed as a certification of technical superiority, something that will outstrip all competition.

Machine Learning is what has enabled the new assistants in our houses such as the Amazon Echo (Alexa) and Google Home by allowing them to reliably understand us as we speak to them. Machine Learning is how Google chooses what advertisements to place, how it saves enormous amounts of electricity at its data centers, and how it labels images so that we can search for them with key words. Machine learning is how DeepMind (a Google company) was able to build a program called AlphaGo which beat the world Go champion. Machine Learning is how Amazon knows what recommendations to make to you whenever you are at its web site. Machine Learning is how PayPal detects fraudulent transactions. Machine Learning is how Facebook is able to translate between languages. And the list goes on!

While ML has started to have an impact on many aspects of our life, and will do so more and more over the coming decades, some sobriety is not out of place. Machine Learning⁠1 is not magic. Neither AI programs, nor robots, wander around in the world ready to learn about whatever there is around them.

Every successful application of ML is hard won by researchers or engineers carefully analyzing the problem that is at hand. They select one or many different ML algorithms, and custom design how to connect them together and to the data. In some cases there is an extensive period of training on very large sets of data before the algorithm can be run on the problem that is being solved. In that case there may be months of work to do in collecting the right sort of data from which ML will actually learn. In other cases the learning algorithm will be integrated into the application and will learn while doing the task that is desired–it might require some training wheels in the early stages, and they too must be designed. In any case there is always a big design project about how, when the ultimate system is operational, the data that comes in will be organized, processed and mapped before it reaches the ML component of the system.

When we are tending plants we pour water on them and perhaps give them some fertilizer and they grow. I think many people in the press, in management, and in the non-technical world have been dazzled by the success of Machine Learning, and have come to think of it a little like water or fertilizer for hard problems. They often mistakenly believe that a generic version will work on any and all problems. But while ML can sometimes have miraculous results it needs to be carefully customized after the DNA of the problem has been analyzed.  And even then it might not be what is needed–to extend the metaphor, perhaps it is the climate that needs to be adjusted and no amount of fertilizer or ML will do the job.

How does Machine Learning work, and is it the same as when a child or adult learns something new? The examples above certainly seem to cover some of the same sort of territory, learning how to understand a human speaking, learning how to play a game, learning to name objects based on their appearance.

Machine Learning started with games

In the early 1940s as war was being waged worldwide there were only a handful of electronic digital computers in existence. They had been built, using the technology of vacuum tubes, to calculate gunnery tables and to decrypt coded military communications of the enemy.  Even then, however, people were starting to think about how these computers might be used to carry out intelligent activities, fifteen years before the term Artificial Intelligence was first floated by John McCarthy.

Alan Turing, who in 1936 had written the seminal paper that established the foundations of modern computation, and Donald Michie, a classics student from Oxford (later he would earn a doctorate in genetics), worked together at Bletchley Park, the famous UK code breaking establishment that Churchill credited with subtracting years from the war. Turing contributed to the design of the Colossus computer there, and after a key programming breakthrough that Michie made, the design of the second version of the Colossus was changed to better accommodate his ideas. Meanwhile, at the local pub, the pair had a weekly chess game together and discussed how to program a computer to play chess, but they were only able to get as far as simulations with pen and paper.

In the United States right after the war, Arthur Samuel⁠3, an expert on vacuum tubes, was the leader of an effort to build the ILLIAC computer at the University of Illinois at Urbana-Champaign. While the computer was still being built he planned out how to program it to play checkers (or draughts in British English), but left in 1949 to join IBM before the University computer was completed. At IBM he worked on both vacuum tubes and transistors to bring IBM’s first commercial general purpose digital computers to market. On the side he was able to implement a program that by 1952 could play checkers against a human opponent. This was one of the first non-arithmetical programs to run on general purpose digital computers, and has been called the first AI program to run in the United States.

Samuel continued to improve the program over time and in 1956 it was first demonstrated to the public. But Samuel wondered whether the improvements he was making to the program by hand could be made by the machine itself. In 1959 he published a paper titled “Some Studies in Machine Learning Using the Game of Checkers”⁠2, the first time the phrase “Machine Learning” was used–earlier there had been models of learning machines, but this was a more general concept.

The first sentence in his paper was: “The studies reported here have been concerned with programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning.” Right there is his justification for using the term learning, and while I would not quibble with it, I think that it may have had some unintended consequences which we will explore towards the end of this post.

What Samuel had realized, demonstrated, and exploited, was that digital computers were by 1959 fast enough to take over some of the fine tuning that a person might do for a program, as he had been doing since the first version of his program in 1952, and ultimately eliminate the need for much of that effort by human programmers by letting the computer do some Machine Learning on appropriate parts of the problem. This is exactly what has led, almost 60 years later, to the great influence that ML is now having on the world.

One of the two learning techniques Samuel described was something he called rote learning, which today would be labelled as the well known programming technique of memoization⁠4; it sped up the program. The other learning technique that he investigated involved adjusting numerical weights on how much the program should believe each of over thirty measures of how good or bad a particular board position was for the program or its human opponent. This is closer in spirit to techniques in modern ML. By improving this measure the program could get better and better at playing. By 1961 his program had beaten the Connecticut state checker champion. This was another first for AI, enabled by the first ML program.
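
For readers who like to see such ideas pinned down, here is a minimal Python sketch of memoization; the evaluation function is a made-up placeholder, not Samuel's actual checkers evaluator, but the caching pattern is the same idea as rote learning: remember the answer for a position the first time it is computed, and simply recall it on every later encounter.

cache = {}

def evaluate(position):
    """Score a board position, remembering the result for next time."""
    if position in cache:
        return cache[position]              # rote recall: no recomputation
    score = placeholder_static_analysis(position)
    cache[position] = score                 # rote learning: store the answer
    return score

def placeholder_static_analysis(position):
    # Stand-in for Samuel's thirty-plus weighted board measures.
    return position.count('o') - position.count('x')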

Arthur Samuel built his AI and ML systems not as an academic researcher but as a scholar working on his own time apart from his day job. However he had an incredible advantage over all the AI academic researchers. Whereas access to computers was rare and precious for them, Samuel’s day job was as a key participant building the first mass produced digital computers, and each one needed to be run for many hours to catch early life defects before it could be shipped. He had a surfeit of free computer time. Just about no one else in the world had such a luxurious computational environment.

Sometimes the less lucky academics had to resort to desperate measures. And so it was for Donald Michie, colleague of Alan Turing back at Bletchley Park. By 1960 he was a Senior Lecturer in Surgical Science at the University of Edinburgh, but his real interests lay in Artificial Intelligence, though he always preferred the term Machine Intelligence.

In 1960 Surgical Science did not have much pull in getting access to a digital computer. So Donald Michie himself built a machine that could learn to play the game of tic-tac-toe (Noughts and Crosses in British English) from 304 matchboxes, small rectangular boxes which were the containers for matches, and which had an outer cover and a sliding inner box to hold the matches. He put a label on one end of each of these sliding boxes, and carefully filled them with precise numbers of colored beads. With the help of a human operator, mindlessly following some simple rules, he had a machine that could not only play tic-tac-toe but could learn to get better at it.

He called his machine MENACE, for Matchbox Educable Noughts And Crosses Engine, and published⁠5 a report on it in 1961. In 1962 Martin Gardner⁠6 reported on it in his regular Mathematical Games column in Scientific American, but illustrated it with a slightly simpler version to play hexapawn, three chess pawns against three chess pawns on a three by three chessboard. This was a way to explain Machine Learning and provide an experimental vehicle to the scientifically interested lay population, who certainly would not have had access to a digital computer at that time. Gardner suggested that people try building a matchbox computer to play simplified checkers with two pieces for each player on a four by four board. But he felt that even the simplest version of chess that he could come up with, on a five by five board, would require too many matchboxes to be practical.

I first read about the matchbox computer in 1967 in a book⁠7 published the previous year which was written by a group of teachers at a British high school. They neither attributed the idea to Michie, nor the game they described it learning, hexapawn, to Gardner. As barely a teenager who had to hand build every machine for every experiment I wanted to do in AI, I must admit I thought that the matchbox computer was too simple a project and so did not pursue it. Now, however, I have come to realize that it is the perfect way of introducing how Machine Learning works, as everything is there to see. Even though MENACE is over fifty years old, many of the problems that it faced are still relevant to machine learning algorithms today, and it shares many characteristics with almost all of today’s machine learning. Due to its simplicity it can be described in complete detail and no mathematics is needed to get a strong intuitive understanding of how it works.

Today people generally recognize three different classes of Machine Learning: supervised, unsupervised, and reinforcement learning, all three very actively researched, and all being used for real applications.  Donald Michie’s MENACE introduced the idea of reinforcement learning to ML, and he explicitly refers to reinforcement as a key concept in how it works.

How a collection of matchboxes plays & learns

I am going to take the details I give here from a retelling of how MENACE worked in a more accessible 1963 paper⁠8. In a very often republished picture from that paper Donald Michie (or at least his hands) can be seen both playing tic-tac-toe against the machine, and operating the machine.

On the sheet of paper in front of him you can just see a large tic-tac-toe diagram. There are stacks of matchboxes toward the rear of the table, most likely glued together so that they stay in place. Some of the matchboxes have their drawers partially pulled out, and he is holding one of the drawers in his left hand. As we will see this image captures most of what is going on as MENACE learns to play better and better tic-tac-toe.

To make the description more clear I am going to introduce a second person; let’s call him Alan. Alan will operate the matchbox machine according to fixed rules, and will not have to make any decisions that are not determined completely by those rules. Donald, the human player, will not touch the machine, but instead will write out the moves on a standard three by three grid, accepting the moves of MENACE as delivered to him by Alan, playing his own moves, and being the adjudicator of when he or MENACE has won by getting three in a row.

Michie had MENACE always play first, with an ‘O’, and so we will do that here also. Below is what a game might look like, starting with an empty board, MENACE playing first in the middle of the top row, Donald replying with an ‘X’ in the top right corner, and so on, back and forth. Notice that I am using a period, or ‘.’, for a blank, so that I don’t have to draw the customary horizontal and vertical lines to divide the squares. I have put a little indicator under each board position where it is MENACE’s turn to play. At MENACE’s third move it blocks an immediate win where Donald would be able to complete the diagonal from the upper right to the lower left, but Donald replies with a move to the bottom right corner, now threatening two possible three-in-a-rows on his next turn, and MENACE is able to block only one of them, so MENACE loses to Donald.

...   .O.   .OX   .OX   .OX   .OX   .OX   OOX   OOX   
...   ...   ...   ...   .X.   .X.   .X.   .X.   .XX   
...   ...   ...   .O.   .O.   OO.   OOX   OOX   OOX 
 ^           ^           ^           ^

The way that MENACE plays is that there is a matchbox for every possible board configuration that could arise in the course of the game when it is MENACE’s turn. There is a box for every configuration where it is MENACE’s first, second, third, or fourth turn, but not for its fifth turn, as there will be only one empty square left and so no choice to make.

The configurations are drawn on a small label pasted to the front of each individual drawer. When it is MENACE’s turn, Alan finds the matchbox with a label that matches the current state of play on the piece of paper on which Donald is keeping track of the game. He opens the drawer which has some number of colored beads in it. Without looking he randomly picks one of the beads. Importantly he leaves the drawer open and after showing the bead to Donald he puts it on the table in front of the open drawer from which it came. The boxes are arranged left to right corresponding to fewer moves played and then more moves played so it is easy to keep track of which bead came from which drawer. There are nine colors of bead, and each color corresponds to one of the nine squares in tic-tac-toe. After seeing the bead Donald writes down an ‘O’ at the appropriate square on his piece of paper, and then writes his own ‘X’ as his next move, and the cycle repeats.

Although the actual colors of the beads do not really matter, here are the colors and their correspondence to the squares that were used in the original experiments (this time I have put in the horizontal and vertical lines to divide the color words).

 white | lilac | silver
-----------------------
 black | gold  | green 
-----------------------
 amber |  red  | pink  

How many beads are there in each box?  For all the first move boxes, of which there is one, corresponding to the empty board, there are four beads for each possible move, so 36 in total.  For all possible second moves for MENACE there are only seven possibilities, and each of those empty squares has three beads.  For the third move there are 2 beads for each of the five possibilities, and for each fourth move there is one bead for each of the three possibilities.

To start out MENACE is playing each move completely at random. But suppose MENACE loses a game. Then Alan discards the beads below each open drawer and closes them all. This is negative reinforcement as MENACE lost, and so made moves it should not make in the future. In particular its fourth move, with only one bead, led to a loss that was at that point completely out of its control. So removing that bead means that MENACE will never play that bad move again. Its third move was perhaps a little suspect so that goes down to only one bead instead of two and it is less likely to try that again, but if it does it will not be tricked in exactly the same way again.

If MENACE draws the game then it gets positive reinforcement as each bead that was picked from each drawer is put back in, along with an extra bonus bead of the same color. If it won the game then it gets three additional same-colored beads along with the one played at each turn. In this way MENACE tends to do the things that worked in the past, but if the opponent (in this case Donald) finds a new way to win against what used to work then MENACE will gradually adapt to that and avoid that losing line of play.
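
For readers who like to see such rules written down precisely, here is a minimal sketch of the bead mechanics in Python. The data layout and names are mine, not Michie's: each box is a mapping from a possible move to its bead count, a move is drawn with probability proportional to those counts, and the reinforcement adjusts the counts exactly as just described.

import random

def make_box(possible_moves, turn_number):
    """Initial bead counts: 4 per move for turn 1, then 3, 2, 1."""
    beads = {1: 4, 2: 3, 3: 2, 4: 1}[turn_number]
    return {move: beads for move in possible_moves}

def pick_move(box):
    """Draw a bead at random, weighted by how many of each remain."""
    moves = list(box)
    return random.choices(moves, weights=[box[m] for m in moves])[0]

def reinforce(beads_on_table, outcome):
    """beads_on_table: the (box, move) pairs drawn during one game."""
    for box, move in beads_on_table:
        if outcome == 'loss':
            box[move] -= 1      # the bead is forfeited
        elif outcome == 'draw':
            box[move] += 1      # bead returned, plus one bonus bead
        else:                   # 'win'
            box[move] += 3      # bead returned, plus three bonus beads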

That is it. With Alan following this simple set of rules MENACE learns to play better and better over time. But there is one point of practicality.

Human structuring of the learning problem

As I described it above there are a lot of matchboxes needed for MENACE. There is 1 for MENACE’s first move, 72 for its second move, 756 for its third, and 1372 for its fourth move, for a total of 2201 matchboxes.

But let’s look at another possible game, where every position is different from the previous game we looked at.

...   ...   ...   ...   ...   O..   O..   O.O   O.O   
...   ..O   ..O   O.O   OXO   OXO   OXO   OXO   OXO   
...   ...   ..X   ..X   ..X   ..X   X.X   X.X   XXX
 ^           ^           ^           ^

But wait, this is really the same game as before just rotated ninety degrees clockwise. It is going to take a lot longer to learn how to play tic-tac-toe if MENACE has to independently learn that its second move in this game is just as bad as its second move in the previous game. To make MENACE learn faster, and to reduce the number of matchboxes down to a more manageable level, Donald Michie took into account that up to eight different patterns of Noughts and Crosses might really be essentially the same.

Here is an example where an original board position is rotated clockwise by a quarter, half, and three-quarter turn, and where a reflection about the vertical axis, the horizontal axis, and the two diagonal axes all give different board positions. Nonetheless these eight positions are essentially the same as far as the rules of the game of tic-tac-toe are concerned.

OX.   ..O   ...   ...   .XO   ...   O..   ...
.O.   .OX   .O.   XO.   .O.   .O.   XO.   .OX
...   ...   .XO   O..   ...   OX.   ...   ..O

Some board positions may not result in so many different looking positions when rotated or reflected. For instance, a single play in the center of the board is not changed at all by these spatial transformations.

In any case, by considering all the rotations and reflections of a board position as a single position, and therefore only assigning one matchbox to all of them combined, the requirements for MENACE are reduced to 1 matchbox, as before, for MENACE’s first move, 12 for its second, 108 for its third, and 183 for its fourth move, bringing the total⁠9 to 304. Furthermore by looking at the symmetries in what move is played there are often fewer essentially different moves than there are empty squares. For instance in both these cases MENACE is about to play an ‘O’:

...   .O.
...   .O.
...   X.X

In each case there are only three essentially different moves that can be played, so MENACE’s matchboxes for these moves need only start out with three different colored beads rather than nine or five respectively.

By taking into account these symmetries the MENACE machine can be much smaller, and the speed of learning will be much faster, as lessons from one symmetric position will be automatically learned at another. The cost is that Alan is going to have to do quite a bit more work to operate MENACE. Now he will have to look at the position that Donald shows on the piece of paper where Donald is playing and not just look for an identical label on the front of a matchbox, but look for one that might be a rotation or reflection of the state of the game. Then, when Alan randomly selects a bead which indicates a particular move on the label on the matchbox from which it came, he will have to figure out which square that corresponds to on Donald’s sheet of paper through the inverse rotation or reflection that he used to select the matchbox. Fortunately this extra work is all quite deterministic and Alan is still following a strict set of rules with no room for judgement to creep in. We will come back to this a little later and mechanize Alan’s tasks through a few sheets of very simple instructions that will do all this extra work.
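
For the curious, here is a sketch in Python of that symmetry bookkeeping, writing a board as a nine character string in row order. Generating all eight rotations and reflections and agreeing on one representative of each class is one way to get the effect of Michie's labels; the particular permutations and the choice of representative here are mine.

ROTATE  = [6, 3, 0, 7, 4, 1, 8, 5, 2]   # quarter turn clockwise
REFLECT = [2, 1, 0, 5, 4, 3, 8, 7, 6]   # mirror about the vertical axis

def transform(board, perm):
    """Apply a rotation or reflection to a nine character board string."""
    return ''.join(board[i] for i in perm)

def symmetries(board):
    """All eight rotations and reflections of a board."""
    variants = []
    for _ in range(4):
        variants.append(board)
        variants.append(transform(board, REFLECT))
        board = transform(board, ROTATE)
    return variants

def canonical(board):
    """One agreed-on representative for the whole symmetry class."""
    return min(symmetries(board))

# The 'essentially the same' positions from the two example games above
# really do collapse together:
assert canonical('.OX....O.') == canonical('...O.O..X')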

How well do matchboxes learn?

MENACE is learning what move it should choose in each of 304 essentially different board configurations for its first four moves in a game of tic-tac-toe. Since Alan randomly picks out one bead from the matchbox corresponding to one of those configurations, it is making a random move from a small number of possibilities, but the probability of a particular move goes up when there are more beads of a particular color from positive reinforcements from previous games, and the probability of a move which leads to a loss goes down relative to the other possible moves as its beads are removed.

Look back at the two examples just above for a first move and a third move for MENACE. The empty board starts out with 12 beads, four of each of three different colors, representing placing an ‘O’ on a corner, in the middle of an edge, or in the middle of the board. The board waiting for MENACE’s third ‘O’ to be played starts out with just six beads, of three different colors, corresponding to playing the blocking move between the ‘X’s, one of the corners, or one of the other two middle edges. We will refer to the number of beads of the same color in a single box as a parameter. By mapping all symmetric situations to a common matchbox and restricting the different moves to essentially different moves, there are 1087 parameters that MENACE adjusts over time through the removal or addition of beads. When MENACE starts off it has a total of 1720 beads representing those 1087 parameters in 304 different matchboxes.

When MENACE starts out it is playing uniformly randomly over all essentially different moves. If two uniformly random players play against each other, the first to play wins 59% of the time, draws 13%, and loses 28%. This shows the inherent bias in the game for the first player, which makes learning a little easier for MENACE.
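
Those percentages are easy to check with a small simulation. Here is a sketch that plays a large number of games between two uniformly random players and tallies the outcomes from the first player's point of view; the exact numbers wobble a little from run to run.

import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'O' or 'X' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_game():
    board = ['.'] * 9
    for mark in 'OXOXOXOXO':                # 'O' always moves first
        empty = [i for i in range(9) if board[i] == '.']
        board[random.choice(empty)] = mark
        w = winner(board)
        if w:
            return w
    return 'draw'

tally = {'O': 0, 'draw': 0, 'X': 0}
for _ in range(100000):
    tally[random_game()] += 1
print({k: round(100 * v / 100000) for k, v in tally.items()})
# Typically prints something close to {'O': 59, 'draw': 13, 'X': 28}.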

In his original paper Michie reported that MENACE became quite a good player after only 220 games and was winning most games, but neither I nor others who report simulating MENACE (you can find many with a web search) saw MENACE doing that well at all. In fact, since a perfect player never loses at tic-tac-toe, two perfect players draw the game 100% of the time. It seems likely that Michie was carefully training MENACE with deliberately chosen games, and then playing against it in a fairly random way. He alludes to this when he later converted to a computer simulation of MENACE and mentions that playing against random moves results in much slower learning than playing against a deliberate policy.

To explore this I made a computer simulation of MENACE and three different simulated strategies of Donald playing against MENACE. I let learning proceed for 4,000 games, and did this multiple times against each of the three simulated players. Since there is randomness in picking a bead from a matchbox, the random number generator used by the computer to simulate this ensures that different trials of 4,000 games will lead to different actual games being played. Every so often I turned off the learning and tested how well⁠10 MENACE was currently playing against the three simulated players, including the two it was currently not learning from.

The three simulated players were as follows. Player A played completely randomly at all times. Player B played optimally and was unbeatable. Player C was optimal except that 25% of its moves were random instead of the optimal play. These are the three different versions of Donald that I used in my simulations.
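
The post does not spell out how these opponents were implemented, so here is one plausible sketch, reusing winner and LINES from the previous code. Player B uses a plain minimax search, which is the textbook way to get optimal tic-tac-toe play.

import random

def legal_moves(board):
    return [i for i in range(9) if board[i] == '.']

def minimax(board, mover, me):
    """Value of the position to `me`, with `mover` about to play."""
    w = winner(board)
    if w:
        return 1 if w == me else -1
    if not legal_moves(board):
        return 0
    other = 'X' if mover == 'O' else 'O'
    values = []
    for sq in legal_moves(board):
        board[sq] = mover
        values.append(minimax(board, other, me))
        board[sq] = '.'
    return max(values) if mover == me else min(values)

def player_a(board, mark):
    """Plays uniformly at random."""
    return random.choice(legal_moves(board))

def player_b(board, mark):
    """Plays optimally, by exhaustive minimax search."""
    other = 'X' if mark == 'O' else 'O'
    best_sq, best_val = None, -2
    for sq in legal_moves(board):
        board[sq] = mark
        val = minimax(board, other, mark)
        board[sq] = '.'
        if val > best_val:
            best_sq, best_val = sq, val
    return best_sq

def player_c(board, mark):
    """Optimal, except that 25% of its moves are random."""
    if random.random() < 0.25:
        return player_a(board, mark)
    return player_b(board, mark)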

In the table below the first row shows the performance of MENACE before it has learned at all, against each of the three simulated players. Each triple of numbers is the percentage of wins, draws, and losses (these are rounded percentages from a very large number of test games so don’t necessarily add to exactly 100%). As expected it never wins against Player B, which plays optimally and cannot be beaten.  Player C, which makes mistakes 25% of the time, can be beaten, but only about a quarter of the time.  In each row below that, MENACE was trained from scratch playing 4,000 games against a different one of these players. In each column we show, with MENACE stopped from further learning and adjusting its parameters, how it typically did against each of the three players once trained. We say “typically” as there is some variation in the resulting percentages between different trials with the same conditions, but only by a few points, and not in all cases.

\   compete |             |             |             |
 \  against | Player A    | Player B    | Player C    |
  \------\  |             |             |             |
trained   \ |             |             |             |
against    \|             |             |             |
------------|-------------|-------------|-------------|
no training |  59/ 13/ 28 |   0/ 24/ 76 |  27/ 19/ 53 |
============|=============|=============|=============|
Player A    |  86/  8/  6 |   0/ 28/ 72 |  50/ 20/ 30 |
------------|-------------|-------------|-------------|
Player B    |  71/ 15/ 14 |   0/100/  0 |  38/ 48/ 14 |
------------|-------------|-------------|-------------|
Player C    |  90/  8/  2 |   0/ 99/  1 |  56/ 42/  2 |
------------|-------------|-------------|-------------|

The first thing to notice is that how MENACE plays really does depend on what player it was trained against. When it is trained against Player B, which always plays optimally, it very quickly, usually after only about 200 games, learns to always play to a draw. But with that training (look in the same row) it is really not very good at playing against Player C which plays optimally with a 25% error rate. That is probably because in its training it never got to win at all against Player B, so it has not learned any winning moves to use against Player C.

When MENACE is trained against Player A (look in the row labelled Player A), which plays completely randomly, it does learn to play against it quite well, and it also does reasonably well against Player C, probably because it has accidentally won enough times during training to have boosted some winning moves when they are available. It does dismally against the optimal Player B however. This particular box in the table has the highest variance of all in the table. Sometimes after 4,000 games it is doing less than half as well against Player B as when it started out learning.

When MENACE trains against Player C it does the best overall. It sees enough losses early on that after about 400 games it is starting to get good at avoiding losses, though it is still slowly, slowly getting better at that aspect of its game even after 4,000 games of learning. It usually doesn’t get quite as good as Player B, and very occasionally still loses to it, but it is really good at winning when there are opportunities for it to do so. We can see that against Player C it has learned to take advantage of its mistakes to drive home a win.

While not as good as a person, MENACE does get better against different types of players. It does however end up tuning its game to the type of player it is playing against.

There is also something surprising about the number of beads. MENACE starts off with 1,720 beads, but depending on which of Players A, B, or C it is learning from, it has from 2,300 to 3,200 beads after just 200 games, and always there is at least one parameter with over one hundred beads representing it by that time.  By 4,000 games it may have more than 35,000 beads representing just 1,087 parameters, with as many as 6,500 beads for one of the parameters.  This seems unnecessary, and is perhaps the impact of rewarding all the moves with three beads on a win. However when I changed my simulation to never add more beads to a parameter that already had at least one hundred beads, a practical limit perhaps for a MENACE machine built from physical matchboxes, it tended to slow down learning in most cases represented in the table above, and even caused small drops in typical levels of play even after 4,000 games of experience when playing against Players A and C.

Note that besides eliminating ever taking the very last bead away from a matchbox after a loss, this is the only place where I deviated from Michie’s description of his MENACE. Since he chose his plays carefully to instruct MENACE, and since he only played 220 games by hand, he perhaps did not come across the phenomenon of large numbers of beads.

Mechanizing MENACE a little more

In preparation for comparing how MENACE learns to how a person learns I want to make the role of Alan, the human operator of MENACE, a little clearer. In the description derived from Michie’s original paper, Michie himself played the role of both Donald and Alan. In my description above I talked about Alan matching the image of the paper on which Donald was playing to the labels on the matchboxes, possibly having to rotate or reflect the game board. And after randomly selecting a bead from that box, Alan would need to figure out which square that applied to on Donald’s piece of paper.

That sounds a little fuzzy, and perhaps requiring some reasoning on Alan’s part, so now we’ll make explicit a very rule driven approach that we could enforce, to ensure that Alan’s role is completely rote, with no judgement at all required.

We will make the communication between Donald and Alan very simple. Donald will hand Alan a string of nine characters drawn from ‘.’, ‘O’, and ‘X’, representing the board position after his play, and Alan will hand back a string where one of the periods has been replaced by an ‘O’. To enable this we will number the nine positions on the tic-tac-toe board as follows.

123
456
789

The string representing the board is just the contents of these squares in numerical order. So, for instance, if Donald has just played his ‘X’ to make the following board position, then he should give Alan the string printed to the right.

...
.OX       ....OX...
...

We will get rid of the labels, the images of tic-tac-toe board positions on the front of the matchboxes, and replace them with the numbers 1 through 304, so that each matchbox has a unique numerical label.  We will label the matchbox corresponding to the empty board with 1, as that will be how Alan starts a game, by drawing a bead from there, and he will look up what square that color means in “Transform #1” on a sheet of paper with eight different transforms listed. Here they are:

Transform #1:
  white =1  lilac =2  silver=3  
  black =4  gold  =5  green =6  
  amber =7  red   =8  pink  =9  

Transform #2:
  white =3  lilac =6  silver=9  
  black =2  gold  =5  green =8  
  amber =1  red   =4  pink  =7  

Transform #3:
  white =9  lilac =8  silver=7  
  black =6  gold  =5  green =4  
  amber =3  red   =2  pink  =1  

Transform #4:
  white =7  lilac =4  silver=1  
  black =8  gold  =5  green =2  
  amber =9  red   =6  pink  =3  

Transform #5:
  white =3  lilac =2  silver=1  
  black =6  gold  =5  green =4  
  amber =9  red   =8  pink  =7  

Transform #6:
  white =7  lilac =8  silver=9  
  black =4  gold  =5  green =6  
  amber =1  red   =2  pink  =3  

Transform #7:
  white =1  lilac =4  silver=7  
  black =2  gold  =5  green =8  
  amber =3  red   =6  pink  =9  

Transform #8:
  white =9  lilac =6  silver=3  
  black =8  gold  =5  green =2  
  amber =7  red   =4  pink  =1  

The eight transforms correspond to four rotations (of zero, one, two and three quarters clockwise), and four reflections.
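
These sheets do not have to be written out by hand. Each one is a symmetry applied to the home numbering of the squares, so, as a sketch, they can be generated from the permutations in the earlier symmetry code; checking against the sheets above, the identity permutation reproduces Transform #1 and a single clockwise quarter turn reproduces Transform #2.

COLORS = ['white', 'lilac', 'silver',
          'black', 'gold',  'green',
          'amber', 'red',   'pink']    # home squares 1 through 9, in order

def transform_table(perm):
    """Map each color to the square its home square lands on under perm."""
    return {color: perm.index(home) + 1 for home, color in enumerate(COLORS)}

print(transform_table(list(range(9))))   # matches Transform #1
print(transform_table(ROTATE))           # matches Transform #2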

The remaining 303 matchboxes correspond to the essentially different board positions for MENACE’s second, third, and fourth moves. Although there are 72 different board positions for MENACE’s second move there are only twelve that are essentially distinct, and here they all are, numbered 2 through 13 as the next twelve matchboxes after the one for the first move.

#1    #2    #3    #4    #5    #6    #7    #8    #9    #10   #11   #12   #13
 |     |     |     |     |     |     |     |     |     |     |     |     |
...   .O.   .O.   .O.   .O.   X..   .X.   OX.   O.X   O..   O..   O..   XO.   
...   X..   .X.   ...   ...   .O.   .O.   ...   ...   .X.   ..X   ...   ...   
...   ...   ...   X..   .X.   ...   ...   ...   ...   ...   ...   ..X   ...   

When Alan is given a string by Donald (there is only one possible string for the first move, the empty board, but there are 72 possibilities for MENACE’s second move, 756 for the third, and 1372 for the fourth move) Alan just mindlessly looks it up in a big table that is printed on a few sheets of paper. Each line has a string representing a board position, a box number, and a transform number. For instance, for the second move for MENACE we talked about above with string ....OX... Alan would find it, simply by matching character for character, in the following part of the table (for the first and second moves by MENACE):

.........  Box: #  1, Transform #1

.......OX  Box: # 13, Transform #3
.......XO  Box: #  8, Transform #3
......O.X  Box: #  9, Transform #6
......OX.  Box: #  8, Transform #6
......X.O  Box: #  9, Transform #3
......XO.  Box: # 13, Transform #6
.....O..X  Box: # 13, Transform #8
.....O.X.  Box: #  2, Transform #8
.....OX..  Box: #  4, Transform #8
.....X..O  Box: #  8, Transform #8
.....X.O.  Box: #  2, Transform #3
.....XO..  Box: # 11, Transform #6
....O...X  Box: #  6, Transform #3
....O..X.  Box: #  7, Transform #3
....O.X..  Box: #  6, Transform #4
....OX...  Box: #  7, Transform #2      <== this one
....X...O  Box: # 10, Transform #3
....X..O.  Box: #  3, Transform #3
....X.O..  Box: # 10, Transform #4
....XO...  Box: #  3, Transform #2
...O....X  Box: #  4, Transform #4
...O...X.  Box: #  2, Transform #4
...O..X..  Box: # 13, Transform #4
...O.X...  Box: #  5, Transform #4
...OX....  Box: #  3, Transform #4
...X....O  Box: # 11, Transform #3
...X...O.  Box: #  2, Transform #6
...X..O..  Box: #  8, Transform #4
...X.O...  Box: #  5, Transform #2
...XO....  Box: #  7, Transform #4
..O.....X  Box: #  9, Transform #2
..O....X.  Box: # 11, Transform #2
..O...X..  Box: # 12, Transform #2
..O..X...  Box: #  8, Transform #2
..O.X....  Box: # 10, Transform #2
..OX.....  Box: # 11, Transform #5
..X.....O  Box: #  9, Transform #8
..X....O.  Box: #  4, Transform #3
..X...O..  Box: # 12, Transform #4
..X..O...  Box: # 13, Transform #2
..X.O....  Box: #  6, Transform #2
..XO.....  Box: #  4, Transform #7
.O......X  Box: #  4, Transform #5
.O.....X.  Box: #  5, Transform #1
.O....X..  Box: #  4, Transform #1
.O...X...  Box: #  2, Transform #5
.O..X....  Box: #  3, Transform #1
.O.X.....  Box: #  2, Transform #1
.OX......  Box: # 13, Transform #5
.X......O  Box: # 11, Transform #8
.X.....O.  Box: #  5, Transform #3
.X....O..  Box: # 11, Transform #4
.X...O...  Box: #  2, Transform #2
.X..O....  Box: #  7, Transform #1
.X.O.....  Box: #  2, Transform #7
.XO......  Box: #  8, Transform #5
O.......X  Box: # 12, Transform #1
O......X.  Box: # 11, Transform #7
O.....X..  Box: #  9, Transform #7
O....X...  Box: # 11, Transform #1
O...X....  Box: # 10, Transform #1
O..X.....  Box: #  8, Transform #7
O.X......  Box: #  9, Transform #1
OX.......  Box: #  8, Transform #1
X.......O  Box: # 12, Transform #3
X......O.  Box: #  4, Transform #6
X.....O..  Box: #  9, Transform #4
X....O...  Box: #  4, Transform #2
X...O....  Box: #  6, Transform #1
X..O.....  Box: # 13, Transform #7
X.O......  Box: #  9, Transform #5
XO.......  Box: # 13, Transform #1

This tells Alan that the move given to him by Donald is to be handled by matchbox #7, and then he is to use Transform #2, which we saw above, to interpret the color of the drawn bead as to which square is meant.  We can see what position box #7 corresponds to above, though Alan does not know that. He simply reaches into box #7 and pulls out a random bead. As it happens, in my simulation of MENACE, where it never tries to play two essentially the same moves, the only beads in #7 are colored white, black, amber, and red, corresponding to essentially different moves down the left column and at the bottom in the middle, using the original MENACE bead color interpretations.  Under Transform #2 we see that those colors correspond to squares 3, 2, 1, and 4, respectively, which are across the top row and the left middle square for the way Donald is playing. So whichever one of those colors is removed from the box, Alan simply goes to that position in the string that was given to him by Donald, and changes the blank to an ‘O’. So suppose that Alan pulls out a black bead. In that case he changes the second element to an ‘O’, and gives it back to Donald who then interprets the string to mean the following new board position:

                .O.
.O..OX...       .OX
                ...

The only remaining thing is the reinforcement signal. Donald, the human player, is the one who is responsible for deciding when the game is over and at that point needs to communicate one of just three options to Alan: L, for loss, meaning forfeit all the beads taken out of the boxes; D, for draw, meaning put the beads back with an extra one of the same color for each; or W, for win, meaning put them back with three extra ones of each.

Summary of What Alan Must Do

With these modifications we have made the job of Alan both incredibly simple and incredibly regimented.

    1. When Donald gives Alan a string of nine characters Alan looks it up in a table, noting the matchbox number and transform number.
      1. He opens the numbered matchbox, randomly picks a bead from it and leaves it on the table in front of the open matchbox.
      2. He looks up the color of the bead in the numbered transform, to get a number between one and nine.
      3. He replaces that numbered character in the string with an ‘O’, and hands the string back to Donald.
    2. When Donald gives Alan a sign for one of L, D, or W, Alan does the following:
      1. For L he removes the beads on the table and closes the open matchboxes.
      2. For D he adds one more bead of the same color to each one on the table, and puts the pairs in the matchboxes behind them, and closes the matchboxes.
      3. For W he adds three more beads of the same color to each one on the table, and puts the sets of four in the matchboxes behind them, and closes the matchboxes.

That is all there is. Alan looks up things on a few sheets of paper, acts on matchboxes, and changes a character in a string.
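
In fact Alan's whole job fits comfortably in a few lines of code. Here is a sketch, where lookup stands for the printed table of board strings, transforms for the eight color-to-square sheets, and boxes for the bead counts by color; building all of that fixed data was Michie's offline design work, so it is taken as given here.

import random

def alan_move(board_string, lookup, transforms, boxes, beads_on_table):
    """Step 1: answer Donald's string with one more 'O' filled in."""
    box_no, transform_no = lookup[board_string]
    box = boxes[box_no]                                  # bead counts by color
    colors = list(box)
    color = random.choices(colors, weights=[box[c] for c in colors])[0]
    beads_on_table.append((box_no, color))               # drawer stays open
    square = transforms[transform_no][color]             # a number from 1 to 9
    return board_string[:square - 1] + 'O' + board_string[square:]

def alan_reinforce(signal, boxes, beads_on_table):
    """Step 2: apply L, D, or W to every bead sitting on the table."""
    change = {'L': -1, 'D': +1, 'W': +3}[signal]
    for box_no, color in beads_on_table:
        boxes[box_no][color] += change
    beads_on_table.clear()                               # drawers closed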

One could say that Alan is a Turing machine.

The thing that learns how to play tic-tac-toe is a combination of Alan following these completely strict rules, and the contents of the matchboxes, the colored beads, whose number varies over time.

Is this how a person would learn?

For anyone who has played tic-tac-toe the most striking thing about the way that MENACE learns is that it has no concept of “three in a row”. When we teach a child how to play the game that is the most important thing to explain, showing how rows, columns, and diagonals can all give rise to three O’s or X’s in a row. We explain to the child that getting three in a row is the goal of the game. So the first rule for playing tic-tac-toe is to complete three in a row on your move if that option is available. MENACE does not know this at all.

The next thing, or second rule, we might show our tic-tac-toe pupil is that assuming they have no winning move, the next best thing is to block the opponent if they have two of three in a row already with an empty spot to play and complete it. This does not guarantee an eventual win, as there are seventeen essentially different situations where the ‘X’ player may have two three-in-a-rows ready to play, and the ‘O’ player can only block one of them.  Here are two examples of that.

.O.   XOO
OOX   O..
.XX   .XX

However just these two rules are a marked improvement over random play.  If we play tic-tac-toe with the preference of rule 1 if it is applicable, then rule 2 if that is applicable, and if neither is applicable then make a random move, we actually get a pretty good player. Here is the same sort of table as above, with an identical first row showing how well random untrained play succeeds against Players A, B, and C, and then in the second row how much the addition of rules 1 and 2 improves a random player.

            | Player A    | Player B    | Player C    |
------------|-------------|-------------|-------------|
random play |  59/ 13/ 28 |   0/ 24/ 76 |  27/ 19/ 53 |
============|=============|=============|=============|
+ rules 1&2 |  86/ 10/  4 |   0/ 82/ 18 |  51/ 37/ 13 |
------------|-------------|-------------|-------------|

Just the addition of those two rules gets to a level of play against a random player (Player A) that MENACE only gets to after about 4,000 games learning from Player A. Against Player B, the optimal player, it does not get as good as it does when it is trained for 200 games by Player B, but it is better against either of the other two players than when it has been trained for 4,000 games against Player B. And against Player C, the player with 25% error rate from optimal play, it is almost as good as it ever gets even being trained by Player C.
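
As a sketch, here is what those two rules look like as a playing policy, reusing LINES and legal_moves from the earlier code. Note that everything the policy consults really is right there in the current board, with no lookahead at all.

import random

def completing_square(board, mark):
    """A square that finishes three in a row for `mark`, or None."""
    for line in LINES:
        row = [board[i] for i in line]
        if row.count(mark) == 2 and row.count('.') == 1:
            return line[row.index('.')]
    return None

def rules_player(board, mark):
    other = 'X' if mark == 'O' else 'O'
    move = completing_square(board, mark)        # rule 1: win right now
    if move is None:
        move = completing_square(board, other)   # rule 2: block their win
    if move is None:
        move = random.choice(legal_moves(board))
    return move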

Clearly these rules are very powerful. But they are also rather easy for a child to learn as they don’t require thinking ahead beyond the very next move. All the information is right there in the board layout, and there is no need to think ahead about what the opponent might do next once the current move is made. What is it that the child has that MENACE does not?

One answer might be geometrical representations. MENACE does not have any way to represent “three in a row” as a concept that can be applied to different situations. Each matchbox is a kingdom unto itself about one particular essentially unique board configuration. If one particular matchbox learns, through reinforcement, that it is good to place a third ‘O’ to make a diagonal, there is no way to transfer that insight, were MENACE able to have it, to other essentially different situations where there is also a diagonal that can be filled in. And certainly not to a situation about completing a horizontal or vertical row.

As we saw, Michie did incorporate some geometric “knowledge” into MENACE by mapping all rotations and reflections of the tic-tac-toe board to a common matchbox. But the machine itself has no insight into this–it was all done ahead of time by Michie (whose preparation was extended slightly by me so that Alan could be very explicitly machine-like in his tasks) by producing the dictionary of positions that mapped to matchboxes numbered 1 through 304, and which of the eight inversion lookup tables that mapped from color of bead to numbered square on the board should be used. That manual design process handled some mappings between different aspects of three in a row but not all. In general a researcher or engineer using Machine Learning to solve a problem does something very similar, in reducing the space of inputs. The art of it is to reduce the input space so that learning can happen more quickly, but not over-reduce the space so that subtle differences in situations are obliterated by the pre-processing. By mapping from all the general board positions to precisely those that are essentially different, Donald Michie, the Machine Learning engineer in this case, managed to satisfy both those goals.

A child knows something about geometry in a way that MENACE does not. A child can talk about things being in a row independently of learning tic-tac-toe. A child has learned that in-a-row-ness is independent of the orientation of the line that defines the row. By a certain age a child comes to know that the left-to-rightness of some ordering depends on the point of view of the observer, so they are able to see that two in a row with an empty third square is an important generalization that applies equally to the horizontal and vertical rows around the edges, thinking about them in both directions, and also applies to the horizontal and vertical rows that go through the middle square, and to the two diagonals that also go through that square. The child may or may not generalize that to two at each end of a row with the middle to be filled in–perhaps that might be a different concept for young children. But the rowness of things is something they have a lot of experience with, and are able to apply to tic-tac-toe. In computer science we would talk about rowness being a first class object for a child–something that can be manipulated by other programs, or in a child by many cognitive systems. In MENACE rowness is hidden in the pre-analysis of the problem that Donald Michie did in order to map tic-tac-toe to a collection of numbered matchboxes with beads in them.

The learning that MENACE does somehow feels different from the learning that a human does when playing tic-tac-toe. That is not to say that all learning that a human does is necessarily completely different from what MENACE does. Perhaps the things that humans learn in an unconscious fashion (e.g., how to adjust their stance to stay balanced–negative and positive reinforcement signals based on whether they hit the ground or not), where we have no way to access what is happening inside us, nor an ability to talk about it, are more like MENACE learning.

Not all learning is necessarily the same sort of learning.

Is this how a person would play?

A more fundamental question, perhaps, is whether MENACE plays tic-tac-toe like a person does, and I think the answer is a clear no. The MENACE system consisting of the matchboxes and Alan strictly following rules only fills in part of the role of a normal player. The rest of what is usually a social interaction between two people is all taken on by Donald.

There is no representation inside the MENACE (where we include in the definition of MENACE the sheets of papers that Alan consults, and the rules that we have instructed Alan to strictly follow) of tic-tac-toe being a game that is played. MENACE does not know what a game is, or even that it is playing a game. All that happens inside MENACE is that one at a time, either three or four times sequentially, one of its matchbox drawers is opened and a bead is randomly removed, and then either the beads are taken away, or they are put back in the boxes from where they came with either one or three additional beads of the same color, and the boxes are closed.

All the gameness of tic-tac-toe is handled by the human Donald. It is he who initiates the game by handing Alan a string of nine periods. It is he who manages the consistency of subsequent turns by annotating his hand drawn tic-tac-toe board with the moves. It is he who decides when the game has been won, drawn, or lost, and communicates to Alan the reinforcement signal that is to be applied to the open matchboxes. It is he, Donald, who decides whether and when to initiate a new game.

MENACE does not know, nor does it learn, what a game is. The designer of MENACE abstracted that away from the situation, so that MENACE could be a pure learning machine.

That today is both the strength and weakness of modern Machine Learning. Really smart people, researchers or engineers, come up with an abstraction for the problem in the real world that they want to apply ML to. Those same smart people figure out how data should flow to and fro between the learning system and the world to which it is to be applied. They set up machinery which times and gates that information flow from the application. They set up a signaling system on when the learning system is supposed to respond to an input (in MENACE’s case a string of nine characters drawn from ‘.’, ‘X’, and ‘O’) and produce an output. And those same people set up a system which tells the learning system when to learn, to adjust the numbers inside it, in response to a reinforcement signal, or in some other forms of ML a very different, but still similarly abstracted signal–we will see that in the next chapter.

The tic-tac-toe machine resonates with modern ML

Although MENACE is well over fifty years old, it shares many properties with modern Machine Learning systems, though of course it is much smaller and simpler than the systems that people use today–one must expect something from 50+ years of hard intellectual work. But the essential problems that MENACE and today’s ML algorithms have are very instructive as they can give some intuition about some of the limits we might expect for modern AI and ML.

Parameters. After the design work was done on MENACE, all that could change during learning was the value of the 1087 parameters, the numbers of various colored beads in various matchboxes. Those numbers impact the probability of randomly picking a bead of a particular color from a matchbox. If the number of red beads goes down and the number of amber beads goes up over time in a single matchbox, then it is more likely that Alan will pick an amber bead at random.  In this way MENACE has learned that for the particular situation on a tic-tac-toe board corresponding to that matchbox the square corresponding to the amber bead is a better square to play than the one corresponding to a red bead. All MENACE is doing is juggling these numbers up and down. It does not learn any new structure to the problem while it learns. The structure was designed by a researcher or engineer, in this case Donald Michie.

This is completely consistent with most modern Machine Learning systems. The researchers or engineers structure the system and all that can change during learning is a fixed quantity of numbers or parameters, pushing them up or down, but not changing the structure of the system at all. 1087 may seem like a lot of parameters for playing tic-tac-toe, but really that is the price of eliminating the geometry of the board from the MENACE machine.  In modern applications of Machine Learning there are often many millions of parameters. Sometimes they take on integer values as do the number of beads in MENACE, but more usually these days the parameters are represented as floating point numbers in computers, things that can take on values like 5.37, -201.65, 894.78253, etc.

Notice how simply changing a big bunch of numbers and not changing the underlying abstraction that connected the external problem (playing tic-tac-toe) to a geometry-free internal representation (the numbers of different colored beads in matchboxes) is very different from how we have become familiar with using computers. When we manage our mail box folders, creating special folders for particular categories (e.g., “upcoming trips”, “kids”, etc.) and then sub folders (e.g., “Chicago May 5”, “soccer”, etc.) and then filing emails in those subfolders, we are changing the structure of our representation of the important things in our life which are covered by emails. Machine Learning, as in the case of MENACE, usually has an engineering phase where the problem is converted to a large number of parameters, and after that there is no dynamic updating of structures.

In contrast, I think all our intuitions tell us that in our own learning our internal mental models tweak, and sometimes even radically change, how we categorize aspects of the skill or capability that we are learning.

Large Parameters. My computer simulations of MENACE soon had the numbers of beads of a particular color in particular boxes ranging from none or one up to many thousands. This intuitively seems strange but is not uncommon in today’s Machine Learning systems. Sometimes there will be parameters that are between zero and one, where just a change of one ten thousandth in value will have drastic effects on the capabilities that the system is learning, while at the same time there will be parameters that are up in the millions. There is nothing wrong with this, but it does feel a little different from our own introspections of how we might weigh things relatively in our own minds.

Many Examples Needed. If we taught tic-tac-toe to an adult we would think that just a few examples would let them get the hang of the game. MENACE on the other hand, even when carefully tutored by Donald Michie, took a couple of hundred examples to get moderately good. My simulation is still making relatively big progress after three thousand games and is often still slowly getting even a little better at four thousand games. In modern Machine Learning systems there may be tens of millions of different examples that are needed to train a particular system to get to adequate performance. But the system does not just get exposed to each of these training examples once. Often each of those millions of examples needs to be shown to the system hundreds of thousands of times. Just being exposed to the examples once leaves way too much bias from the most recently processed examples. Instead, by having them re-exposed over and over, after the ML system has already seen all of them many times, the recentness bias gets washed away into more equal influence from all the examples.

Training examples are really important. Learning to play against just one of Players A, B, or C always led to very different performance levels against each of these different players with learning turned off in my computer simulation of MENACE.  This too is a huge issue for modern Machine Learning systems. With millions of examples needed there is often a scale issue of how to collect enough training data.  In the last couple of years companies have sprung up which specialize in generating training data sets and can be hired for specific projects.  But getting a good data set which does not have unexpected biases in it can often be a problem.

When MENACE is trained against Player B, the optimal player that cannot be beaten, MENACE does not learn how to win, as it never has an experience of winning so it never receives reinforcement for winning. It does learn how to not be defeated, and so playing against Players A or C its win rate does go up a little as they each sometimes screw up, but MENACE’s winning rate does not go up as much as it does when it trains against those two players. In our example with MENACE my simulations worked best overall when trained against Player C, as that gave a mixture of examples that were tough to win against (when it got through a game without making a random bad choice) and, because of its occasional random choices, examples which more fully spanned all of the possible playing styles MENACE might meet. In the parlance of Machine Learning we would say that when MENACE was trained only against Player B, the optimal player, it overfit its playing style to the relatively small number of games that it saw (no wins, and few losses) so was not capable when playing against more diverse players.

In general, the more complex the problem for which Machine Learning is to be used, the more training data that will be needed.  In general, training data sets are a big resource consideration in building a Machine Learning system to solve a problem.

Credit assignment. The particular form of learning that MENACE both first introduced and demonstrates is reinforcement learning, where the system is given feedback only once it has completed a task. If many actions were taken in a row, as is the case with MENACE, either three or four moves of its own before it gets any feedback, then there is the issue of how far back the feedback should be used.

In the original MENACE all three forms of reinforcement, for a win, a draw, or a loss, were equally applied to all the moves. Certainly it makes sense to apply the reinforcement to the last move, as it directly led to that win or loss. In the case of a draw however, it could in some circumstances not be the best move, as perhaps choosing another move would have given a direct win. As we move backward, credit for whether earlier moves were best, worst, or indifferent is a little less certain. In the case of Player A or C as the opponent it may have simply made a bad move in reply to a bad move by MENACE early on, so giving the earlier move three beads for a win may be encouraging something that Player B, the optimal player, will be able to crush. A natural modification would be three beads for the last move in a winning game, two beads for the next to last, and one bead for the third to last move.  Of course people have tried all these variations and under different circumstances much more complex schemes would be the best. We will discuss this more a little later.
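
In code, that modification is a one line change to the reinforce sketch given earlier: on a win, the last move gets three bonus beads, the one before it two, the one before that one, and anything earlier none. The text does not commit to a bonus for a fourth-from-last move, so zero here is just one plausible choice.

def reinforce_graded_win(beads_on_table):
    """beads_on_table: (box, move) pairs in the order they were played."""
    for steps_back, (box, move) in enumerate(reversed(beads_on_table)):
        box[move] += max(3 - steps_back, 0)   # bonuses of 3, 2, 1, 0, ...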

In modern reinforcement learning systems a big part of the design is how credit is assigned. In fact now it is often the case that the credit assignment itself is also something that is learned by a parallel learning algorithm, trying to optimize the policy based on the particulars of the environment in which the reinforcement learner finds itself.

Getting front end processing right. In MENACE Michie developed what might be called “front end processing” to map all board positions to only those that were essentially distinct. This simultaneously drastically cut down the number of parameters that had to be learned, let the learning system automatically transfer learning across different cases in the full world (i.e., across symmetries in the tic-tac-toe board), and introduced zero entanglements that could confuse the learning process.

Up until a few years ago Machine Learning systems applied to understanding human speech usually had as their front end programs that had been written by people to determine the fundamental units of speech that were in the sound being listened to.

Those fundamental units of speech are called phonemes, and they can be very different for different human languages. Different units of speech lead to different words being heard. For instance, the four English words pad, pat, bad, and bat all have three phonemes, with the same middle phoneme corresponding to the vowel sound. (In English the same letter may be used to represent different phonemes: the word paper has the same letter ‘a’ for its second phoneme, of four in this word, but a very different sound associated with it, so that is a different phoneme.) The four different phonemes p, b, d, and t lead to four different words being heard as p and b are varied at the start, and d and t are varied at the end.

In earlier speech understanding systems the specially built front end phoneme detector programs relied on some numerical estimators of certain frequency characteristics of the sounds, and produced phoneme labels as their output that were fed into the Machine Learning system to recognize the speech. It turned out that those detectors were limiting the performance of the speech understanding systems no matter how well they learned. Relatively recently those programs were replaced by other machine learning systems, which didn’t necessarily output conventional phoneme representations, and this led to a remarkable overall increase in the reliability of speech understanding systems. This is why today, but only in the last few years, many people now have devices in their homes, such as Amazon’s Echo or Google’s Home, that they can easily interact with via voice.

Getting the front end processing right for an ML problem is a major design exercise. Getting it wrong can lead to much larger learning systems than necessary, making learning slower, perhaps impossibly so, or it can make the learning problem impossible outright if it destroys vital information from the real domain. Unfortunately, since in general it is not known whether a particular problem will be amenable to a particular Machine Learning technique, it is often hard to debug where things have gone wrong when an ML system does not perform well. Perhaps the technique being used is inherently unable to learn what is desired, or perhaps the front end processing is getting in the way of success.

Geometry is hard. Just as MENACE knew no geometry and so tackled tic-tac-toe in a fundamentally different way than how a human would approach it, most Machine Learning systems are not very good at preserving geometry nor therefore are they good at exploiting it. Geometry does not play a role in speech processing, but for many other sorts of tasks there is some inherent value to the geometry of the input data. The engineers or researchers building the front end processing for the system need to find a way to accommodate the poor geometric performance of the ML system being used.

The issue of geometry and the limitations of representing it in a set of numeric parameters arranged in some fixed system, as was the case in MENACE, has long been recognized. It was the major negative result of the book Perceptrons⁠11 written by Marvin Minsky and Seymour Papert in 1969. While people have attributed all sorts of motivations to the authors I think that their insights on this front, formally proved in the limited cases they consider, still ring true today.

Fixed structure stymies generalization. MENACE’s fixed structure meant that anything it implicitly learned about filling or blocking three in a row on a diagonal could not be transferred to filling or blocking a vertical or horizontal row. The fixed structures spanning thousands or millions of variable numerical parameters of most Machine Learning systems likewise stymie generalization. We will see some surprising consequences of this when we look at some of the most recent exciting results in Machine Learning in a later blog post–programs that learn to play a video game but then fail completely and revert to zero capability on exactly the same game when the colors of pixels are mapped to different colorations, or if each individual pixel is replaced by a square of four identical pixels.

Furthermore, any sort of meta-learning is usually impossible too. Since MENACE doesn’t know that it is playing a game, and since there is nothing besides the play and reward mechanism that can access the matchboxes, there is no way that observations of the flow of a game can be ruminated upon. A child might learn a valuable meta-lesson in playing tic-tac-toe, that when you have an opportunity to win take it immediately as it might go away if the other player gets to take a turn. That would correspond to learning rule 1 in our comparison between MENACE and how a person might learn.

Machine Learning engineers and researchers must, at this point in the history of AI, form an optimized and fixed description of the problem and let ML adjust parameters. All possibility of reflective learning is removed from these very impressive learning systems. This greatly restricts how much power of intelligence an AI system built with current day Machine Learning can tease out of its learning exploits. Humans are generally much much smarter than this.

A Few Developments in Reinforcement Learning

The description of reinforcement learning above comes from 1961, in what is the first use of the term reinforcement learning applied to a machine process that I can find. There have been some developments in reinforcement learning since 1961, but, as this section shows, only in details. The fundamental ideas were all there in Donald Michie’s matchboxes.

Reinforcement learning is still an active field of research and application today. It is commonly used in robotics applications, and for playing games. It was part of the system that beat the world Go champion in 2016, but we will come back to that in a little bit.

After Michie’s first paper, reinforcement learning was formalized over the next twenty years. Without resorting to the mathematical formulation, today reinforcement learning is used where there are a finite number of states that the world can be in.  For MENACE those states correspond to the 304 matchboxes of essentially different tic-tac-toe board positions where it is O’s turn to play. For each state there are a number of possible actions (the different colored beads in each matchbox corresponding to the possible moves). The policy that the system currently has is the probability of each action in each state, which for MENACE corresponds to the number of beads of a particular color in a matchbox divided by the total number of beads in that same matchbox. Reinforcement learning tries to learn a good policy.
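
In code, extracting the current policy from one matchbox is just a normalization of the bead counts. Here is a minimal sketch, with hypothetical move names in the example:

    def policy(box):
        # box maps move -> bead count for one matchbox; the probability
        # of a move is its share of all the beads in the box, exactly
        # as in MENACE.
        total = sum(box.values())
        return {move: beads / total for move, beads in box.items()}

    # e.g., policy({'corner': 6, 'center': 3, 'edge': 1})
    #   returns {'corner': 0.6, 'center': 0.3, 'edge': 0.1}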

The structure of states and actions for MENACE, and indeed for reinforcement learning for many games, is a special case, in that the system can never return to a state once it has left it. That would not be the case for chess or Go where it is possible to get back to exactly the same board position that has already been seen.

For many systems of reinforcement learning real numbers are used rather than integers as in MENACE. In some cases they are probabilities, and for a given state they must sum to exactly one. For many large reinforcement learning problems, rather than represent the policy explicitly for each state, it is represented as a function approximated by some other sort of learning system such as a neural network, or a deep learning network. The steps in the reinforcement process are the same, but rather than changing values in a big table of states and actions, the 1087 parameters of MENACE, a learning update is given to another learning system.
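
As a rough illustration of that last idea, here is a minimal sketch of a linear scorer standing in for the big table. It is an illustration only, not how any particular production system is built, and the feature vector summarizing a state is assumed to come from the front end processing discussed earlier.

    import numpy as np

    class LinearScorer:
        # Instead of one stored number per (state, action) pair, compute
        # a score for each action from a feature vector describing the
        # state, using a much smaller set of learned weights.
        def __init__(self, num_features, num_actions):
            self.weights = np.zeros((num_actions, num_features))

        def scores(self, features):
            return self.weights @ features   # one score per possible action

        def update(self, features, action, error, learning_rate=0.1):
            # Nudge only the weights for the action actually taken, in
            # proportion to how wrong its current score was.
            self.weights[action] += learning_rate * error * features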

MENACE, and many other game playing systems, including chess and Go this time, are a special case of reinforcement learning in another way. The learning system can see the state of the world exactly. In many robotics problems where reinforcement learning is used that is not the case. There the robot may have sensors which can not distinguish all the nuances in the world (e.g., for a flying robot it may not know the exact current wind speed and direction ten meters away from it in the direction of travel). For these sorts of reinforcement learning problems the world is referred to as partially observable.

In MENACE any rewards, be they positive or negative, were spread equally over all the moves leading up to the win, loss, or draw. But in reality it could be that an early move was good, and just a dumb move at the end was bad. To handle this problem Christopher Watkins came up with a method that became known as Q-learning for his Ph.D. thesis12, titled “Learning from Delayed Rewards”, at Cambridge University in 1989. The Q function that he learns is an estimate of what the ultimate reward will be of taking a particular action in a particular state. Three years later he and Peter Dayan published a paper that proved that, under some reasonable assumptions, his algorithm always eventually converges on the correct answer as to how the reward should be distributed.
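
The heart of Watkins’s method fits in a few lines. Here is a minimal tabular sketch of the one-step Q-learning update, where alpha is a learning rate and gamma discounts rewards that arrive further in the future:

    from collections import defaultdict

    def q_update(Q, state, action, reward, next_state, next_actions,
                 alpha=0.1, gamma=0.9):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max over a' of Q(s',a')
        #                             - Q(s,a))
        best_next = max((Q[(next_state, a)] for a in next_actions),
                        default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])

    # Q = defaultdict(float)  # unseen (state, action) pairs start at zero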

This method, which is at its heart the reinforcement learning of Donald Michie’s MENACE from 1961, is what is powering some of today’s headlines. The London company DeepMind, which was bought by Google, uses reinforcement learning (as they explain here) with the Q-learning implemented in something called deep learning (another popular headline topic). This is how they built their Alpha Go program which recently beat both the human Korean and Chinese Go champions.

As a side note, when I visited DeepMind in June this year I asked how well their program would have done if on the day of the tournament the board size had been changed from 19 by 19 to 29 by 29. I estimated that the human champions would have been able to adapt and still play well. My DeepMind hosts laughed and said that even changing to an 18 by 18 board would have completely wiped out their program…this is rather consistent with what we have observed about MENACE. Alpha Go plays Go in a way that is very different from how humans apparently play Go.

Overloaded words

In English, at least, ships do not swim. Ships cruise or sail, whereas fish and humans swim. However in English planes fly, as do birds. By extension people often fly when they go on vacation or on a business trip. Birds move from one place to another by traveling through the air. These days, so too can people.

But really people do not fly at all like birds fly. Our technology lets us “fly” a quarter of the way around the world, non-stop, in less than a day. Birds who can fly that far non-stop (and there are some) certainly take a lot longer than a day to do that.

If humans could fly like birds we would think nothing of chatting to a friend on the street on a sunny day, and as they walk away, flying up into a nearby tree, landing on a branch, and being completely out of the sun. If I could fly like a bird then when on my morning run I would not have to wait for a bridge to get across the Charles River to get back home, but could choose to just fly across it at any point in its meander.

We do fly. We do not fly like birds. Human flying is very different in scope, in method, and in auxiliary equipment beyond our own bodies.

Arthur Samuel introduced the term Machine Learning for two sorts of things his computer program was doing as it got better and better over time at and through the experience of playing checkers. A person who got better and better over time at and through the experience of playing checkers would certainly be said to be learning to be a better player. With only eight to ten hours experience Samuel’s program (he was so early at this he did not give a name to his program–that innovation had to await the early 1960’s) got better at playing checkers than Samuel himself. Thus, in the first sentence of his paper, Samuel again justifies the term learning: “The studies reported here have been concerned with programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning.”

What I have tried to do in this post is to show how Machine Learning works, and to provide an argument that it works in a way that feels very different to how human learning of similar tasks proceeds. Thus, taking an understanding of what it is like for a human to learn something and applying that knowledge to an AI system that is doing Machine Learning may lead to very incorrect conclusions about the capabilities of that AI system.

Minsky13 labels as suitcase words terms like consciousness, experience, and thinking. These are words that have so many different meanings that people can understand different things by them. I think that learning is also a suitcase word. Even for humans it surely refers to many different sorts of phenomena. Learning to ride a bicycle is a very different experience from learning ancient Latin. And there seems to be very little in common between the experience of learning algebra and learning to play tennis. So, too, is Machine Learning very different from any of the myriad different learning capabilities of a person.

The word “learn” can lead to misleading conclusions.

Postscript

I am going to indulge myself a little by pontificating here. Be warned.

In 1991 I wrote a long (I have been pontificating since I was relatively young) paper14  on the history of Artificial Intelligence and how it had been shaped by certain key ideas. In the final paragraphs of that paper I lamented that there was a bandwagon effect in Artificial Intelligence Research, and said that “[m]any lines of research have become goals of pursuit in their own right, with little recall of the reasons for pursuing those lines”.

I think we are in that same position today in regard to Machine Learning. The papers in conferences fall into two categories. One is mathematical results showing that yet another slight variation of a technique is optimal under some carefully constrained definition of optimality. A second type of paper takes a well known learning algorithm and some new problem area, designs the mapping from the problem to a data representation (e.g., the mapping from tic-tac-toe board positions to the numbers 1 through 304 for the three hundred and four matchboxes that comprise MENACE), and shows the results of how well that problem area can be learned.

This would all be admirable if our Machine Learning ecosystem covered even a tiny portion of the capabilities of human learning. It does not. And, I see no alternate evidence of admirability.

Instead I see a bandwagon today, where vast numbers of new recruits to AI/ML have jumped aboard after recent successes of Machine Learning, and are running with particular versions of it as fast as they can. They have neither any understanding of how their tiny little narrow technical field fits into a bigger picture of intelligent systems, nor do they care. They think that the current little hype niche is all that matters, are blind to its limitations, and are uninterested in deeper questions.

I recommend reading Christopher Watkins’ Ph.D. thesis12 for an example of something that is admirable. It revitalized reinforcement learning by introducing Q-learning, and that is still having impact today, almost thirty years later. But more importantly, most of the thesis is not about the particular algorithm or proofs about how well it works under some newly defined metric. Instead, most of the thesis is an illuminating discussion of animal and human learning, attempting to draw lessons from there about how to design a new learning algorithm. And then he does it.



1 Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.

2 “Some Studies in Machine Learning Using the Game of Checkers”, Arthur L. Samuel, IBM Journal of Research and Development, 3(3):210–229, 1959.

3 When I first joined the Stanford Artificial Intelligence Laboratory (SAIL) in 1977 I got to meet Arthur Samuel. Born in 1901 he was certainly the oldest person in the lab at that time. After retiring from IBM in 1966 he had come to SAIL as a researcher. Arthur was a delightful and generous person, and besides his research he worked on systems programming in assembler language for the Lab’s time shared computer. He was the principal author of the full screen editor (a rarity at that time) that we had, called Edit TV, or ET at the command level. He was still programming at age 85, and last logged in to the computer system when he was 88, a few months before he passed away.

4 Perhaps I am wrong about exactly what Samuel was referring to. In his Ph.D. thesis12, which I talk about later in the post, Christopher Watkins allows that perhaps Samuel means what I interpret him to mean, though perhaps there is a smarter version of it that was implemented that involved recomputing the saved computations when more of the game tree had been searched. Watkins was unable to tell exactly from reading the paper.

5 “Trial and Error”, Donald Michie, Penguin Science Survey, vol 2, 1961.

6 “How to build a game-learning machine and then teach it to play, and to win”, Martin Gardner, Scientific American, 206(3):138–153, March 1962.

7 We Built Our Own Computers, A. B. Bolt, J. C. Harcourt, J. Hunter, C. T. S. Mayes, A. P. Milne, R. H Surcombe, and D. A. Hobbs, Cambridge University Press, 1966.

8 “Experiments on the Mechanization of Game-Learning Part I. Characterization of the Mode and its parameters”, Donald Michie, Computer Journal, 6(3):232–236, 1963.

9 Michie reports only 287 essentially different situations so his version of MENACE had only 287 matchboxes (though in a 1986 paper he refers to there being 288 matchboxes). Many people have since built copies of MENACE both physically and in computer simulations, and all the ones that I have found on the web report 304 matchboxes, virtual or otherwise. This matches how I counted them in my simulation of MENACE as a program.

10 In all the test results I give I froze the learning and ran 100,000 games–I found that about that number were necessary to give 2 digits, i.e., a percentage, that was stable for different such trials. Note that in total there are 301,248 different legal ways to play out a game of tic-tac-toe. If we consider only essentially different situations by eliminating rotational and reflective symmetries then that number drops to 31,698.

11 Perceptrons: An Introduction to Computational Geometry, Marvin Minsky and Seymour Papert, MIT Press, 1969.

12 Learning from Delayed Rewards, Christopher J. C. H. Watkins, Ph.D. thesis, King’s College, Cambridge University, May 1989.

13 The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind, Marvin Minsky, Simon and Schuster, 2006.

14 “Intelligence Without Reason”, Rodney A. Brooks, Proceedings of the 12th International Joint Conference on Artificial Intelligence, Sydney, Australia, August 1991, 569–595.

[FoR&AI] Domo Arigato Mr. Roboto

rodneybrooks.com/forai-domo-arigato-mr-roboto/

[An essay in my series on the Future of Robotics and Artificial Intelligence.]

Friday March 11th, 2011, was a bad day for Japan. At 2:46pm local time a magnitude 9.1 earthquake occurred 72 kilometers offshore, east of the Oshika Peninsula which is in the Tohoku region of Japan. A great tsunami was triggered with maximum wave height believed to be 42.5 meters (133 feet), and a few minutes after the earthquake it hit the town of Miyako, 432 kilometers (300 miles) north of Tokyo. Hundreds of kilometers of the coastal region were devastated, with almost 16,000 deaths, over 2,500 people missing, and three quarters of a million buildings either collapsed, partially collapsed, or severely damaged.

The following week things got worse. Japan has been forever changed by what happened in March and April of that year.

A little before 8am on Friday April 25th, 2014, I met up with a small number of robotics researchers from the United States in the Ueno train station in Tokyo. It was a somber rendezvous, but I did not yet realize the sobering emotions I would feel later in the day.

As a technologist I have had more than my fair share of what I think of as “science fiction” days, most of them quite uplifting and exciting. Science fiction days for me are days where I get to experience for real something that heretofore most people have only ever experienced by watching a movie. For instance on July 4, 1997, I was at JPL (the Jet Propulsion Laboratory in Pasadena, California) watching live images come in from the surface of Mars soon after the soft landing of the Pathfinder mission. A little later in the afternoon, to hearty cheers, the Sojourner robot rover deployed onto the surface of Mars, the first mobile ambassador from Earth. Dan Goldin, the administrator of NASA, congratulated all the JPL technologists on the first “faster, cheaper, better” mission. That phrase was a cleaned up version of a title of a paper⁠1 I had written in 1989 with Anita Flynn: “Fast, Cheap, and Out of Control: A Robot Invasion of the Solar System”, where we had proposed the idea of small rovers to explore planets, and explicitly Mars, rather than large ones that were under development at that time. The rover that landed in 1997 was descended from a project at JPL that Colin Angle, then a brand new graduate of M.I.T., and I had helped get started that same year, 1989. The day of the landing was a great science fiction day, and it was related to the one I was about to experience almost seventeen years later.

Really though, April 25th, 2014 was for me two science fiction days rolled into one. Both of them were dystopian.

The group that formed up in Ueno station was led by Gill Pratt. Gill had been a faculty member in the M.I.T. Artificial Intelligence Laboratory when I had been its director in the late 1990s. He had led the “leg laboratory”, within the AI Lab, working on making robots that could walk and run. Now he was a program manager at DARPA, the Defense Advanced Research Projects Agency, part of the US Defense Department, leading the DARPA Robot Challenge, a competition whose final was to be held the next year to push forward how robots could help in disaster situations. We robotics researchers were in Japan that week to take part in a joint US/Japan robotics workshop that was held as a satellite event for a summit in Tokyo between Prime Minister Abe and President Obama.

On that Friday morning we took an express train to Iwaki, and from there a fifty minute minibus ride to the “J-village”.  Now things started to get a little surreal. J.League is the Japan Professional Football League, and the J-village was, until the earthquake and tsunami, the central training facility for that league, with multiple soccer pitches, living quarters, a gym, swimming pool, and large administrative buildings. Now three of the pitches were covered in cars, commuter lots for clean up crews. Trucks and minibuses coming from the north were getting scanned for radiation, while all northbound traffic had to go through security gates from the soccer facility. The J-village was now the headquarters of the operation to deal with the radiation released from the Fukushima Daiichi nuclear power plant, when the tsunami had hit it, ultimately leading to three of its six reactors melting down. The J-village was right on the border of a 20 kilometer radius exclusion zone established around that plant, and was being operated by TEPCO, the Tokyo Electric Power Company which owned Fukushima Daiichi, along with Fukushima Daini, also in the exclusion zone, whose four reactors were able to be shut down safely without significant damage.

Inside the main building the walls signaled professional soccer, decorated with three meter high images of Japanese stars of the game. But everything else looked makeshift and temporary. We were met by executives from TEPCO and received our first apology from them for their failures at Daiichi right after the tsunami. We would receive more apologies during the day. This was clearly a ritual for all visitors, as none of us felt we were owed any sort of apology. As had happened the day before in a meeting with a government minister, and again rather embarrassingly, I was singled out for special thanks.

After Colin Angle and I had helped get the small rover program at JPL going, where it was led by David Miller and Rajiv Desai, we got impatient about getting robots to other places in the solar system. So, joined by Helen Greiner, a friend of Colin’s and for whom I had been graduate counsellor at M.I.T., we started a space exploration robot company originally called IS Robotics. In a 2002 book⁠2 I told the story of our early adventures with that company, and how our micro-rovers being tested at Edwards Air Force Base as a potential passenger on a Ballistic Missile Defense Organization (BMDO, popularly known as “Star Wars”) mission to the Moon forced NASA’s hand into adding the Sojourner rover to the Pathfinder mission. By 2001 our company had been renamed iRobot, and on the morning of September 11 of that year we got a call to send robots to ground zero in New York City. Those robots scoured nearby evacuated buildings for any injured survivors that might still be trapped inside. That led the way for our Packbot robots to be deployed in the thousands in Afghanistan and Iraq, searching for nuclear materials in radioactive environments, and dealing with roadside bombs by the tens of thousands. By 2011 we had almost ten years of operational experience with thousands of robots in harsh war time conditions.

A week after the tsunami, on March 18th 2011, when I was still on the board of iRobot, we got word that perhaps our robots could be helpful at Fukushima. We rushed six robots to Japan, donating them, and not worrying about ever getting reimbursed–we knew the robots were on a one way trip. Once they were sent into the reactor buildings they would be too contaminated to ever come back to us.  We sent people from iRobot to train TEPCO staff on how to use the robots, and they were soon deployed even before the reactors had all been shut down.

The oldest of the reactors had been operating for 40 years, and the others shared the same design. None of them had digital monitoring installed, so as they overheated, and explosions occurred, and they released high levels of radiation, there was no way to know what was going on inside the reactor buildings. The four smaller robots that iRobot sent, the Packbot 510, weighing 18kg (40 pounds) each with a long arm, were able to open access doors, enter, and send back images. Sometimes they needed to work in pairs so that the one furthest away from the human operators could send back signals via an intermediate robot acting as a wifi relay. The robots were able to send images of analog dials so that the operators could read pressures in certain systems, they were able to send images of pipes to show which ones were still intact, and they were able to send back radiation levels. Satoshi Tadokoro, who sent in some of his robots later in the year to climb over steep rubble piles and up steep stairs that Packbot could not negotiate, said⁠3 “[I]f they did not have Packbot, the cool shutdown of the plant would have [been] delayed considerably”. The two bigger brothers, both 710 models, weighing 157kg (346 pounds) with a lifting capacity of 100kg (220 pounds), were used to operate an industrial vacuum cleaner, move debris, and cut through fences so that other specialized robots could access particular work sites.

Japan has been consistently grateful for that help; we were glad that our technology could be helpful in such a dire situation.

In 2014, at the J-village, after a briefing on what was to come for us visitors, we were issued with dosimeters and we put on disposable outer garments to catch any radioactive particles. We then entered a highly instrumented minibus, sealed from unfiltered external air circulation, and headed north on the Rikuzenhama Highway. The first few villages we saw were deserted but looked well kept up. That was because by that time the owners of the houses were allowed to come into the exclusion zone for a few hours each day to tend to their properties. Further in to the zone everything started to look abandoned. After we passed the Fukushima Daini plant, which we could see in the distance, we got off the highway and headed down into the town of Tomioka. The train station, quite close to the coast, had been washed away, with just the platform remaining, and a single toilet sitting by itself still attached to the plumbing below. Most of the houses had damage to their first floors, and from our minibus driving by we could see people’s belongings still inside. At one point we had to go around a car upside down on its roof in the middle of the road. Although it was three years after the event, Tomioka was frozen in time, just as it had been left by the tsunami. This was the first science fiction experience of the day. For all the world it looked like the set of a post-apocalyptic Hollywood movie. But this was a real post-apocalyptic location.

Back on the highway we continued north to the Fukushima Daiichi plant for science fiction experience number two. There are about six thousand people who work at the site cleaning up the damage to the power plant from the tsunami. Only a much smaller number are there on any given day, as continued exposure to the radiation levels is not allowed. We entered a number of buildings higher up the hill than the four reactors that were directly hit by the tsunami. All of them had makeshift piping for air, a look of temporary emergency setups, and all inside were wearing light protective garments, as were we. Those who were outside had much more substantial protective clothing, including filtered breathing masks. Most times as we transitioned into and out of buildings we had to go through elaborate security gates where we entered machines that scanned us for radiation in case we had gotten some radioactive debris attached to us. Eventually we got to a control center, really just a few tables with laptops on them, from which the iRobot robots were still being operated. We watched remotely as one was inside one of the reactor buildings measuring radiation levels–in some cases levels so high that a person could only spend just a few minutes per year in such an area.

Outside we drove around in our sealed bus. We saw where the undamaged fuel rods that had been inside the reactor buildings, but not inside the reactors, were being brought for temporary storage. That task was expected to be completed by the end of this decade. We saw the (at that time) almost 1,000 storage tanks, each with about 1,000 tons of contaminated ground water that came down the hill during rainfall and was contaminated as it seeped through the ground around the reactor buildings. We saw where they were trying to freeze the ground down to many meters in depth to stop water flowing underground from the hill to the reactor buildings. We saw where, along the ocean side of the reactor buildings, workers had installed a steel wall of interlocking pylons driven into the seabed, holding back the ocean but, more importantly, stopping any ground water from leaking into the ocean. Everywhere were people in white protective suits with breathing equipment, working for short periods of time and then being cycled out so that their radiation exposure levels were not unsafe. Eventually we drove down to right near reactor number four, and saw the multi-hundred ton superstructure that had been installed over the building by remotely operated cranes so that the undamaged fuel rods could be lifted out of the damaged water pools where they were normally stored. We wanted to stay a little longer but the radiation level was creeping up, so soon it was decided that we should get out of there. And finally we received a briefing about the research plans for developing new robots that, starting around the year 2020, would be able to begin the decades long clean up of the three melted down reactors.

That really was a science fiction experience.

Robots

Robots were essential to the shutdown of Fukushima Daiichi, and will be for the next thirty or more years as the cleanup continues. The robots that iRobot sent were controlled by operators who looked at images sent back to decide where they should go, whether they should try to climb a pile of debris or not, and give the robots detailed instructions on how to handle unique door handles. In the sequence of three images below a pair of Packbot 510’s first open a door using a large rotary handle, push it open, and then proceed through.

[These are photographs of the operators’ console and in some cases you might just be able to make out the reflection of the operators in protective suits wearing breathing equipment.] Below we see a 510 model confronted by relatively light debris that it will be able to get over fairly safely.

In the image below a 710 model is set up to go and vacuum up radioactive material.

But the robots we sent to Fukushima were not just remote control machines. They had an Artificial Intelligence (AI) based operating system, known as Aware 2.0, that allowed the robots to build maps, plan optimal paths, right themselves should they tumble down a slope, and to retrace their path when they lost contact with their human operators. This does not sound much like sexy advanced AI, and indeed it is not so advanced compared to what clever videos from corporate research labs appear to show, or painstakingly crafted edge-of-just-possible demonstrations from academic research labs are able to do when things all work as planned. But simple and un-sexy is the nature of the sort of AI we can currently put on robots in real, messy, operational environments.
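
To give a flavor of just how simple and un-sexy such a capability can be at its core, here is a sketch of the most naive possible version of one of them, retracing a path after losing contact. The breadcrumb list and the drive_to routine are hypothetical; what Aware 2.0 actually did was certainly more sophisticated.

    def retrace(breadcrumbs, drive_to):
        # breadcrumbs is the list of (x, y) poses logged while driving out;
        # drive_to is the robot's own low-level go-to-pose routine.
        # Return home along the logged path, newest pose first.
        for pose in reversed(breadcrumbs):
            drive_to(pose)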

But wait! What about all those wonderful robots we have seen over the years in the press, the ones that look like Albert Einstein, or the humanoids that have been brought into science museums around the United States for shows, or the ones we see brought out whenever a US President visits Japan4? You have seen them. Like the humanoid ones that walk on two legs, though with bent knees, which does look a little weird, turning to the audience and talking from behind a dark glass visor, sometimes seeming to interact with people, taking things from them, handing things to them, chatting, etc. What about them? They are all fake! Fake in the sense that though they are presented as autonomous they are not. They are operated by a team of usually six people, off stage. And everything on stage has been placed with precision, down to the millimeter. I have appeared on stage before those robots many times and been warned not to walk near or touch any of the props, for example staircases, as that will make the robot fail, and when it does fail it is not aware that it has.

Corporate marketers had oversold a lot of robots, and confused many people about current robots’ true capabilities. Corporate marketing robots had no chance at all of helping in Fukushima.

Those robots are not real5.

Reality is hard.

Reality

Robotics, including self driving cars, is where Artificial Intelligence (AI) collides with the un-sanitized natural world. Up until now the natural world has been winning, and will probably continue to do so most of the time for quite some time.

We have come to expect our technology to be 100% reliable. We expect our car to start every morning and for the wheels to drive it forward when we push down on the gas pedal. We expect the plane that we board to both take off and land safely, even if, through experience, we tolerate it being late. We expect the internet to provide the web pages we go to on our smart phones. We expect our refrigerators and microwave ovens to work every day so that we can eat and survive.

AI has gotten a pass on providing 100% reliability as so many of its practical applications are mediated by a functioning cognitive human who naturally fills in the missing pieces for the AI system. Us humans do this all the time for small children and for the very elderly. We are wired to be accommodating to other intelligences that we think of as less than us.  Most of our AI technology is very much less than us, so we accommodate.

The demands of having robots interact with the un-sanitized natural world cancel that free pass. The natural world usually does not care that it is a robot rather than a person, and so the natural world is not accommodating. In my opinion there is a mismatch between what is popularly believed about AI and robotics, and what the reality is for the next few decades.

I have spent the last forty years as part of the Artificial Intelligence (and more general computer science) research groups at either Stanford or M.I.T. as a student, post-doc, faculty (both places), or more recently, emeritus professor. Through companies that I have co-founded, iRobot and Rethink Robotics (and yes, I was also once a co-founder and consultant for eight years to a silicon valley AI software company–it eventually failed, and then there was the robotics VC firm I co-founded, and then, etc., etc.), I have been involved in putting a lot of robots to work in five different domains–less than a handful of robots on other planets, tens of millions of robots vacuuming people’s floors, thousands of robots in the military for forward reconnaissance and for handling improvised explosive devices, robots in thousands of factories around the world working side by side with people, and many hundreds of robots in research labs all over the world, used for experiments in manipulation. I think it is fair to say that companies that I have cofounded have put more AI into more robots, and in more domains of application, than anyone else, ever.

All of these robots have had some level of AI, but none come remotely close to what the popular press seems to believe about robots and what many prognosticators warn about, sometimes as imminent dangers from AI and robots.

There seems to me to be a disconnect here.

Right now I believe we are in an AI bubble. And I believe that should it not burst, it will certainly deflate before too long. The existence of this bubble makes it hard for all sorts of people to know what to believe about the near term for AI and robotics. For some people the questions surrounding AI and robotics are simply an intellectual curiosity. For some it is a very real question about what their job prospects might look like in just a few years. For executives at companies and for those in leadership positions in governments and the military, it is a fraught time understanding the true promise and dangers of AI. Most of what we read in the headlines, and from misguided well meaning academics, including from physicists and cosmologists, about AI is, I believe, completely off the mark.

In debates about how quickly we will realize in practice the AI and robotics of Hollywood I like to see myself as the voice of reason. I fear that I am often seen as the old fuddy-duddy who does not quite get how powerful AI is, and how quickly we are marching towards super intelligence, whatever that may be. I am even skeptical about how soon we will see self driving6 cars on our roads, and have been repeatedly told that I “just don’t understand”.

Concomitantly I am critical of what I see as scare mongering about how powerful AI will soon be, especially when it is claimed that we as humans must start taking precautions against it now. Given the sorts of things I saw in and around Fukushima Daiichi, I understand a general fear of technology that is not easily understood by non-experts, but I do want us humans to be realistic about what is scary, and what is only imagined to be scary.

I am even more critical of some of the arguments about ethical decisions that robots will face in just the next few years. I do believe that there are plenty of ethical decisions facing us humans as we deploy robots, in terms of the algorithms they should be running, but we are far, far from having robots make ethical decisions on the fly. There will be neither benevolent AI nor malevolent AI in the next few decades, in the sense of AI systems that have any internal understanding of those terms. This is a research dream, and I do not criticize people thinking about it as research. I do criticize them thinking that this research is going to turn into reality any time soon, and talking about “regulations” on what sort of AI we can build or even research as a remedy for imagined pitfalls. Instead we will need to worry about people using technology, including AI, for malevolent purposes, and we should encourage the use of technology, including AI, for benevolent purposes. AI is nowhere near ready enough to make any fundamental difference in this regard.

Why am I seen as an outlier in my opinions of where AI is going? In my dealings with individual researchers in academia and industrial research laboratories, and in my discussions with C-level executives at some of the best known companies that use AI, I find much common ground, and general agreement with my positions. I am encouraged by that. I want to share with readers of this blog the basis for my estimations of where we are in deploying AI and robotics, and why it is still hard; I will expand on these arguments in the next few essays that I post after this one.

Perhaps after reading my forthcoming essays you will conclude that I am an old fuddy-duddy. Or, perhaps, I am a realist. I’ll be posting a lot of long form essays about these topics over the next few months. People who read them will get to decide for themselves where I fall in the firmament of AI.

Now, at the same time, we have only been working on Artificial Intelligence and robotics for just a few decades. Already AI has started to have real impact on our lives. There will be much more that comes from AI and robotics. For those who are able to see through what is hype and what is real there are going to be great opportunities.

There are great opportunities for researchers who concentrate on the critical problems that remain, and are able to prioritize what research will have the greatest impact.

For those who want to start companies in AI and robotics, understanding what is practical and matches what the market will eagerly accept, there is again great opportunity. We should expect to see many large and successful companies rise up in this space over the next couple of decades.

For those who are willing to dare greatly, and who want to make scientific contributions for the ages, there is so much at such a deep level that we do not yet understand that there is plenty of room for a few more Ada Lovelaces, Alan Turings, Albert Einsteins, and Marie Curies to make their marks.

Hubris and humility

For four years starting in 1988 I co-taught the M.I.T. introductory Artificial Intelligence (AI) class, numbered then and still now 6.034, with Professor Patrick Henry Winston. He still teaches that class which has only gotten better, and it is available online⁠7 for all to experience.

Back then Patrick used to start the first class with a telling anecdote. Growing up in Peoria, Illinois, Patrick had at one time had a pet raccoon. Like others of its species Patrick’s raccoon was very dexterous, and so hard to keep in a cage as it was usually clever enough to find a way to open the door unless it was really locked tight. Patrick would regale the class with how intelligent his raccoon had been. Then he would dead pan “but I never expected it to be smart enough to build a copy of itself”.

This was caution for humility. Then, as now, there was incredible promise, and I would say hype around Artificial Intelligence, and so this was Patrick’s cautionary note. We might think we were just around the corner from building machines that were just as smart, by whatever measure, as people, but perhaps we were really no more than dexterous raccoons with computers.

At that time AI was not a term that ever appeared in the popular press, IBM went out of its way to say that computers could not think, only people could think, and AI was not thought of as having the stature to be part of many computer science departments. Patrick’s remarks led me to wonder out loud whether we were overwhelming ourselves with our own hubris. I liked to extend Patrick’s thinking and wondered about super-intelligent aliens (biological or otherwise) observing us from high orbit or further afield. I imagined them looking down at us, like we might look at zoo animals, and being amused by our cleverness, but being clear about our limitations. “Look at those ones in their little AI Lab at M.I.T.! They think they are going to be able to build things as smart as themselves, but they have no idea of the complexities involved and their little brains, not even with help from their computers (oh, they have so got those wrong!), are just never going to get there. Should we tell them, or would that be unkind to dash their little hopes? Those humans are never going to develop nor understand how intelligence works.”

Still, today, Patrick Winston’s admonition is a timely caution for us humans. A little humility about the possible limits of our capabilities is in order. Humility about the future of AI and also Machine Learning (ML) is in desperately short supply. Hubris, from some AI researchers, from venture capitalists (VCs), and from some captains of technology industries is dripping thick and fast. Often the press manages to amplify the hubris, as that is what makes a good attention grabbing story.



1 “Fast, Cheap and Out of Control: A Robot Invasion of the Solar System”, Rodney A. Brooks and Anita M. Flynn, Journal of the British Interplanetary Society, 42(10)478–485, October 1989.

2 Flesh and Machines, Rodney A. Brooks, Pantheon, New York, 2002.

3 “The Day After Fukushima”, Danielle DeLatte, Space Safety Magazine, (7)7–9, Spring 2013.

4 During the 2014 Abe/Obama summit, as usual, a Japanese humanoid robot built by a Japanese auto maker was brought out to interact with the US President. The robot kicked a soccer ball towards President Obama and he kicked it back, to great applause. Then President Obama turned to his hosts and asked, not so innocently, so is that robot autonomous or tele-operated from behind the scenes? That is an educated and intellectually curious President.

5 Partially in response to the Fukushima disaster the US Defense Advanced Research Projects Agency (DARPA) set up a challenge competition for robots to operate in disaster areas. Japanese teams were entered in this competition, the first time there had been significant interaction between Japanese roboticists and DARPA–they were a very strong and welcome addition. The competition ran from late 2011 to June 5th and 6th of 2015, when the final competition was held. The robots were semi-autonomous, with communications from human operators over a deliberately unreliable and degraded communications link. This short video focuses on the second place team but also shows some of the other teams, and gives a good overview of the state of the art in 2015. For a selection of greatest failures at the competition see this link. I was there and watched all this unfold in real time–it was something akin to watching paint dry, as there were regularly 10 to 20 minute intervals when absolutely nothing happened and a robot just stood there frozen. This is the reality of what our robots can currently do in unstructured environments, even with a team of researchers communicating with them when they can.

6 I have written two previous blog posts on self driving cars. In the first I talked about the ways such cars will need to interact with pedestrians in urban areas, and many of the problems that will need to be solved.  In the second I talked about all the uncommon cases that we as drivers in urban situations need to face, such as blocked roads, temporarily banned parking, interacting with police, etc. In a third post I plan to talk about how self driving cars will change the natures of our cities. I do not view self driving cars as doomed by any of these problems, in fact I am sure they will become the default way for cars to operate in the lifetimes of many people who are alive today. I do, however, think that the optimistic forecasts that we have seen from academics, pundits, and companies are wildly off the mark. In fact that reality is starting to set in. Earlier this month the brand new CEO of Ford said that the previous goal of commercial self driving cars by 2021 was not going to happen. No new date was announced.

7 You can find the 24 lectures of 6.034 online. I particularly recommend lectures 12a and 12b, on neural networks and deep neural networks, to all those who want to understand the basics of how deep learning works–the only prerequisite is a little multi-variable differential calculus. Earlier in 2017 I posted a just slightly longer introduction than this paragraph.

[FoR&AI] Future of Robotics and Artificial Intelligence

rodneybrooks.com/forai-future-of-robotics-and-artificial-intelligence/

I plan on publishing  a set of essays on the future of robotics and Artificial Intelligence in the late summer and fall of 2017, perhaps extending in to 2018. I’ll list them all here as they come out. They are designed to be read as stand alone essays, and in any order, but I’ll order them here in my guess at the optimal order in which to read them.

The origins of “Artificial Intelligence” published on April 26, 2018.

Domo Arigato Mr. Roboto published on August 28, 2017.

The Seven Deadly Sins of Predicting the Future of AI published on September 7, 2017.

Machine Learning Explained published on August 28, 2017.

Steps Toward Super Intelligence I, How We Got Here, published July 15, 2018.

Steps Toward Super Intelligence II, Beyond the Turing Test, published July 15, 2018.

Steps Toward Super Intelligence III, Hard Things Today, published July 15, 2018.

Steps Toward Super Intelligence IV, Things to Work on Now, published July 15, 2018.

Experiments In Automobile UI/UX

rodneybrooks.com/experiments-in-automobile-uiux/

Our automobiles are getting three makeovers simultaneously, with promises of a fourth. There is more action in reinventing automobiles all at once than there has been since the first ones crawled into existence.

First, our cars are turning electric, and the UK recently said that no new gas or diesel automobiles will be allowed on the road starting in 2040. Don’t be surprised to see more countries and states (e.g., California) follow suit.

Second, our cars are getting more driver assist features with lane change and backup audible warnings, automatic parking and lane changing, new forms of smart cruise control, new bumper-to-bumper traffic control options, etc. These are level 1 and level 2 autonomy (see my blog post from earlier in the year), with level 3 starting to show up just a little, and overly enthusiastic predictions of  levels 4 and 5 (again, you can see some of my thoughts on how I think that is further off than expected).

And third, we are getting new user interfaces in our cars, and that is the subject of this short post.

I drive a lot of different rental cars during any given year, so even though my own car is now eleven years old I get exposed to a lot of the new interfaces that are appearing in very standard level compact cars.

I’ve been driving the same rental car for two weeks as of today, and there are some things I really like about it. Most cars I have rented over the last two years have had a backup camera that has a live image on the LCD screen in the center of the dashboard. This is great, as it is usually a much better view than can be had by scanning all three rear view mirrors. The better versions of this feature show you an overlay of exactly where the car will go as you backup with the current setting of the steering wheel.  Here is a view from the screen in my car this morning:

This is so much better than a plain old rear view mirror. It shows me exactly what I might hit as I back up, and for me that is especially useful in a parking garage as I am just so great at hitting things while going backward…

But it is not all champagne and roses. After having this car for two weeks, twice this morning, while I was driving along the street, the following window popped up on that same screen, which is a touch screen, by the way:

How can this be a good idea, or a good User Interface feature? It could lead to a very bad User Experience! A window pops up to tell me I shouldn’t take my eyes off the road to deal with the interface. That is a good message. But not while I’m driving! Both times it made me take my eyes off the road to read what this warning was about, and then I needed to reach out and servo my finger to the “OK” virtual button to dismiss it. It was a real temptation to do it while driving. Exactly the thing it is warning against!

Bad UI. Potentially disastrous UX.

This reminded me of an interchange I saw on Facebook right after the newest Tesla came out. Someone complained about the lack of dials and knobs, and so much of the UI being put on the very big LCD that is in the middle of a Tesla dashboard. Another person chided that person, saying essentially, “get over it, we now live in the world of the iPhone, not the Blackberry”.

I thought that latter comment completely missed a real issue. The knobs and levers with their fixed positions and fixed meanings within any particular car, along with the tactile feedback that they give us, allow us to do a lot of control operations without taking our eyes off the road at all. That is a very good thing until the task of driving is completely taken over by the car itself. We need our attention out on the road. Moving control functions that are needed while in motion and while controlling the car to a touch screen is probably not a good idea. Being able to do things without using our eyes is a safety feature while driving (and while walking with an iPhone…).

[By the way, why do our turn indicator blinkers make a clicking sound? Because the original ones that were introduced in the 1950’s operated by running a current through, and heating up, a bi-metallic strip. As it heated up it bent until it hit a contact, hence the click, which then drained the current to the indicator lights on the left or right side of the car, allowing the bi-metallic strip to cool down and repeat. Now cars simulate that same old clicking sound so that we know when the indicators are blinking.]

I suspect that we are all going to be guinea pigs over the next few years with auto-makers bringing out some really great new UI features, along with some real failures.

Be careful!!

 

 

Edge Cases For Self Driving Cars

rodneybrooks.com/edge-cases-for-self-driving-cars/

Perhaps through this essay I will get the bee out of my bonnet⁠1 that fully driverless cars are a lot further off than many techies, much of the press, and even many auto executives seem to think. They will get here and human driving will probably disappear in the lifetimes of many people reading this, but it is not going to all happen in the blink of an eye as many expect. There are lots of details to be worked out.

In my very first post on this blog I talked about the unexpected consequences of having self driving cars. In this post I want to talk about a number of edge cases which I think will mean it is a very long time before we have level 4 or level 5 self driving cars wandering our streets, especially without a human in them, and even then there are going to be lots of problems.

First though, we need to re-familiarize ourselves with the generally accepted levels of autonomy that everyone is excited about for our cars.

Here are the levels from the autonomous car entry in Wikipedia which attributes this particular set to the SAE (Society of Automotive Engineers):

  • Level 0: Automated system has no vehicle control, but may issue warnings.
  • Level 1: Driver must be ready to take control at any time. Automated system may include features such as Adaptive Cruise Control (ACC), Parking Assistance with automated steering, and Lane Keeping Assistance (LKA) Type II in any combination.
  • Level 2: The driver is obliged to detect objects and events and respond if the automated system fails to respond properly. The automated system executes accelerating, braking, and steering. The automated system can deactivate immediately upon takeover by the driver.
  • Level 3: Within known, limited environments (such as freeways), the driver can safely turn their attention away from driving tasks, but must still be prepared to take control when needed.
  • Level 4: The automated system can control the vehicle in all but a few environments such as severe weather. The driver must enable the automated system only when it is safe to do so. When enabled, driver attention is not required.
  • Level 5: Other than setting the destination and starting the system, no human intervention is required. The automatic system can drive to any location where it is legal to drive and make its own decisions.

There are many issues with level 2 and level 3 autonomy, which might make them further off in the future than people are predicting, or perhaps even forever impractical due to limitations on how quickly humans can go from not paying attention to taking control in difficult situations. Indeed as outlined in this Wired story many companies have decided to skip level 3 and concentrate on levels 4 and 5. The iconic Waymo (formerly Google) car has no steering wheel or other conventional automobile controls–it is born to be a level 4 or level 5 car. [This image is from Wikipedia.]

So here I am going to talk only about level 4 and level 5 autonomy, and not really make a distinction between them.  When I refer to an “autonomous car” I’ll be talking about ones with level 4 or level 5 autonomy.

I will make a distinction between cars with conventional controls, capable of being driven by a human in the normal way, and cars like the Waymo one pictured above with no such controls, which I will call unconventional cars. I’ll use those two adjectives, conventional and unconventional, for cars, and then distinguish what is necessary to make them practical in some edge case circumstances.

I will also refer to gasoline powered driverless cars versus all electric driverless cars, i.e., gasoline vs. electric.

Ride-sharing companies like Uber are putting a lot of resources into autonomous cars. This makes sense given their business model as they want to eliminate the need for drivers altogether, thus saving their major remaining labor cost. They envision empty cars being summoned by a customer, driving to wherever that customer wants to be picked up, with absolutely no one in the car. Without that, having the autonomy technology doesn’t make sense to this growing segment of the transportation industry. I’ll refer to such an automobile, with no one in it, as a Carempty. In contrast, I’ll refer to an autonomous car which has a conscious person in it, whether it is an unconventional car that they can’t actually drive in the normal way, or a conventional car in which they are not at all involved in the driving, perhaps sitting in the back seat, as a Careless, as presumably that person can care less about the driving, other than indicating where they want to go.

So we have both an unconventional and a conventional Carempty and Careless, and perhaps they are gasoline or electric.

Many of the edge cases I will talk about here are based on the neighborhood in which I live, Cambridgeport in Cambridge, Massachusetts. It is a neighborhood of narrow one way streets, packed with parked cars on both sides of the road, so that it is impossible to pass a car or truck stopped in the road. A few larger streets are two way, and some of them have two lanes, one in each direction, but at least one nearby two way street only has one lane–one car needs to pull over, somehow, if two cars are traveling in opposite directions (the southern end of Hamilton Street in the block where “The Good News Garage” of the well known NPR radio brothers “Click and Clack” is located).

HOW MUCH DRIVING CAN A NON-DRIVER DO?

In a conventional Careless a licensed human can take over the driving when necessary, unless say it is a ride sharing car, in which case humans might be locked out of using the controls directly. In an unconventional Careless, like one of the Waymo cars pictured above, the human cannot take over directly either. So a passenger in a conventional ride-sharing car, or in an unconventional car, is in the same boat. But how much driving can that human do?

In both cases the human passenger needs to be able to specify the destination. For a ride-sharing service that may have been done on a smart phone app when calling for the service. But once in the car the person may want to change their mind, or demand that the car take a particular route–I certainly often do that with less experienced drivers who are clearly going a horrible way, often at the suggestion of their automated route planners. Should all this interaction be via an app? I am guessing, given the rapid improvements in voice systems, such as we see in the Amazon Echo, or the Google Home, we will all expect to be able to converse by voice with any autonomous car that we find ourselves in.

We’ll ignore for the moment a whole bunch of teenagers each yelling instructions and pranking the car. Let’s just think about a lone sensible mature person in the car trying to get somewhere.

Will giving the destination and some optional route advice be all they can do, or will they be able to give more detailed instructions when the car is clearly screwing up, or missing some perceptual clue that the occupant can clearly recognize? The next few sections give lots of examples from my neighborhood that are going to be quite challenging for autonomous cars for many years to come, and where such advice will come in handy.

In some cases the human might be called upon to, or just wish to, give quite detailed advice to the car. What if they don’t have a driver’s license? Will they be guilty of illegally driving a car in that case? How much advice should they be allowed to give (spoiler alert, the car might need a lot in some circumstances)? And when should the car take the advice of the human? Does it need to know whether the person in the car talking to it has a driver’s license?

Read on.

WHAT TO DO ABOUT A BLOCKED ROAD

In my local one-way streets the only thing to do if a car or other vehicle is stopped in the travel lane is to wait for it to move on. There is no way to get past it while it stays where it is.

The question is whether to toot the horn or not at a stopped vehicle.

Why would it be stopped? It could be a Lyft or an Uber waiting for a person to come out of their house or condominium. A little soft toot will often get cooperation and they will try to find a place a bit further up the street to pull over.  A loud toot, however, might cause some ire and they will just sit there. And if it is a regular taxi service then no amount of gentleness or harshness will do any good at all. “Screw you” is the default position.

Sometimes a car is stopped because the driver is busy texting, most usually when they are at an intersection, had to wait for someone to cross in front of them, their attention wandered, they started reading a text, and now they are texting and have forgotten that they are in charge of an automobile. From behind one can often tell what they are up to by noticing their head inclination, even from inside the car behind. A very gentle toot will usually get them to move; they will be slightly embarrassed at their own illegal (in Massachusetts) behavior.

And sometimes it is a car stopped outside an eldercare residence building with someone helping a very frail person into or out of the car.  Any sort of toot from a stopped car behind is really quite rude in these circumstances, distressing for the elderly person being helped, and rightfully ire raising for the person taking care of that older person.

Another common case of road blockage is a garbage truck stopped to pick up garbage. There are actually two varieties, one for trash, and one for recyclables. It is best to stop back a bit further from these trucks than from other things blocking the road, as people will be running around to the back of the truck and hoisting heavy bins into it. And there is no way to get these trucks to move faster than they already are. Unlike other trucks, they will continue to stop every few yards. So the best strategy is to follow, stop and go, until the first side street and take that, even if, as it is most likely a one-way street, it sends you off in a really inconvenient direction.

Yet a third case is a delivery truck. It might be a US Postal Service truck, or a UPS or Fedex truck, or sometimes even an Amazon branded truck. Again tooting these trucks makes absolutely no difference–often the driver is getting a signature at a house, or may be in the lobby of a large condominium complex. It is easy for a human driver to figure out that it is one of these sorts of trucks. And then the human knows that it is not so likely to stop again really soon, so staying behind this truck once it moves rather than taking the first side street is probably the right decision.

If on the other hand it is a truck from a plumbing service, say, it is worth blasting it with your horn. These guys can be shamed into moving on and finding some sort of legal parking space. If you just sit there however it could be many minutes before they will move.

A Careless automobile could ask its human occupant whether it should toot. But should it make a value judgement if the human is spontaneously demanding that it toot its horn loudly?

A Carempty automobile could just never toot, though the driver in a car behind it might start tooting it, loudly. Not tooting is going to slow down Carempties quite a bit, and texting drivers just might not care at all if they realize it is a Carempty that is showing even a little impatience. And should an autonomous car be listening for toots from a car behind it, and change its behavior based on what it hears? We expect humans to do so. But are the near future autonomous cars going to be so perfect already that they should take no external advice?

Now if Carempties get toot happy, at least in my neighborhood, residents will have tooting cars outside their houses at a much higher rate than at the moment, which will annoy them, and the Carempties might start to annoy the human drivers in the neighborhood too.

The point here is that there are a whole lot of perceptual situations that an autonomous vehicle will need to recognize if it is to be anything more than a clumsy moronic driver (an evaluation us locals often make of each other in my neighborhood…). As a class, autonomous vehicles will not want to get such a reputation, as the humans will soon discriminate against them in ways subtle and not so subtle.

Maps Don’t Tell the Whole Story

Recently I pulled out of my garage and turned right onto the one way street that runs past my condominium building, and headed to the end of my single block street, expecting to turn right at a “T” junction onto another one way street. But when I got there, just to the right of the intersection the street was blocked by street construction, cordoned off, and with a small orange sign a foot or so off the ground saying “No Entry”.

The only truly legal choice for me to make was to stop. To go back the way I had come I would have needed to travel the wrong way on my street, facing either backwards or forwards, and either stop at my garage or continue all the way to the street at the start of my street. Or I could turn left and go the wrong way on the street I had wanted to turn right onto, and after a block turn off onto a side street going in a legal direction.

A Careless might inform its human occupant of the quandary and ask for advice on what to do. That person might be able to handle any of the social interactions needed should the Careless meet another car coming in the legal direction under either of these options.

But a Carempty will need some extra smarts for this case.  Either hordes of empty cars will eventually pile up at this intersection, or each one will need to decide to break the law and go the wrong way down one of the two one way streets–that is what I had to do that morning.

The maps that a Carempty has won’t help it a whole lot in this case, beyond letting it know the minimum distance it is going to have to be in a transgressive state.

Hmmm.  Is it OK for a Carempty to break the law when it decides it has to? Is it OK for a Careless to break the law when its human occupant tells it to? In the situation I found myself in above, I would certainly have expected my Careless to obey me and go the wrong way down a one way street. But perhaps the Careless shouldn’t do that if it knows that it is transporting a dementia patient.

The Police

How are the police supposed to interact with a Carempty?

While we have both driverful and driverless cars on our roads I think the police are going to assume that, as with driverful cars, they can interact with driverless ones by waving them through an intersection, perhaps through a red light, stopping them with a hand signal at a green light, or halting them just to allow someone to cross the road.

But besides being able to understand what an external human hand signaling them is trying to convey, autonomous cars probably should try to certify, in some sense, whether the person giving them those signals is doing so with authority, with politeness, or with malice. Certainly police should be obeyed, and police should expect that they will be. So the car needs to recognize when someone is a police officer, no matter what additional weather gear they might be wearing. Likewise they should recognize and obey school crossing monitors. And road construction workers. And pedestrians giving them a break and letting them pass ahead of them. But should they obey all humans at all times? And what if, in a Careless situation, their human occupant tells them to ignore the taunting teenager?

Sometimes a police officer might direct a car to do something otherwise considered illegal, like drive up on to a sidewalk to get around some road obstacle. In that case a Carempty probably should do it. But if it is just the delivery driver whose truck is blocking the road wanting to get the Carempty to stop tooting at them, then probably the car should not obey, as then it could be in trouble with the actual police. That is a lot of situational awareness for a car to have to have.

Things get more complicated when it is the police and the car is doing something wrong, or there is an extraordinary circumstance which the car has no way of understanding.

In the previous section we just established that autonomous cars will sometimes need to break the law. So police might need to interact with law breaking autonomous cars.

One view of the possible conundrum is this cartoon from the New Yorker. There are two instantly recognizable Waymo style self driving cars, with no steering wheels or other controls, one a police car that has just pulled over the other car. Both have people in them, and the cop is asking the guy in the car that has just been pulled over, “Does your car have any idea why my car pulled it over?”.

If an autonomous car fails to see a temporary local speed sign and gets caught in a speed trap, how is it to be pulled over? Does it need to understand flashing blue lights and a siren, and does it pull to the side in the way that we have all done, only to be relieved when we realize that we were not the actual target?

And getting back to when I had to decide to go the wrong way down a one way street, what if a whole bunch of Carempties have accumulated at that intersection and a police officer is dispatched to clear them out? For driverful cars a police officer might give a series of instructions and point out in just a few seconds who goes first, who goes second, third, etc. That is a subtle elongated set of gestures that I am pretty sure no deep learning network currently has any hope of interpreting, or of fully understanding the range of possibilities that a police officer might choose to use.

Or will it be the case that the police need to learn a whole new gesture language to deal with driverless cars? And will all makes of car understand the same language?

Or will we first need to develop a communication system that all police officers will have access to and which all autonomous cars will understand so that police can interact with autonomous cars? Who will pay for the training? How long will that take, and what sort of legislation (in how many jurisdictions) will be required?

Getting Towed

A lot of cars get towed in Cambridge. Most streets get cleaned on a regular schedule (different sides of the same street on different days), and if your car is parked there at 7am you will get towed–see the sign in the left image. And during snow emergencies, or without the right sticker/permit you might get towed at any time. And then there are pop-up no parking signs, partially hand written, that are issued by the city on request for places for moving vans, etc. Will our autonomous cars be able to read these? Will they be fooled by fake signs that residents put up to keep pesky autonomous cars from taking up a parking spot right outside their house?

If an unconventional Carempty is parked on the street, one assumes that it might at any time start up upon being summoned by its owner, or if it is a ride-share car when its services are needed. So now imagine that you are the tow truck operator and you are supposed to be towing such a car. Can you be sure it won’t try driving away as you are crawling under it to connect the chains, etc., to tow it?  If a human runs out to move their car at the last minute you can see when things are going to start and adjust. How will it work with fully autonomous cars?

And what about a Carempty that has a serious breakdown, perhaps in its driving system, so that it just sits there and can no longer safely move itself? That will most likely need to be towed. Can the tow truck operator have some way to guarantee that it is shut down and will not jump back to life, especially when the owner has not been contactable to put it in safe mode remotely? What will be the protocols and regulations around this?

And then if the car is towed, and I know this from experience, it is going to be in a muddy lot full of enormous potholes in some nearby town, with no marked parking areas or driving lanes. The cars will have been dumped at all angles, higgledy-piggledy. And the lot is certainly not going to have its instantaneous layout mapped by one of the mapping companies providing the maps that autonomous cars rely on for navigation. To retrieve such a car a human is likely going to have to go do it (and pay before getting it out), but if it is an unconventional car it is certainly going to require someone in it to talk it through getting out of there without angering the lot owner (and again from experience, that is a really easy thing to do–anger the lot owner). Yes, in some distant future tow lots in Massachusetts will be clean and flat, with no potholes deeper than six inches, and with electronic payment systems, and all will be wonderful for our autonomous cars to find their way out.

Don’t hold your breath.

OTHER TRICKY SITUATIONS

What happens when a Carempty is involved in an accident? We know that many car companies are hoping that their cars will never be involved in an accident, but humans are dumb enough that as long as there are both human drivers and autonomous cars on the same streets, sometimes a human is going to drive right into an autonomous car.

Autonomous cars will need to recognize such a situation and go through some protocol. There is a ritual when a fender bender happens between two driverful cars. Both drivers stop and get out of their cars, perhaps blocking traffic (see above), and go through a process of exchanging insurance information. If one of the cars is an autonomous vehicle then the human driver can take a photo on their phone (technology to the rescue!) of the autonomous car’s license plate. But how is a Carempty supposed to find out who hit it? In the distant future when all the automobile stock on the road has transponders (like current airplanes) that will be relatively easy (though we will need to work through horrendous privacy issues to get there), but for the foreseeable future this is going to be something of a problem.

And what about refueling? If a ride-sharing car is gasoline powered and out giving rides all day, how does it get refueled? Does it need to go back to its home base to have a human from its company put in more gasoline? Or will we expect to have auto refueling stations around our cities? The same problem will be there even if we quickly pass beyond gasoline powered cars. Electric Carempties will still need to recharge–will we need to replace all the electric car recharging stations that are starting to pop up with ones that require no human intervention?

Autonomous cars are likely to require lots of infrastructure changes that we are just not quite ready for yet.

Impacts on the Future of Autonomous Cars

I have exposed a whole bunch of quandaries here for both Carempties and Carelesses. None rise to the moral level of the so-called trolley problem (do I kill the one nun or seven robbers?) but unlike the trolley problem, variants of these edge cases are very likely to arise, at least in my neighborhood. There will be many other edge case conundrums in the thousands, perhaps millions, of unique neighborhoods around the world.

One could try to have some general purpose principles that cars could reason from in any circumstances, perhaps like Asimov’s Three Laws^{\big 2}, and perhaps tune the principles to the prevailing local wisdom on what is appropriate or not. In any case there will need to be a lot of codifying of what is required of autonomous cars in the form of new traffic laws and regulations. It will take a lot of trial and error and time to get these laws right.

Even with an appropriate set of guiding principles there are going to be a lot of perceptual challenges for both Carempties and Carelesses that are way beyond those that current developers have solved with deep learning networks, and perhaps a lot more automated reasoning than any AI systems have so far been expected to demonstrate.

I suspect that to get this right we will end up wanting our cars to be as intelligent as a human, in order to handle all the edge cases appropriately.

And then they might not like the wage levels that ride-sharing companies will be willing to pay them.



^{\big 1}But maybe not.  I may have one more essay on how driverless cars are going to cause major infrastructure changes in our cities, just as the original driverful cars did. These changes will be brought on by the need for geofencing–something that I think proponents are underestimating in importance.

^{\big 2}Recall that Isaac Asimov used these laws as a plot device for his science fiction stories, by laying out situations where these seemingly simple and straightforward laws led to logical dilemmas that the protagonists of the stories, be they robot or human, had to find a way through.

Is War Now Post Kinetic?

rodneybrooks.com/is-war-now-post-kinetic/

When the world around us changes, often due to technology, we need to change how we interact with it, or we will not do well.

Kodak was well aware of the digital photography tsunami it faced but was not able to transform itself from a film photography company until too late, and is no more. On the other hand, Pitney Bowes started its transformation early from a provider of mail stamping machines to an eCommerce solutions company and remains in the S&P 500.

Governments and politicians are not immune from the challenges that technological change produces on the ground, and former policies and vote getting proclamations may lag current realities^{\big 1}.

I do wonder if war is transforming itself around us to being fought in a non-kinetic way, and which nations are aware of that, and how that will change the world going forward. And, importantly for the United States, what does that say about what its Federal budget priorities should be?

A Brief History of Kinetic War

The technology of war has always been about delivering more kinetic energy, faster, more accurately and with more remote standoff from the recipient of the energy, first to human bodies, and then to infrastructure and supply chains.

New technologies caused changes in tactics and strategies, and many of them eventually made old technologies obsolete, but often a new technology would co-exist with one that it would eventually supplant for long periods, even centuries.

One imagines that the earliest weapons used in conflicts between groups of people were clubs and axes of various sorts. These early wars were fought in close proximity, delivering kinetic blows directly to another’s body.

By about 4,400 years ago the first copper daggers appeared, and by 3,600 years ago, bronze swords appeared, allowing for an attack at a slightly longer distance, perhaps out of direct reach of the victim. Even today our infantries are equipped with bayonets on the ends of guns to deliver direct kinetic violence to another’s body through the use of human muscles. With daggers and swords the kinetic blows could be much more deadly as they needed less human energy to cause bleeding.

Simultaneously the first “stand off” weapons were developed: bows and arrows 12,000 years ago, most likely with a very limited range. The Egyptians had bows with a range of 100 meters a little less than 4,000 years ago. A bow stores the energy from human muscle in a single drawing motion, and then delivers it all in a fraction of a second. These weapons did not eliminate hand to hand combat, but they did allow engagement from a distance. The introduction of horses and later chariots added the element of speed: closing from too far away to engage to being within engagement range very quickly. These developments were all aimed at getting bleed-producing kinetic impacts on humans from a distance.

A little less than 3,000 years ago war saw a new way to use kinetic energy: thermally. No longer was it just the energy of human muscles that rained down on the enemy, but that from fire. First from burning crops, but soon by delivering burning objects via catapults and other throwing devices. Those throwing devices started out just delivering heavy weights, through the muscle energy of many people stored over many minutes of effort. But once burning objects were being thrown they could deliver the thermal energy stored in the projectile, as well as unleash more thermal energy by setting things on fire in the landing area.

During the 8th to 16th century, hurled anti-personnel weapons, those aimed at individual people, were developed where projectiles full of hot pitch, oil, or resin, were thrown by mechanical devices, again with stored human energy, intended to maim and disable an individual human that they might hit.

The arrival of chemical explosives ultimately changed most things about warfare, but there was a surprisingly long coexistence with older weapons. The earliest form of gunpowder was developed in 9th century China, and it reached Europe courtesy of the Mongols in 1241. The cannon, which provided a way of harnessing that explosive power to deliver high amounts of kinetic energy in the form of metal or stone balls, provided both more distant standoff and more destructive kinetics, and was well developed by the 14th century, with the first man portable versions coming of age in the 15th century.

But meanwhile the bow and arrow made a comeback, with the English longbow, traditionally made from yew (and prompting a Europe-wide trade network in that wood), having a range of 300 meters in the 14th and 15th centuries. It was contemporary with the cannon, but its agility in being carried by a single bowman made it the major reason for victory in a large-scale battle as late as the Battle of Agincourt in 1415.

The cannon changed the nature of naval warfare, and naval warfare itself was about logistics and supply lines, and later being a mobile platform to pound installations on the coast from the safety of the sea. Ships also changed over time due to new technologies for their propulsion, from oars, to sails, to steam, and ultimately to nuclear power, making them faster and more reliable. Meanwhile the mobile cannon was developed into more useful sorts of weapons, and with the invention of bullets (which combined the powder and projectile into a compact pre-manufactured expendable device), guns and then machine guns became the preferred weapon of the ground soldier.

Each of these technological developments improved upon the delivery of kinetic energy to the enemy, over time, in fits and starts making that delivery faster, more accurate, more energetic, and with more distant standoff.

Rarely were the new technologies adopted quickly and universally, but over time they often made older technologies completely obsolete. One wonders how quickly people noticed the new technologies, how they were going to change war completely, and how they responded to those changes.

Latter Day War

In the last one hundred or so years, from the beginning of the Great War, also known as World War I, we have seen continued technological change in how kinetic energy is delivered during conflict. In the Great War we saw both the introduction of airplanes, originally as intelligence gathering conveyances, but later as deliverers of bullets and bombs, and the introduction of tanks. Even with mechanization, the United States Army still had twelve horse regiments, each of 790 horses, at the beginning of World War II. They were no match for tanks, and hard to integrate with tank units, so eventually they were abolished.

By the end of World War II we had seen both the deployment of missiles (the V1 and V2 by Germany), and nuclear weapons (by the United States). Later married together, nuclear tipped missiles became the defining, but unused, technology that redefined the nature of war between superpowers. Largely that notion is obsolete, but North Korea, a small poor country, is actively flirting with it again these very days.

Another innovation in World War II, practiced by both sides, was massive direct kinetic hits on the civilian populations of the enemy, delivered through the air. For the first time kinetic energy could be delivered far inside territory still held by the enemy, and damage to infrastructure and morale could be wrought without the need to invade on the ground. Kinetically destroying large numbers of civilians was also part of the logic of MAD, or Mutually Assured Destruction, of the United States and the USSR pointing massive numbers of nuclear tipped missiles at each other during the cold war.

Essentially now war is either local engagements between smaller countries, or asymmetric battles between large powers and smaller countries or non-state actors. The dominant approach for the United States is to launch massive ship and air based volleys of Tomahawk Cruise Missiles, with conventional kinetic war heads, to degrade the war fighting infrastructure in the target territory, followed by boots on the ground. The other side deploys harassing explosives both as booby traps, and to target both the enemy and local civilians, using human suicide bombers as a standoff mechanism for those directing the fight. As part of this asymmetry the non-state actors continually look for new ways to deliver kinetic explosions on board civilian aircraft, which has had the effect of making air travel worldwide more and more unpleasant for the last 16 years.

In slow motion each class of combatant changes their behavior to respond to new, and past, technologies deployed or threatened by the other side.

But over the whole history of war, rulers and governments have had to face the issue of what war to prepare for and where to place their resources. When should a country stop concentrating on sources of yew and instead invest more heavily in portable cannons? When should a country give up on supporting regiments of horses? When should a country turn away from the ruinous expense of yet higher performance fighter planes whose performance is only needed to engage other fighter planes and instead invest more heavily in cruise missiles and drones with targeted kinetic capabilities?

How should a country balance its portfolio of spending on the old technologies of war, while putting enough muscle behind the new technologies so that it can ride up the curve of each new technology, defending against it adequately, and perhaps deploying it itself?

BUT HAS A NEW FORM OF WAR ARRIVED?

In the late nineteenth century fortunes were made in chemistry for materials and explosives. In the early part of the twentieth century extraordinary wealth for a few individuals came from coal, oil, automobiles, and airplanes. In the last thirty years that extraordinary wealth has come to the masters of information technology through companies such as Microsoft, Apple, Oracle, Google, and Facebook. Information technology is the cutting edge. And so, based on history, one should expect that technology to be where warfare will change.

Indeed, we saw in WW II the importance of cryptography and the breaking of cryptography, and the machines built at Bletchley Park in service of that gave rise to digital computers.

In the last few years we have seen how our information infrastructure has been attacked again and again for criminal reasons, with great amounts of real money being stolen, solely in cyberspace. Pacifists^{\big 2} might say that war is just crime on an international scale, so one should expect that technologies that start out as part of criminal enterprises will be adopted for purposes of war.

We have seen over the last half dozen years how non-state actors have used social media on the Internet to recruit young fighters from across the world to come and partake in their kinetic wars where those recruiters reside, or to wage kinetic violence inside countries far removed physically from where the recruiters reside. The Internet has been a wonderful new standoff tool, allowing distant ring-masters to burrow into distant homelands and detonate kinetic weapons constructed locally by people the ring-masters have never met in person. This has been an unexpected and frightening evolution of kinetic warfare.

In the early parts of this decade a malicious computer worm named Stuxnet, most probably developed by the US and Israel, was deployed widely through the Internet. It infected Microsoft operating systems, and sniffed out whether they were talking to Siemens PLCs (Programmable Logic Controllers), and whether those were controlling nuclear centrifuges. Then it slowly degraded those centrifuges while simulating reports that said all was well with them. It is believed that this attack destroyed one fifth of Iran’s centrifuges. Here a completely cyber attack, with standoff all the way back to an office PC, was able to produce a kinetic (slow though it may have been) attack in the core of an adversary’s secret facilities. And it was aimed at the production of the ultimate kinetic weapon, nuclear bombs. War is indeed evolving rapidly.

But now in the 2016 US presidential election, and again in the 2017 French presidential election, we have seen, though all the details are not yet out, a glimpse of a future form of warfare in which kinetic weapons are not used at all. Nevertheless these have been acts of war. US intelligence services announced in 2016 that there had been Russian interference in the US election.  The whole story is still to come out, but in both the US and French elections there were massive dumps of cyber-stolen internal emails from one candidate’s organization, timed exquisitely in both cases down to just a few minutes’ window of maximum impact. This was immediately followed, minutes later, by thousands of seemingly unrelated people looking through those emails and claiming clues to often ridiculous malevolence. In both elections the mail dumps included faked emails which had sinister interpretations, to be uncovered by the armies of people looking through the emails for a smoking gun. These attacks most probably changed the outcome of the US election, but failed in France. This is post kinetic war waged in a murky world where the citizens of the attacked country can never know what to believe.

Let us be clear about the cleverness and monumental nature of these attacks. An adversary stands off, thousands of miles away, with no physical intrusion, and changes the government of its target to be more sympathetic to it than the people of the target country wanted. There are no kinetic weapons. There are layers of deception and layers of deniability. The political system of the attacked country has no way to counteract the outcome desired and produced by the enemy. The target country is dominated by the attacking adversary. That is a successful post kinetic war.

Technology changes how others act and how we need to act. Perhaps the second amendment to the US Constitution, allowing for an armed civilian militia to fight those who would destroy our Republic, is truly obsolete. Perhaps the real need is to equip the general population of the United States with tools of privacy and cyber security, both at a personal level, and in the organizations where they work. Just as WW II showed the obsolescence of physical borders to protect against kinetic devices raining from the sky, so too now we have seen that physical borders no longer protect our fundamental institutions of civil society and of democracy.

We need to learn how to protect ourselves in a new era of post kinetic war.

We see a proposed 2018 US Federal budget building up the weapons of kinetic war way beyond their current levels. Kinetic war will continue to be something we must protect against–it will remain an avenue of attack for a long time. We saw above how the English longbow was still a credible weapon, coexisting with cannon and other uses of gun powder for centuries, though now its utility is well gone.

However, we must not give up worrying about kinetic war, but we must start investing in strength and protection against a new sort of post kinetic war that has really only started in the last twelve months. With $639B slated for defense in the proposed 2018 budget, and even $2.6B for a border fence, surely we can spend a few little billions, maybe even just one or two, on figuring out how to protect the general population from this newly experienced form of post kinetic war. I have recommendations^{\big 3}.

We don’t want the United States to have its own Kodak moment.



^{\big 1}For instance, in just six months from this last October to April, more jobs were lost in retail in the US than the total number of US coal jobs. Not only did natural gas, wind, and solar technology decimate coal mining, jobs never to return, but information technology has enabled fulfillment centers, online ordering, and delivery to the home, completely decimating the US retail sector, a sector that is many times bigger than coal.

^{\big 2}I do not count myself as a pacifist.

^{\big 3}Where in the Federal Government should such money be spent? The NSA (National Security Agency) has perhaps the most sophisticated group of computer scientists and mathematicians working on algorithms to wage and protect against cyber war. But it is not an agency that shares that protection with the general population and businesses, just as the US Army does not protect individual citizens or even recommend how they should protect themselves. No, the agency that does this is NIST, the National Institute of Standards and Technology, part of the Department of Commerce.  It provides metrology standards which enable businesses to have a standard connection to the SI units of measurement.  But it also has (with four Nobel prizes under its belt) advanced fundamental physics so that we can measure time accurately (and hence have working GPS), it has been a key contributor, through its measurements of radio wave propagation, to the 3G, 4G, and coming 5G standards for our smart phones, and it is contributing more and more to biological measurements necessary for modern drug making.  But for the purpose of this note its role in cybersecurity is omni-important. NIST has provided a Cybersecurity Framework for businesses, now followed by half of US companies, giving them a set of tools and assessments to know whether they are making their IT operations secure. And NIST is now the standards generator and certifier for cryptography methods.  The current Federal budget proposal makes big cuts to NIST’s budget (in the past its total budget has been around $1B per year).  Full disclosure: I am a member of NIST’s Visiting Committee on Advanced Technology (VCAT). That means I see it up close. It is vitally important to the US and to our future. Now is not the time to cut its budget but to support it as we find our way in our future of war that is post kinetic.

Gas Mileage, with a Side of British Units

rodneybrooks.com/gas-mileage-with-a-side-of-british-units/

In the United States we measure how much gasoline an automobile uses in units of “miles per gallon”, often referred to as the car’s “fuel economy”. Elsewhere in the world it is measured in “liters (litres) per 100 kilometers”.

Since both gallons and liters are volume measurements, when we do a dimensional analysis of these quantities we get for the United States L/(L^3) or L^{-2}, and for the rest of the world L^3/L or L^2. In both cases it comes out as an area measurement, inverted in the case of the United States.

For 25mpg in the US (which is 9.4 liters per 100 kilometers), we can calculate the area knowing that a US gallon is defined to be 231 cubic inches, and since a mile is 5,280 feet, or 63,360 inches, the area (un-inverted) in square inches is: 

    \[\frac{231}{25\times 63360} = 0.000145833\]

 or a square 0.0121 inches on a side or a circle that is 0.01363 inches in diameter. In metric units, given that a liter is 1,000 cubic centimeters, or 1,000,000 cubic millimeters, and a kilometer is 1,000,000 millimeters, the area is 0.094 square millimeters, which is a square 0.3066 millimeters on a side, or a circle 0.3460 millimeters in diameter.
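Spelled out in the same display form as the US-unit calculation, with 9.4 liters being 9.4\times 10^{6} cubic millimeters and 100 kilometers being 10^{8} millimeters, the metric version is:

    \[\frac{9.4\times 10^{6}}{10^{8}} = 0.094\]
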

So that’s the area. But what does it mean physically?  It is the cross section of the volume of gasoline that it takes to drive 25 miles, or in the other units 100 kilometers, stretched out to that length. If we form it into a very long cylinder then the area is the cross section of the cylinder. So a 25mpg automobile has the contents of its gas tank stretched out over the whole length of its journey, into a cylinder of gasoline with diameter 0.346 millimeters, and the car is precisely eating that cylinder as it drives along!

Of course, having grown up in Australia back when we used Imperial British units for everything, I have always preferred expressing gas mileage in acres^{\big 1}. And 25mpg turns out to be 2.325 \times 10^{-11} acres.

A Boeing 747 burns about 5 gallons of fuel per mile, or 12 liters per kilometer, so it is eating up a cylinder with a cross sectional area of 12 square millimeters, which is a cylinder with a 3.9 millimeter diameter, roughly 100 times more than an automobile.

The first stage (S-IC) of the Saturn V moon rockets burned out at about 61 kilometers up, having consumed 770,000 liters of RP-1 kerosene. That means it consumed a cylinder of fuel with a cross-sectional area of 12,623 square millimeters, i.e., a diameter of 126.8 millimeters, or just about exactly five inches.  Now that is a gas guzzler!
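For readers who want to check these figures, here is a small Python sketch (my own illustration, not from the original post; the helper name circle_diameter is made up) that redoes the arithmetic above:

    import math

    def circle_diameter(area):
        # Diameter of a circle with the given cross-sectional area.
        return 2.0 * math.sqrt(area / math.pi)

    # 25 mpg car: one US gallon (231 cubic inches) stretched over 25 miles.
    area_in2 = 231.0 / (25 * 63360)              # a mile is 63,360 inches
    print(area_in2, circle_diameter(area_in2))   # ~0.000145833 in^2, ~0.01363 in

    # And in acres: an acre is 43,560 square feet, i.e. 6,272,640 square inches.
    print(area_in2 / 6272640)                    # ~2.325e-11 acres

    # The same car in metric: 9.4 liters per 100 kilometers.
    area_mm2 = (9.4 * 1e6) / 1e8                 # mm^3 of fuel per mm of road
    print(area_mm2, circle_diameter(area_mm2))   # ~0.094 mm^2, ~0.346 mm

    # Boeing 747: 12 liters per kilometer.
    print(circle_diameter((12 * 1e6) / 1e6))     # ~3.9 mm

    # Saturn V first stage: 770,000 liters burned over 61 kilometers.
    area_s1 = (770000 * 1e6) / (61 * 1e6)
    print(area_s1, circle_diameter(area_s1))     # ~12,623 mm^2, ~126.8 mm (~5 in)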



^{\big 1}What is an acre? It is derived from the amount of land tillable by a yoke of oxen in one day–and a long strip of land is more efficient to till than a more boxy area as you have to change direction less often. So an acre is defined as one “chain” wide, by 10 chains long. And a chain?  It is 100 links, or exactly 22 yards (and also exactly four “rods” long, each of which is sixteen and a half feet long). So the standard tillable plot was 22 yards wide, and 220 yards long, which happens to be one eighth of a mile long, otherwise known to horse racing enthusiasts as a furlong, or “furrowlong”!

An acre plot was one eighth of a mile long and one eightieth of a mile wide, which is why there are 640 acres in a square mile. Of course once an acre is an area it can be any shape, and it is 4,840 square yards, or 43,560 square feet, which itself is precisely 99% of 44,000 square feet. Since a square rod is known as a perch, an acre is 160 perches. And BTW, an acre is roughly 90% of the playing area of a standard US football field.

Don’t even get me started on Imperial British units for weight, including a hundredweight (which is 112 pounds, of course), one 20th of an Imperial ton (2,240 pounds), and itself four quarters (28 pounds each), or 8 stone (14 pounds each). No, I won’t get started… and certainly not on money made up of pounds, shillings, and pence, with 20 shillings to a pound, and 21 shillings to a guinea for fancy stores, with 12 pence to a shilling, and a half crown was two shillings and six pence, or 2/6 (“two and six”, or 2s 6d). I won’t get started there, either…

Patrick Winston Explains Deep Learning

rodneybrooks.com/patrick-winston-explains-deep-learning/

Patrick Winston is one of the greatest teachers at M.I.T., and for 27 years was Director of the Artificial Intelligence Laboratory (which later became part of CSAIL).

Patrick teaches 6.034, the undergraduate introduction to AI at M.I.T. and a recent set of his lectures is available as videos.

I want to point people to lectures 12a and 12b (linked individually below). In these two lectures he goes from zero to a full explanation of deep learning, how it works, how nets are trained, what are the interesting problems, what are the limitations, and what were the key breakthrough ideas that took 25 years of hard thinking by the inventors of deep learning to discover.

The only prerequisite is understanding differential calculus. These lectures are fantastic. They really get at the key technical ideas in a very understandable way. The biggest network analyzed in lecture 12a only has two neurons, and the biggest one drawn only has four neurons. But don’t be disturbed. He is laying the groundwork for 12b, where he explains how deep learning works, shows simulations, and shows results.

This is teaching at its best. Listen to every sentence. They all build the understanding.

I just wish all the people not in AI who talk at length about AI and the future in the press had this level of technical understanding of what they are talking about. Spend two hours on these lectures and you will have that understanding.

At YouTube, 12a Neural Nets, and 12b Deep Neural Nets.

Robot Is A Hijacked Word

rodneybrooks.com/robot-is-a-hijacked-word/

The word “robot” has been hijacked. Twice. (Or thrice, if we want to be pedantic, but I won’t be.)

THE ORIGINAL SPIN

The word “robot” was introduced into the English language by the play R.U.R., written in Czech by Karel Capek, and first performed in Prague on January 25, 1921. R.U.R. stands for Rossumovi Univerzálni Roboti, though according to the play’s Wikipedia page, even in the first edition of the play, published by Aventinum in Prague in 1920, the cover, designed by Karel’s brother Josef Capek, had the English version as the title, Rossum’s Universal Robots, even though the play within was in Czech.

According to Science Friday, the word robot comes from an old Church Slavonic word, robota, meaning “servitude”, “forced labor”, or “drudgery”. And the more you look the more references indicate that it is not known whether Karel or Josef suggested the word.

But, in any case, in the play the robots were not electro-mechanical devices, in the way I have used the word robot all my life, in agreement with encyclopedias and Wikipedia. Instead they were “living flesh and blood creatures”, made from an artificial protoplasm. They “may be mistaken for humans and can think for themselves”. Both quotations here are from the Wikipedia page about the play, linked to above. According to Science Friday they “lack nothing but a soul”.

This is the common story, more or less, about where the word robot came from.

<aside>

But, but, maybe not.  According to this report the word “robot” first appeared in English in 1839. 1839! It says that robot at that time referred not to an individual, neither machine, nor protoplasm, nor electro-mechanical, but rather to a system, a “central European system of serfdom, by which a tenant’s rent was paid in forced labour or service”. Ultimately that word came from the same Slavonic root.

So perhaps in English the word “robot” changed in meaning between 1839 and 1920. Though realistically perhaps no one who picked it up from the Capek brothers in 1920 had ever heard of it from the old 1839 meaning. And in any case it seems such a different use that I don’t think it really is a hijacking. Just as “field” was not “hijacked” in going from a field of wheat to a field of study.

I am not going to count this as a “robot” hijacking.

</aside>

In 1920 “robot” referred to humans without souls, manufactured from protoplasm. But that meaning changed quickly.

THE FIRST HIJACKING

By the time I was deciding what really interested me in life, the word “robot” had come to mean a machine, as given by the online English Oxford Living Dictionaries, which defines the word as:

A machine capable of carrying out a complex series of actions automatically, especially one programmable by a computer.

Here is the cover of the January 1939 edition of Amazing Stories.

There is a robot, a machine, right on the cover, illustrating a story titled I, Robot. But for me the big news is that the author of that story is Eando Binder (a nom de plume for Earl and Otto Binder), rather than Isaac Asimov–we’ll get back to Dr. Asimov in just a minute. I have found one slightly earlier reference to mechanical robots, but it is only a passing reference. In this Wikipedia list of fictional robots there is a story titled Robots Return, by Robert Moore Williams, dated 1938. Unfortunately I do not have the text of either the 1938 or 1939 story, so can’t tell whether the authors assume that their readers implicitly understand to what the word “robot” refers.

However, in 1940, only twenty years after the R.U.R. play was first published with the English version of robot on its cover, Isaac Asimov published his story Strange Playfellow in Super Science Stories, an American pulp science fiction magazine (of the Binder story he said “It certainly caught my attention” and that he started work on his story two months later). Later, retitled as Robbie, the story was the first one that appeared in Asimov’s collection of stories published as the book I, Robot, on December 2, 1950.  I have a 1975 reprint of a republished version of that book from 1968. In that reprint, at the bottom of the third page, after describing a little girl, Gloria, playing with a mechanical humanoid Robbie, Asimov uses the word “robot” to refer to Robbie (note the alliteration) in a very casual way, as though the reader should know the word robot.  The word “robot” got hijacked in just 20 years, in the popular culture, or at least in the science fiction popular culture, from meaning a humanoid made of protoplasm in a Czech language play in Prague, to meaning a machine that could walk, play with, and communicate with humans. Note that 1940 was before programmable computers existed, so there was some more evolution to get to the definition involving computers as quoted above.

In the 1920’s such mechanical humanoids seem to have been referred to as “automatons”.

I have no idea what contributed to that transformation of the word “robot”, but I am eager to see any citations that might be offered in the comments section.

So now we have the first real hijacking of “robot”. But there has been a more recent one. I may not care deeply about the earlier hijacking, but I sure do care about this one. I am an old school roboticist in the sense of the definition in italics four paragraphs back. And my meaning has been hijacked!!

THE SECOND HIJACKING

In the more recent hijacking, “robot” has come to mean some sort of mindless software program, that does things that are relentless, or sometimes even cruel, though sometimes amusing and helpful. This new use is getting so bad that it is often hard to tell from a headline with the word “robot” to which form of robot the story refers. And I think it is giving electromechanical robots, my life’s work, a bad name.

I think it starts with this secondary definition which also appears with the English Oxford Living Dictionaries definition from above.

Used to refer to a person who behaves in a mechanical or unemotional manner.

And then it includes an example of usage: ‘public servants are not expected to be mindless robots’.

Now Asimov’s robots were not mindless, and none of mine have ever been mindless (well, perhaps my insect-based robots were a little mindless, certainly not conscious in any way). But the first industrial robots introduced into a GM automobile plant in 1961 certainly were mindless. They did the same thing over and over, without sensing the world, and did not care whether or not the parts or sheet metal they were operating on were even there. And woe betide a person who got in their way.  They had no idea there was someone there, and even had no idea that someone, or anyone for that matter, existed, or even could exist. They did not have computers controlling them.

I may be wrong, but I trace the hijacking of the word “robot” to two things that happened in 1994. I am guessing that there is some earlier history, but that I am just not aware of it. Ultimately it is all the fault of the World Wide Web…

Tim Berners-Lee invented the World Wide Web and put up the first Web page in 1991. Explosive growth in the Web started soon after, and by 1994 there were multiple attempts to automatically index the whole Web. Today we use Google or Bing, but 1994 was well before either of those existed. The way today’s search engines, and those of 1994, know what is out on the Web is that they work in the background building a constantly updated index. Whenever someone searches for something it is compared to the current state of the index, and that is what is actually searched right then and there–not all the Web pages spread all over the world. But the program that does search all those pages in the background is known as a Web Crawler.
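As a toy illustration of the indexing idea (this sketch is mine, with made-up page contents, not anything from 1994), here is a tiny inverted index in Python, mapping each word to the set of pages containing it, so that a query consults only the index rather than the pages themselves:

    # Hypothetical page contents, standing in for crawled Web pages.
    pages = {
        "a.html": "robots are machines",
        "b.html": "machines that crawl the web",
    }

    # Build the inverted index: word -> set of pages containing that word.
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)

    # A search now consults the index, not the pages themselves.
    print(index["machines"])   # {'a.html', 'b.html'}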

By 1994 some people wanted to stop Web Crawlers from indexing their site, and a convention came about to put a file named robots.txt in the root directory of a web site. That file would never be noticed by Web browsers, but a politely written Web Crawler would read it and see if it was forbidden from indexing the site, or if it was to stay away from particular parts of the site, or how often the owner of the site thought it reasonable to be crawled. The contents of such a file follow the robots exclusion standard of which an early version was established in February 1994.
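To make the convention concrete, here is a minimal sketch (my example; example.com and MyCrawler are made up, and Crawl-delay is a common but non-standard extension). A robots.txt file is just plain text, and Python’s standard library can check it on a crawler’s behalf:

    # A site might serve a robots.txt at its root that looks like this:
    #
    #   User-agent: *          (the rules below apply to every crawler)
    #   Disallow: /private/    (crawlers should stay out of this area)
    #   Crawl-delay: 10        (a politeness hint: seconds between fetches)
    #
    # A politely written Web Crawler checks the file before fetching a page:
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()    # fetch and parse the site's robots.txt
    print(rp.can_fetch("MyCrawler", "https://example.com/private/page.html"))
    # -> False, since /private/ is disallowed for all user agents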

You can see a list of 302 known Web Crawlers (listed as a “Robots Database”!) currently active (I was surprised that there are so many!).  Of the 302 Web Crawlers, 29 include “robot” in their name (including “Robbie the Robot”), 18 include “bot” (including “Googlebot”), three have “robo”, and one has “robi”.  Web Crawlers, which are not physical robots, have certainly taken on “robot” as part of their identity. Web Crawlers became “robots”, I guess, for their mindless search of the Web, indexing it all, and following every link without understanding what was there.

That same year there was another innovation. There had been programs, all the way back to the sixties, that could engage in forms of back and forth typed language with humans. In 1994 they got a new name, chatterbots, or chat bots, and some of them had a more or less permanent existence on particular Web pages. Probably they attracted the suffix “bot” because they could seem rather mindless and repetitive, again harking back to the dictionary example above of the mindlessness of robots.

Now we had both Web Crawlers and programs that could converse (mostly badly) in English that had taken parts of their class names from the word “robot”. Neither were independent machines. They were just software. Robots had gone from protoplasm, to electromechanical, to purely software.  While no one is really building machines from protoplasm, some of us are building electromechanical devices that are quite useful in the world. Our robots are not just programs.

Since 1994 the situation has only gotten worse! “Robo”, “bot” and “robot” have been used for more and more sorts of programs. In a 2011 article, Erin McKean pointed out that there were “robo” prefixes and “bot” suffixes, and that at that time, in general, “robo” had a slightly more sinister meaning than “bot”. There was “Robocop”, definitely sinister, and there were annoying “robocalls” to our phones; “robo-trading” in stocks caused the 2010 “Flash Crash” of the markets; and “robo-signers” were people signing foreclosure documents in the mortgage crisis. Chat bots, twitter bots, etc., could be annoying, but were not sinister.

Now both sides are bad. We see malicious chat bots filling chat rooms intended for humans, and we see “botnets” of zombie computers, taken over by hackers to launch massive denial-of-service attacks on people or companies or governments all over the world.

Here is a list of varieties of bots, all software entities. Some good, some bad.

At some web sites, such as topbots.com, it seems to be all about “bots”, but it is hard to tell which ones are software, or whether any have a hardware component at all. Now “bots” seems to have become a generalized word for all aspects of A.I., deep learning, big data, and IoT. It is sucking up everything before it.

The word “robot” and its components have taken on new meanings in the last twenty or so years.

A NEW WORD FOR ELECTROMECHANICAL ROBOTS?

Here’s the bottom line. My version of the word “robot” has been hijacked. And since that is how I define myself, a guy who builds robots (according to my definition), this is of great concern. I don’t think we can ever reclaim the word robot (no more than we can reclaim the word “hacker”, which forty years ago meant only pristine goodness, and I was honored when anyone referred to me as a hacker, even more so as a “robot hacker”). I think the only thing to do is to replace it.

How about a new word? What should we call good old fashioned robots? GOFR perhaps?

One that comes to mind is “droid”, a shortened version of “android”, which before it was a phone software system meant a robot with a human appearance. Droid distinguishes itself from the hijacked version of android which refers to phone software, and is generally understood by people to mean an electromechanical entity, an old style robot. Star Wars is largely responsible for that general perception. But that is also the problem in trying to use the word more generally. The Star Wars franchise has the word “droid” completely bottled up as a trademark. I know of three robot start up companies that wanted to use the word “droid”, and all three gave up in the face of legal problems in trying to do that.

No droids. Unfortunately.

So…what should the new word be? Put your suggestions in the comment section¹, and let’s see what we can come up with!



¹I manually filter all comments, as the majority of comments posted are actually advertisements for male erectile dysfunction drugs…

Megatrend: The Demographic Inversion

rodneybrooks.com/megatrend-the-demographic-inversion/

Megatrends are inexorable, at least for some decades, and drive major changes in our world. They may change our planet permanently, they change the flow of money within our society, they drive people to move where they live, they kill some people and cause others to be born, they change the fortunes of individuals, and they change the shape of our technology. When a megatrend is in full swing it is quite visible for all to see, and with a little thought we can make predictions about many of the things it will cause to change in our world. Having accurate predictions in hand empowers individuals to make canny decisions.

A few of today’s megatrends are global warming, urbanization, increased use and abuse of antibiotics, the spread of invasive species, increased plastic in the seas, rapid species extinctions, and changing demographics of the world’s human population. We see all these happening, but we can feel powerless to change them, and certainly they are impossible to change in the short term.

In this post I will talk about the demographic inversion that is happening in many parts of the world, and how that will drive much of our technology, and in particular our robotics technology, for the next thirty to fifty years.

We know how many 25 to 29 year olds there are in China today, and we know an upper bound on how many there will be in 20 years: everyone who will be 25 to 29 then is already alive today, aged 5 to 9, and that cohort is less than two thirds the size of today’s 25 to 29 year old cohort (mortality can only shrink a cohort, and immigration into China is tiny). No political directives, government coercion, or technological breakthroughs are going to change that. This population trend is real and now unavoidable, and it is man and woman made. We can’t change this fact about the world twenty years hence. Given the proportion of the world’s population that is in China, and that more than half the other countries have similar trends, the aging of human society is a megatrend.

Magnitude of the inversion

Here is the data on which I based my comments above, in a diagram I got from the CIA World Factbook, yes, that CIA.

This is a pretty standard format for showing population distributions. Male and female are separated to the left and right, and then their age at the date of the data is histogrammed into five year intervals. This is a snapshot of the age distribution of the Chinese population in 2016. One can see the impact on the birth rate of the hardships of the Cultural Revolution, followed by a population boom a dozen years later. And then we see the impact of the one child policy as it was enforced more and more strongly following its introduction in 1979, with an uptick in the echo of the earlier population boom. The one child policy was phased out in 2015, but there is some evidence that the culture has been changed enough that one child couples might continue at a higher rate than in many other countries with equally strong economies. We also see here the impact of the cultural desire for male children over female children when there is a restriction to only one child. Not all the extra female children were necessarily subject to abortion or infanticide, however, as it is strongly believed that there is a large ghost population of female children in China, girls whose existence is hidden from the authorities.

Here is the same data for Japan, and now we see a truly scary trend.

There are just fewer and fewer young people in Japan, in an unbroken trend lasting for forty years. Given the fertility age profile for women, forty years of decrease is really hard to turn around into an upward trend without immigration, which Japan very much eschews.

The real impact is on age distribution. Again, from the CIA World Factbook, currently 27.28% of the Japanese population is 65 or older, compared with only 15.25% of the US population, or 6.93% of the Mexican population. In Japan in 2016 there were for every 1,000 people only 7.8 births while there were 9.6 deaths. Looked at another way, the average number of births per woman in Japan is now only 1.41, compared to the obvious number of 2.0 needed for population replacement, but more like 2.1 to cover early deaths. While the population of Japan is shrinking, the ratio of older people to younger people is going to get larger and larger.  There is detailed coverage of the aging of Japan in Wikipedia, and here is a critical graph (under Creative Commons License) from that article:

By 2050 predictions are that over 35% of the population of Japan will be 65 or older. All those people are already in existence. Only a truly stupendously radical change in birth rate can possibly change the percentage. There are no indications that such a change is coming. Here is another way of looking at how the population is changing, this one from 2005 from the Statistics Bureau of the Japan Ministry of Health, Labor, and Welfare.

This is the typical, if somewhat more extreme in this case, change of shape of the population of developed countries in the world. Population is changing from being bottom heavy in age to top heavy. This is true of Europe, North America, Japan, Korea, China, and much of South America.
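
To get a feel for how quickly the Japanese numbers quoted above compound, here is a naive back-of-the-envelope projection in Python using just the crude birth and death rates. It is illustrative only; real projections track fertility and mortality by age cohort, which makes the picture worse, not better, as the population ages:

    # Naive projection from Japan's 2016 crude rates quoted above:
    # 7.8 births and 9.6 deaths per 1,000 people per year.
    population = 127_000_000  # Japan's population in 2016, approximately

    for year in range(2016, 2051):
        population *= 1 + (7.8 - 9.6) / 1000  # net change of -1.8 per 1,000

    print(f"{population / 1e6:.0f} million")  # about 119 million by 2050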

The current population of the world is about 7.5 billion people. Predicting how that number will change over the next century still has a lot of uncertainty, largely driven by uncertainty over whether the fertility rate (how many children each woman has on average) will drop as quickly in sub-Saharan Africa as it has in the rest of the world whenever the standard of living increases. But the growth in the world population does seem to be slowing down. On average, then, the pattern of an increasing ratio of older to younger people that we have seen above for China and Japan will be the overall pattern for the world over the next thirty to forty years. India is lagging Japan in this regard by about 50 years, so it will soon start happening there too, even as India becomes the most populous country in the world, surpassing China.

Consequences of the inversion

Thirty years ago in much of the developed world we saw schools that had been built to handle the post war baby boomers close after that bubble of children worked its way through the system. Today we are starting to see the consequences of the demographic inversion. It is showing up in two ways: (1) fewer young people filling certain job categories, creating a pull on automation in those arenas, and (2) uncertainty about how care can be provided for the huge overhang of elderly and very old people we are about to see.

(1) Fewer workers. Some jobs that seemed attractive in earlier times no longer seem quite as attractive as younger people have more educational opportunities, and aspirations for more interesting jobs. Two categories for which this is particularly true are farming and factory work. Farming has the additional negative aspect of often requiring people to live away from the urban areas that they might otherwise choose if they had more geographically mobile professions. These jobs both require physical labor, often in unpleasant conditions. Neither is a job that many people would take up at a later stage in life.

Food supply

The average age of a Japanese farmer is now 67, and in all developed nations the average age is 60.  Agriculture ministers from the G7 last year were worried about how this high age could lead to issues over food security. And as the world population is still increasing, the need for food also increases.

The Japanese government is increasing its support for more robots to be developed to help with farming. Japanese farms tend to be small and intensely farmed–rice paddies, often on terraced slopes, and greenhouses for vegetables. They are looking at very small robotic tractors to mechanize formerly manual processes in rice paddies, and at wearable devices, exoskeletons of sorts, to help elderly people, now that their strength is waning, continue to do the same lifting tasks with fruits and vegetables that they have done for a lifetime.

In the US farms tend to be larger, and for things like wheat farming a lot of large farm equipment is already roboticized. Production versions of many large pieces of farm equipment, such as those made by John Deere¹ (see this story from the Washington Post for an example), have been capable of level 3 autonomous driving (see my blog post for a definition) for many years, and can even be used at level 4 with no one in the cab (see this 2013 YouTube video for an example).

There is now robotics research around the world on robots to help with fruits and vegetables. At robotics conferences one can see prototype machines for weeding, for precision application of herbicides and insecticides, and for picking fruits and vegetables. All these parts of farming currently require lots of labor. In the US and Europe only immigrants are willing to do this labor, and with backlashes against immigration, landowners are left with no choice but to look for robotic workers, despite the political rhetoric that immigrants are taking jobs that citizens want–it is just not true.

Tied into this are completely new ways to do food production. We are starting to see more and more computer controlled indoor farming systems, both in research labs in Universities and in companies, and as turnkey solutions from small suppliers such as Indoor Farms of America and Cubic Farms, to name just two. The key idea is to put computation in the loop, carefully monitoring and controlling temperature, humidity, lighting, water delivery, and nutrient delivery. These solutions use tiny amounts of water compared to conventional outdoor farming. More advanced research solutions use computer vision to monitor crop growth and put that information into the controlling algorithms. So far we have not seen plays in this space from large established companies, but I have seen research experiments in the labs of major IT suppliers in both Taiwan and mainland China. We now have enough computation in the cloud to monitor every single plant that will eventually be consumed by humans. Farming still requires clouds, just entirely different ones than historically. Indoor farms promise much more reliable sources of food than those that rely on outside weather cooperating.
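
To give a flavor of what putting computation in the loop means, here is a minimal sketch of such a control loop in Python. Everything in it is a stand-in: the setpoints are made up, and the sensor readings are simulated where a real system would talk to actual probes, pumps, and lights:

    import random

    TARGET_TEMP_C = 22.0    # hypothetical setpoint for one growing rack
    TARGET_HUMIDITY = 0.65  # hypothetical setpoint

    def read_sensors():
        # Stand-in for real temperature and humidity probes.
        return {"temp_c": random.uniform(18, 26),
                "humidity": random.uniform(0.4, 0.8)}

    def control_step(state):
        # Decide what to actuate on this cycle.
        actions = []
        if state["temp_c"] > TARGET_TEMP_C:
            actions.append("cooling on")
        if state["humidity"] < TARGET_HUMIDITY:
            actions.append("mist water")  # targeted misting is why water use is tiny
        return actions

    for minute in range(3):  # a real loop runs continuously, say once a minute
        state = read_sensors()
        print(minute, state, control_step(state))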

Once food is grown it requires processing, and that too is labor intensive, especially for meat or fish of any sort. We are still a few years away from bionically grown meat that is practical, so in the meantime, again driven by lack of immigrants and a shortage of young workers, food processing is turning more and more to automation and robots. This includes both red meat cutting and poultry processing. These jobs are hard and unpleasant, and lead to many repetitive stress injuries. There are now many industrial robots in both the US and Australia being used to do some of these tasks. Reliance on robots will continue to grow as the population ages.

Manufacturing

Manufacturing is an area where there has been great fear of job loss due to automation.  But my own experience in manufacturing in both China and the United States over the last 20 years is that there is a shortage of manufacturing labor and that the labor force is aging.

I have been involved in manufacturing in mainland China since 1997. In the early years it was with iRobot, and the companies I worked with were all based in Taipei or Hong Kong, but had plants in Guangdong Province. Around 2004 I started to notice that we were losing a lot of workers over Golden Week, the biggest holiday period in China, when most people travel on unbelievably crowded trains for incredibly long periods to go “home” to visit parents and other family. We started seeing that our production lines were suffering after Golden Week as so many workers would not return, and it might take a couple of months to make up for those losses. This sometimes had real impact on our business with a drop in deliverable product.

By around 2005 in my role as Director of CSAIL (Computer Science and Artificial Intelligence Laboratory) at M.I.T., I was working with a number of very big manufacturing companies based in Taipei, with plants spread over a much wider area of mainland China. These companies did not have their own brands then, but many now do. They built many of the IT products that we in North America use on a daily basis. They were working with M.I.T. in order to move up the value chain from being OEMs (Original Equipment Manufacturers) for US brands to having the technology and internal R&D to develop their own unique and branded products. The message I got over and over again, often from the original founders who had started the companies in the 1970’s, but were now enabling a new generation of management to take the companies forward, was that it was getting harder and harder to get sufficient labor in China. I remember one particular discussion (and I most likely don’t have the exact wording correct here, but this is how I remember it): “in the old days we would put up a single sign, 3 inches by 5 inches, and the next morning we would have a line of prospective workers around the block–now we employ advertising agencies in Shenzhen and run ads on TV, and still we can’t get enough workers”. They told me that their two biggest problems in China were worker recruiting and worker retention. That was in 2005.

Today as I talk to manufacturers in China I find that very well run companies, with lots of retention strategies in place, will have a labor turnover rate of 15% per month (per month!!). Less well run companies will have up to 30% per month. Imagine trying to run a business with that level of turnover.

The reasons for the drop in eager manufacturing labor in China are complex. The demographic charts above tell a big part of the story. But another part is that the general standard of living has risen in China, people have more access to education, and they have higher aspirations. They don’t want to work in a factory at repetitive jobs–they want more meaningful work. All humans want to do meaningful things once they are beyond desperately trying to survive.

At the same time as this, I was working as an advisor to John Deere, visiting many of their factories in the US, and seeing how they were suffering from an aging workforce with no prospects for younger replacement labor to come along, in towns in the Midwest where the young would leave for bigger cities as soon as they had a chance. It wasn’t that there were no jobs in those plants and smaller cities, but that the young were heading for bigger cities.

Those trends that I was seeing back then have been borne out in the decade since.

In this 2013 story at Bloomberg it is reported that the median age of a highly skilled US manufacturing worker was 56. And this 2013 report from the Manufacturing Institute compares how the median age of all manufacturing workers, skilled or unskilled, is rising compared to the age of other workers.

We can see that the median age of a manufacturing worker is going up by a year every two or three years, and that it is going up faster than for other non-farm jobs. In the 12 years shown here, manufacturing workers have gone from being 1.1 years older than other workers, to 2.7 years older.

My observations of a decade ago alerted me to the fact that worldwide we would be having a shortage of labor for manufacturing. This has indeed come to pass. Naturally I thought about whether robots could make up the shortfall. But it seemed to me that industrial robots of the time were not up to the task.

While there were a lot of robots working around the world in automobile factories they existed in a complete apartheid from humans. In the body and paint shops it was all robots and no humans, and in the assembly lines themselves it was all humans and no robots. The fundamental reason for this was that industrial robots were not safe to be around. They had no sensors to detect humans and therefore no way to avoid hitting them, with tremendous force, should humans stray into their workspace.

For a set of tasks which robots could do by themselves, repeatedly and reliably, humans were banished from that part of the factory, both for the safety of the humans, and so that humans wouldn’t mess up the totally controlled order needed for robots with hardly any sensors to see changes in the world. Where there were tasks that humans but not robots could do, the solution was to have humans do all the tasks, and to banish robots.

There were many consequences of this dichotomy. First, it meant that the installation of robots required a well thought out restructuring of a factory floor, and turned into a process that could easily take a year, and involve much peripheral equipment, such as precision feeders, to support the robots. Second, because the robots and people were segregated there was no thought put into ease of use of robots, and use of modern user interfaces–there were no human users! Third, the robots had to be 100% successful at everything they were tasked with doing, or otherwise a human would have to enter the robot domain to patch things up, and that meant that all the robots, perhaps hundreds of them, would have to be stopped from operating. Fourth, the human factory workers had no direct experience of robots, no understanding of them as tools (as they did of electric drills, say), and so the alienness of them contributed to the “us” vs. “them” narrative that the press loves to propagate.

So in 2008 I founded Rethink Robotics (née Heartland Robotics) developing smart collaborative robots to address these shortcomings of industrial robots. The idea, and reality, is that the robots are safe to work side by side with humans so there is no longer any segregation. The robots come with force sensing and built in cameras so that out of the box they are useful and require much less peripheral equipment to control the environment for them. The robots have modern user interfaces, which, like the user interface of a smart phone, teach a user how to use the system as the user explores, so that ordinary factory workers and technicians can use the robots as tools. The robots get regular software upgrades, just like our phones and computers, so more value is added to them during their lifetime. And lastly, the robots are easily able to control other equipment around them, again reducing the set up and integration time.

These sorts of robots are growing in popularity, and are leading a revolution in automation of the 90% of factories in the world that are not automobile factories. They are beginning to answer the severe labor shortage, world wide, that the demographic inversion is causing.

Fulfillment

Much of technology over the last 50 years has been used as a way of shifting service labor to the end user, and thus reducing the number of workers needed in order to provide a service. Examples of this are bank ATMs; self service gas stations; supermarkets rather than separate butcher shops, fruit and vegetable stores, and dry goods stores with all the goods behind a counter; vending machines; automated checkout lines; check-in kiosks at airports; on line travel reservation web sites; and word processors, cheap printers, and electronic calendars replacing administrative assistants.

One technological development that is different in this regard is fulfillment services for online shopping. No longer do we need to travel to a physical location for a particular sort of purchase, walk into the store, take physical possession of the object, take it out to our car, transport it to our house, and then unpack it. Now we choose an object, or a wide variety of objects that would previously be located at geographically separated locations, and they arrive at our homes with no more personal effort from us.

This has shifted and restructured the labor involved in getting goods to our houses. Two things are going on: it is a challenge to get enough workers for the chain of steps that runs from an online order to the goods being in the home, and the sheer convenience will let the elderly stay in their homes longer, beyond when they are up for all the active shopping that yesteryear would have required.

The chain of events in fulfilling an order is roughly as follows:

  1. Go to many different locations in a warehouse where each of the ordered objects is stored.
  2. Pick up each object.
  3. Pack them in a box.
  4. Move the box from the warehouse to some sort of long distance transportation node.
  5. Transport the box a long distance to a terminal node.
  6. Take the box from the terminal node and deliver it to the customer’s house.

This set of steps is stereotyped and may not be exactly what happens for every order. For instance sometimes there is only one item in the order. Sometimes steps four and five might involve a combination of steps from a shipping depot for a particular carrier (e.g., FedEx or UPS), to their airport operations, then two flights to a final airport, then a truck to their distribution node. The final delivery may be via multiple steps, first to the post office and then the regular postal service, or a special postal service, or perhaps a direct truck delivery from the carrier’s shipping node.

Fulfillment companies have to hire lots of temporary labor at different times of the year to meet demand, and even at the best of times it can be hard to get labor willing to do some of these jobs. So those companies are trying to automate many of the steps, and that often involves robots.

Let’s look at Amazon, for instance.

In steps 1 and 2, above, people have to move around all over enormous warehouses in order to get the items together to fulfill an order. This step is called picking, as the person needs to pick up all the items for the order. One of the first improvements in efficiency was to have a single person picking for multiple orders at the same time. A program would group together a set of orders so that the person could carry all the items in their cart, choosing the orders so that if a person was picking up items A and B for one order, it would find a second order where perhaps there was an item C that the person would pass on the way from item A to item B. When the person got back to the packing station for step 3, it might be that they packed all the items for their orders, or it could be that a different person there would do all the packing, so that pickers and packers were doing specialized jobs.
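
A toy version of that batching idea looks something like the following Python sketch. The shelf positions, orders, and route budget here are all made up, and real systems optimize routes through a two dimensional warehouse with thousands of open orders:

    # Greedy order batching: add to the cart the orders whose items
    # stretch the picker's walking route the least. Shelf positions are
    # simplified to offsets along a single aisle.
    ITEM_POS = {"A": 10, "B": 40, "C": 25, "D": 90}  # hypothetical positions

    orders = [["A", "B"], ["C"], ["D"]]

    def route_span(batch):
        positions = [ITEM_POS[item] for order in batch for item in order]
        return max(positions) - min(positions)

    batch = [orders[0]]  # start with the order for items A and B
    for order in sorted(orders[1:], key=lambda o: route_span(batch + [o])):
        if route_span(batch + [order]) <= 60:  # made-up route budget
            batch.append(order)

    # Prints [['A', 'B'], ['C']]: item C lies between A and B, so it is
    # picked up in passing, while the order for D waits for another batch.
    print(batch)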

Most of what a picker did then was to move around from place to place. That is something that robots can be made to do rather easily these days. But the actual picking up of one of maybe hundreds of thousands of different sorts of objects is something that our robots are not at all good at. See, for instance, my quick take on research needed on robot hands.

A start up company in Boston, Kiva Systems, tackled this in a brilliant way. Kiva asked its customers to store all their fulfillment items on standardized shelving modules. The pickers all had fixed workstations to one side of the warehouse. Then small flat-ish robots would go out, drive under a shelving unit, lift it up, and bring it to a picker, arriving just as the picker’s hands were free, ready to pick a new item. A screen would tell the person what to pick, an LED on the relevant shelf would light up, and miraculously the item would be in easy reach at just the right moment. By the time the picker had scanned the item with a bar code reader to confirm they had the right object, the next shelving unit would be right there ready for their next pick.

Humans are still way better at picking than robots, but it is easy to automate moving around. The brilliance here was to change what was moving. In the old way the human picker moved from shelf to shelf. In the new way, the shelves moved around to the picker.
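
The arriving-just-in-time trick is classic pipelining: while the picker works on one shelf, several robots are already in flight fetching the next ones. With made-up timings, the sizing arithmetic is simple:

    import math

    pick_seconds = 8    # hypothetical time a picker spends per shelf visit
    fetch_seconds = 30  # hypothetical robot round trip to fetch one shelf

    # For a shelf to arrive every time the picker's hands free up, enough
    # robots must be in flight to cover one full fetch per pick slot.
    robots_per_picker = math.ceil(fetch_seconds / pick_seconds)
    print(robots_per_picker)  # 4, with these made-up numbers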

This approach turned out to be such an increase in efficiency that Amazon bought the company, and has since expanded it greatly into Amazon Robotics based in Boston. All new Amazon distribution centers are using this technology.

As a side effect, that left other fulfillment companies without the option of using Kiva Systems, so now there are at least five start up robotics companies in Boston attacking the same problem–it will be interesting to see when each of them comes out of stealth mode how they manage to work around the intellectual property covered by Kiva/Amazon patents.

Amazon now runs an annual “pick challenge” at a major robotics conference that moves from continent to continent on a yearly basis. They are encouraging academic researchers to compete on how well their robots (it is really their algorithms–over half the competitors use robots from my company, Rethink Robotics) are able to do the picking task.

Amazon regularly announces large hiring goals for workers in fulfillment centers but they just cannot get enough. Only through automation will they be able to meet the challenges.

Amazon, and others, are also looking at step 6 from above. In Amazon’s case they are talking about using drones to deliver products to the home. Just three weeks ago I was present in Palm Springs for what turned out to be Amazon’s first public delivery by drone in North America. The delivery was to Jeff Bezos himself and the box contained many bottles of sunscreen. If only I had realized it was the first such delivery I would have taken a sunscreen bottle when Jeff offered to hand one to me from the box he had just opened!

Other companies are looking at other solutions for that last leg of delivery, and there are regularly press stories about small robots meant to autonomously drive on sidewalks to get things to houses inside lock boxes that only the recipient will be able to open. Other proposed solutions involve using the trunk space of on-demand car services, having the drivers drop the goods at people’s houses between taking paying customers on journeys.

All these step 6 solutions are in their early days, and it remains to be seen whether any of the current proposals really work out, or whether others will be needed.

But in any case there is a clear demand for all these steps, and if researchers can make progress there is plenty of room for robots to work in the actual picking and packing, beyond where current systems are being developed.

(2) Assistance for the elderly. When we look at the ratio of working age people to those sixty five and older we see remarkable changes over half a century or so. In Japan the ratio is going from about nine to one down to about two to one. In the US it is not quite as extreme, but still extreme. This means that there cannot possibly be as large a pool of workers to provide care services for the elderly as there was in the past, and even worse, from that point of view, modern medicine is extending the lives of the elderly so that they are able to survive when they are much older and much frailer.

Well, you say, we should go back to the old way, where the families looked after the elderly. But think what that means for China with the effect of its one child policy. After only two generations of it, a youngish couple will have one child of their own, but four parents and eight grandparents to look after, with no help from siblings or cousins. A modern Chinese couple is raising one child and is solely responsible for 12 older people. Yikes!

In both China and the US, and probably other places, there is the additional problem that people move vast distances for work, and so they are not geographically coupled to where their parents are.

I think this means that ultimately most people will have to end up in managed care facilities, but that there will be much smaller pools of human workers to provide services.

Care services for the elderly

Most people resist moving into a managed care facility for as long as possible. This resistance is going to be part of the solution. If care services can come to people’s homes, in whatever form, then people will be able to stay in their own homes much longer.

I think that the elderly want independence and dignity. Those are two very important words, independence and dignity.

At the same time as people get older and more frail, they face many challenges. There is danger that they might fall and not be able to get up. They may forget to take their medicines. They may have trouble getting into and out of bed. They may have trouble carrying packages into their house from wherever outside it a delivery service leaves them. They may have trouble reaching high shelves that they have used all their lives. They may have trouble keeping their house clean, and putting things into and out of dish washing machines and clothes washing machines. They may have trouble folding their laundry and putting it away. Ultimately they may have difficulty dressing and undressing themselves. They may have trouble getting enough exercise without the risk of breaking fragile bones. They may have trouble using the bath or shower. They may have trouble using the toilet.

The longer they can get assistance in doing all these things with independence and dignity, the longer they can stay in their own homes.

I think that this is where a lot of robotics technology is going to be applied over the next thirty years, as the baby boomers slide into older and older age.

It is a little too early for actual robots for most of these challenges at this point. We need lots more research in robotics labs. There are a few labs in the US that are looking at these problems, but in Japan it is already a priority, and one sees many demonstration robotic systems at big robot conferences there, where research institutes and Universities show off their early ideas on what robots might do to help the elderly with some of these challenges.

I want to stress that this is not research into robot companions for the elderly. Rather it is research into machines that the elderly will be able to use and control, machines that will give them both independence and dignity. I think we all want to stave off the day when we will need a person to wipe our bum, and various sorts of machines can preserve that dignity for longer.

It is early days yet in research on these topics. It is only just now starting to appear on people’s radars that these sorts of robots will be necessary, and that there will be an incredibly large market for them. Today’s research prototypes are too early for commercialization. But the demographic megatrend is going to put a tremendous pull on them. Before too long VCs are going to see a long line of people wanting to give pitches for funding for various companies to develop such robots. And large established companies in adjacent non-robotic markets are going to be faced with how to transform themselves into home elder care robotics companies.

Driver assist

In my January essay on self driving cars I referenced the levels of autonomy that are usually used to describe just how much self driving is done by the car. Levels 4 and 5 involve no human input at all; the car does indeed drive itself. I am becoming more skeptical by the day that we will have level 4 or level 5 autonomy in our cars any time soon, except in very restricted and special geographies. Level 3, which lets the driver take their hands off the wheel and their attention off driving, is also going to be harder than many people think, as the switch back to the human taking over in tricky circumstances is going to be hard to pull off quickly enough when it is needed. Levels 1 and 2 however, where the person is observing what happens and takes over when they need to, will, I think, be fairly commonplace in just a few years.

Levels 1 and 2 of autonomy are going to make cars so much safer, even for bad drivers. And the elderly usually get progressively worse at driving. These levels will take over parking, lane keeping, and eventually almost all of braking, accelerating, and steering, but with the driver’s hands still on the steering wheel.

The technology for all of levels 1 through 5 is really robotics technology. As we go up through the autonomy levels the cars progressively become more and more robotic.

This, for now, is my final arena where the aging population is going to be a pull on new robotic technology. The longer that an elderly person can drive, the longer they can have their independence, the longer they will be able to stay in their own homes, and the longer they can get by without relying on individual services provided to them by someone younger.

Car companies already recognize the need to cater more to an elderly driving population, and many driver assist features that they are introducing will indeed extend the time that many drivers will be able to drive.

Conclusion

The demographic inversion in our population, a megatrend that is happening whether we like it or not, is becoming a significant pull on the need for new automation, IT, and robotics. It is pulling us to more robotic solutions in all stages of food production, in manufacturing, in fulfillment and delivery to the home, soon in care services for the elderly, and even now in driver assist features in cars.



¹Full disclosure. For many years I was a member of John Deere’s Global Innovation and Technology Advisory Council.