Blog

[FoR&AI] Future of Robotics and Artificial Intelligence

rodneybrooks.com/forai-future-of-robotics-and-artificial-intelligence/

I plan on publishing a set of essays on the future of robotics and Artificial Intelligence in the late summer and fall of 2017, perhaps extending into 2018. I’ll list them all here as they come out. They are designed to be read as stand-alone essays, and in any order, but I’ll order them here in my guess at the optimal order in which to read them.

The origins of “Artificial Intelligence” published on April 26, 2018.

Domo Arigato Mr. Roboto published on August 28, 2017.

The Seven Deadly Sins of Predicting the Future of AI published on September 7, 2017.

Machine Learning Explained published on August 28, 2017.

Steps Toward Super Intelligence I, How We Got Here, published July 15, 2018.

Steps Toward Super Intelligence II, Beyond the Turing Test, published July 15, 2018.

Steps Toward Super Intelligence III, Hard Things Today, published July 15, 2018.

Steps Toward Super Intelligence IV, Things to Work on Now, published July 15, 2018.

Experiments In Automobile UI/UX

rodneybrooks.com/experiments-in-automobile-uiux/

Our automobiles are getting three makeovers simultaneously, with promises of a fourth. There is more action in the reinvention of automobiles all at once than there has been since the first ones crawled into existence.

First, our cars are turning electric, and the UK recently said that no new gas or diesel automobiles will be allowed on the road starting in 2040. Don’t be surprised to see more countries and states (e.g., California) follow suit.

Second, our cars are getting more driver assist features with lane change and backup audible warnings, automatic parking and lane changing, new forms of smart cruise control, new bumper-to-bumper traffic control options, etc. These are level 1 and level 2 autonomy (see my blog post from earlier in the year), with level 3 starting to show up just a little, and overly enthusiastic predictions of levels 4 and 5 (again, you can see some of my thoughts on how I think that is further off than expected).

And third, we are getting new user interfaces in our cars, and that is the subject of this short post.

I drive a lot of different rental cars during any given year, so even though my own car is now eleven years old I get exposed to a lot of the new interfaces that are appearing in very standard level compact cars.

I’ve been driving the same rental car for two weeks as of today, and there are some things I really like about it. Most cars I have rented over the last two years have had a backup camera that has a live image on the LCD screen in the center of the dashboard. This is great, as it is usually a much better view than can be had by scanning all three rear view mirrors. The better versions of this feature show you an overlay of exactly where the car will go as you back up with the current setting of the steering wheel. Here is a view from the screen in my car this morning:

This is so much better than a plain old rear view mirror. It shows me exactly what I might hit as I back up, and for me that is especially useful in a parking garage as I am just so great at hitting things while going backward…

But it is not all champagne and roses. After having this car for two weeks, twice this morning, while I was driving along the street, the following window popped up on that same screen, which is a touch screen, by the way:

How can this be a good idea, or a good User Interface feature? It could lead to a very bad User Experience! A window pops up to tell me I shouldn’t take my eyes off the road to deal with the interface. That is a good message. But not while I’m driving! Both times it made me take my eyes off the road to read what this warning was about, and then I needed to reach out and servo my finger to the “OK” virtual button to dismiss it. It was a real temptation to do it while driving. Exactly the thing it is warning against!

Bad UI. Potentially disastrous UX.

This reminded me of an interchange I saw on Facebook right after the newest Tesla came out. Someone complained about the lack of dials and knobs, and so much of the UI being put on the very big LCD that is in the middle of a Tesla dashboard. Another person chided that person, saying essentially, “get over it, we now live in the world of the iPhone, not the Blackberry”.

I thought that latter comment completely missed a real issue. The knobs and levers with their fixed positions and fixed meanings within any particular car, along with the tactile feedback that they give us, allow us to do a lot of control operations without taking our eyes off the road at all. That is a very good thing until the task of driving is completely taken over by the car itself. We need our attention out on the road. Moving control functions that are needed while in motion and while controlling the car to a touch screen is probably not a good idea. Being able to do things without using our eyes is a safety feature while driving (and while walking with an iPhone…).

[By the way, why do our turn indicator blinkers make a clicking sound? Because the original ones that were introduced in the 1950s operated by running a current through and heating up a bi-metallic strip. As it heated up it bent until it hit a contact, hence the click, which then drained the current to the indicator lights on the left or right side of the car, allowing the bi-metallic strip to cool down and repeat. Now cars simulate that same old clicking sound so that we know when the indicators are blinking.]

I suspect that we are all going to be guinea pigs over the next few years with auto-makers bringing out some really great new UI features, along with some real failures.

Be careful!!

Edge Cases For Self Driving Cars

rodneybrooks.com/edge-cases-for-self-driving-cars/

Perhaps through this essay I will get the bee out of my bonnet^{\big 1} that fully driverless cars are a lot further off than many techies, much of the press, and even many auto executives seem to think. They will get here and human driving will probably disappear in the lifetimes of many people reading this, but it is not going to all happen in the blink of an eye as many expect. There are lots of details to be worked out.

In my very first post on this blog I talked about the unexpected consequences of having self driving cars. In this post I want to talk about a number of edge cases, which I think will cause it to be a very long time before we have level 4 or level 5 self driving cars wandering our streets, especially without a human in them, and even then there are going to be lots of problems.

First though, we need to re-familiarize ourselves with the generally accepted levels of autonomy that everyone is excited about for our cars.

Here are the levels from the autonomous car entry in Wikipedia which attributes this particular set to the SAE (Society of Automotive Engineers):

  • Level 0: Automated system has no vehicle control, but may issue warnings.
  • Level 1: Driver must be ready to take control at any time. Automated system may include features such as Adaptive Cruise Control (ACC), Parking Assistance with automated steering, and Lane Keeping Assistance (LKA) Type II in any combination.
  • Level 2: The driver is obliged to detect objects and events and respond if the automated system fails to respond properly. The automated system executes accelerating, braking, and steering. The automated system can deactivate immediately upon takeover by the driver.
  • Level 3: Within known, limited environments (such as freeways), the driver can safely turn their attention away from driving tasks, but must still be prepared to take control when needed.
  • Level 4: The automated system can control the vehicle in all but a few environments such as severe weather. The driver must enable the automated system only when it is safe to do so. When enabled, driver attention is not required.
  • Level 5: Other than setting the destination and starting the system, no human intervention is required. The automatic system can drive to any location where it is legal to drive and make its own decision.

There are many issues with level 2 and level 3 autonomy, which might make them further off in the future than people are predicting, or perhaps even forever impractical due to limitations on how quickly humans can go from not paying attention to taking control in difficult situations. Indeed, as outlined in this Wired story, many companies have decided to skip level 3 and concentrate on levels 4 and 5. The iconic Waymo (formerly Google) car has no steering wheel or other conventional automobile controls–it is born to be a level 4 or level 5 car. [This image is from Wikipedia.]

So here I am going to talk only about level 4 and level 5 autonomy, and not really make a distinction between them.  When I refer to an “autonomous car” I’ll be talking about ones with level 4 or level 5 autonomy.
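As a rough sketch of how the levels listed above, and the definition of an "autonomous car" just given, might be written down in code, here is a small Python enumeration. The member names (NO_AUTOMATION, DRIVER_ASSIST, and so on) and the helper is_autonomous are labels invented for this illustration; they are not official SAE terminology, and the comments only paraphrase the Wikipedia summaries quoted above.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """Autonomy levels as summarized in the list above (names are illustrative)."""
    NO_AUTOMATION = 0   # no vehicle control, may issue warnings
    DRIVER_ASSIST = 1   # driver must be ready to take control at any time
    PARTIAL = 2         # system accelerates, brakes, steers; driver monitors
    CONDITIONAL = 3     # driver may look away in known, limited environments
    HIGH = 4            # no driver attention needed in all but a few environments
    FULL = 5            # only a destination and a start command are needed


def is_autonomous(level: SAELevel) -> bool:
    """True for what this essay calls an "autonomous car": level 4 or level 5."""
    return level >= SAELevel.HIGH


if __name__ == "__main__":
    for level in SAELevel:
        print(level.value, level.name, is_autonomous(level))
```

Using an IntEnum keeps the levels ordered, so the essay's shorthand of lumping levels 4 and 5 together becomes a single comparison.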

I will make distinctions between cars with conventional controls so that they are capable of being driven by a human in the normal way, and cars like the Waymo one pictured above with no such controls, and I will refer to that as an unconventional car. I’ll use those two adjectives, conventional, and unconventional, for cars, and then distinguish what is necessary to make them practical in some edge case circumstances.

I will also refer to gasoline powered driverless cars versus all electric driverless cars, i.e., gasoline vs. electric.

Ride-sharing companies like Uber are putting a lot of resources into autonomous cars. This makes sense given their business model as they want to eliminate the need for drivers at all, thus saving their major remaining labor cost. They envision empty cars being summoned by a customer, driving to wherever that customer wants to be picked up, with absolutely no one in the car. Without that, having the autonomy technology doesn’t make sense to this growing segment of the transportation industry. I’ll refer to such an automobile, with no one in it, as a Carempty. In contrast, I’ll refer to an autonomous car which has a conscious person in it, whether it is an unconventional car and they can’t actually drive it in the normal way, or whether it is a conventional car but they are not at all involved in the driving, perhaps sitting in the back seat, as a Careless, as presumably that person could not care less about the driving, other than indicating where they want to go.

So we have both an unconventional and a conventional Carempty and Careless, and perhaps they are gasoline or electric.
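The vocabulary set up above has three independent two-way distinctions, which gives eight kinds of car. The following Python sketch simply enumerates them; the class and field names (Controls, Occupancy, Power, AutonomousCar) are made up for illustration and carry no meaning beyond the terms defined in the text.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Controls(Enum):
    CONVENTIONAL = "conventional"      # can be driven by a human in the normal way
    UNCONVENTIONAL = "unconventional"  # no steering wheel or other normal controls


class Occupancy(Enum):
    CAREMPTY = "Carempty"  # no one in the car
    CARELESS = "Careless"  # a conscious person aboard who is not driving


class Power(Enum):
    GASOLINE = "gasoline"
    ELECTRIC = "electric"


@dataclass(frozen=True)
class AutonomousCar:
    controls: Controls
    occupancy: Occupancy
    power: Power

    def describe(self) -> str:
        return f"{self.controls.value} {self.power.value} {self.occupancy.value}"


# Every combination of the three distinctions: 2 x 2 x 2 = 8 variants.
ALL_VARIANTS = [AutonomousCar(c, o, p) for c, o, p in product(Controls, Occupancy, Power)]

if __name__ == "__main__":
    for car in ALL_VARIANTS:
        print(car.describe())
```

Each edge case in the sections that follow can then be read as a question about which of these eight variants it affects.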

Many of the edge cases I will talk about here are based on the neighborhood in which I live, Cambridgeport in Cambridge, Massachusetts. It is a neighborhood of narrow one way streets, packed with parked cars on both sides of the road so that it is impossible to pass by if a car or truck is stopped in the road. A few larger streets are two way, and some of them have two lanes, one in each direction, but at least one nearby two way street only has one lane–one car needs to pull over, somehow, if two cars are traveling in opposite directions (the southern end of Hamilton Street in the block where “The Good News Garage” of the well known NPR radio brothers “Click and Clack” is located).

HOW MUCH DRIVING CAN A NON-DRIVER DO?

In a conventional Careless a licensed human can take over the driving when necessary, unless say it is a ride sharing car, and in that case humans might be locked out of using the controls directly. For an unconventional Careless, like one of the Waymo cars pictured above, the human cannot take over directly either. So a passenger in a conventional ride-sharing car and a passenger in an unconventional car are in the same boat. But how much driving can that human do?

In both cases the human passenger needs to be able to specify the destination. For a ride-sharing service that may have been done on a smart phone app when calling for the service. But once in the car the person may want to change their mind, or demand that the car take a particular route–I certainly often do that with less experienced drivers who are clearly going a horrible way, often at the suggestion of their automated route planners. Should all this interaction be via an app? I am guessing, given the rapid improvements in voice systems, such as we see in the Amazon Echo, or the Google Home, we will all expect to be able to converse by voice with any autonomous car that we find ourselves in.

We’ll ignore for the moment a whole bunch of teenagers each yelling instructions and pranking the car. Let’s just think about a lone sensible mature person in the car trying to get somewhere.

Will all they be able to do is give the destination and some optional route advice, or will they be able to give more detailed instructions when the car is clearly screwing up, or missing some perceptual clue that the occupant can clearly recognize? The next few sections give lots of examples from my neighborhood that are going to be quite challenging for autonomous cars for many years to come, and so such advice will come in handy.

In some cases the human might be called upon to, or just wish to, give quite detailed advice to the car. What if they don’t have a driver’s license? Will they be guilty of illegally driving a car in that case? How much advice should they be allowed to give (spoiler alert, the car might need a lot in some circumstances)? And when should the car take the advice of the human? Does it need to know if the person in the car talking to it has a driver’s license?

Read on.

WHAT TO DO ABOUT A BLOCKED ROAD

In my local one-way streets the only thing to do if a car or other vehicle is stopped in the travel lane is to wait for it to move on. There is no way to get past it while it stays where it is.

The question is whether to toot the horn or not at a stopped vehicle.

Why would it be stopped? It could be a Lyft or an Uber waiting for a person to come out of their house or condominium. A little soft toot will often get cooperation and they will try to find a place a bit further up the street to pull over.  A loud toot, however, might cause some ire and they will just sit there. And if it is a regular taxi service then no amount of gentleness or harshness will do any good at all. “Screw you” is the default position.

Sometimes a car is stopped because the driver is busy texting, most usually when they are at an intersection, had to wait for someone to cross in front of them, their attention wandered, they started reading a text, and now they are texting and have forgotten that they are in charge of an automobile. From behind one can often tell what they are up to by noticing their head inclination, even from inside the car behind. A very gentle toot will usually get them to move; they will be slightly embarrassed at their own illegal (in Massachusetts) behavior.

And sometimes it is a car stopped outside an eldercare residence building with someone helping a very frail person into or out of the car.  Any sort of toot from a stopped car behind is really quite rude in these circumstances, distressing for the elderly person being helped, and rightfully ire raising for the person taking care of that older person.

Another common case of road blockage is a garbage truck stopped to pick up garbage. There are actually two varieties, one for trash, and one for recyclables. It is best to stop back a bit further from these trucks than from other things blocking the road, as people will be running around to the back of the truck and hoisting heavy bins into it. And there is no way to get these trucks to move faster than they already are. Unlike other trucks, they will continue to stop every few yards. So the best strategy is to follow, stop and go, until the first side street and take that, even if, as it is most likely a one-way street, it sends you off in a really inconvenient direction.

Yet a third case is a delivery truck. It might be a US Postal Service truck, or a UPS or FedEx truck, or sometimes even an Amazon branded truck. Again tooting these trucks makes absolutely no difference–often the driver is getting a signature at a house, or may be in the lobby of a large condominium complex. It is easy for a human driver to figure out that it is one of these sorts of trucks. And then the human knows that it is not so likely to stop again really soon, so staying behind this truck once it moves rather than taking the first side street is probably the right decision.

If on the other hand it is a truck from a plumbing service, say, it is worth blasting it with your horn. These guys can be shamed into moving on and finding some sort of legal parking space. If you just sit there however it could be many minutes before they will move.
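To make concrete how much has to be recognized before any tooting decision can even start, here is a toy Python lookup table of the cases just described. It is only a paraphrase of the rules of thumb above for one neighborhood; the names Blocker, Toot, POLICY and advise are hypothetical, and the genuinely hard part, perceiving which kind of vehicle is in the way and why, is assumed to have been solved before the table is consulted.

```python
from enum import Enum


class Blocker(Enum):
    RIDE_SHARE = "Lyft or Uber waiting for a passenger"
    TAXI = "regular taxi"
    TEXTING_DRIVER = "driver texting at an intersection"
    ELDERCARE_PICKUP = "car helping a frail person in or out"
    GARBAGE_TRUCK = "trash or recycling truck"
    DELIVERY_TRUCK = "postal or parcel delivery truck"
    SERVICE_TRUCK = "plumbing or similar service truck"


class Toot(Enum):
    NONE = "do not toot"
    SOFT = "soft toot"
    LOUD = "loud toot"


# (toot, follow-up) pairs paraphrasing the anecdotal rules in the text above.
POLICY = {
    Blocker.RIDE_SHARE: (Toot.SOFT, "let it find a place further up to pull over"),
    Blocker.TAXI: (Toot.NONE, "wait; no kind of toot does any good"),
    Blocker.TEXTING_DRIVER: (Toot.SOFT, "wait for the driver to move on"),
    Blocker.ELDERCARE_PICKUP: (Toot.NONE, "wait patiently"),
    Blocker.GARBAGE_TRUCK: (Toot.NONE, "stop well back, then take the first side street"),
    Blocker.DELIVERY_TRUCK: (Toot.NONE, "wait, then stay behind the truck"),
    Blocker.SERVICE_TRUCK: (Toot.LOUD, "shame it into finding legal parking"),
}


def advise(blocker: Blocker) -> tuple[Toot, str]:
    """Return the suggested toot and follow-up action for a recognized blocker."""
    return POLICY[blocker]


if __name__ == "__main__":
    for blocker in Blocker:
        toot, action = advise(blocker)
        print(f"{blocker.value}: {toot.value}; {action}")
```

The table itself is trivial. Filling in its key from camera images, and knowing when the local rules differ, is where the difficulty lies.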

A Careless automobile could ask its human occupant whether it should toot. But should it make a value judgement if the human is spontaneously demanding that it toot its horn loudly?

A Carempty automobile could just never toot, though the driver in a car behind it might start tooting it, loudly. Not tooting is going to slow down Carempties quite a bit, and texting drivers just might not care at all if they realize it is a Carempty that is showing even a little impatience. And should an autonomous car be listening for toots from a car behind it, and change its behavior based on what it hears? We expect humans to do so. But are the near future autonomous cars going to be so perfect already that they should take no external advice?

Now if Carempties get toot happy, at least in my neighborhood that will annoy the residents having tooting cars outside their houses at a much higher level than at the moment, and they might start to annoy the human drivers in the neighborhood.

The point here is that there are a whole lot of perceptual situations that an autonomous vehicle will need to recognize if it is to be anything more than a clumsy moronic driver (an evaluation us locals often make of each other in my neighborhood…). As a class, autonomous vehicles will not want to get such a reputation, as the humans will soon discriminate against them in ways subtle and not so subtle.

Maps Don’t Tell the Whole Story

Recently I pulled out of my garage and turned right onto the one way street that runs past my condominium building, and headed to the end of my single block street, expecting to turn right at a “T” junction onto another one way street. But when I got there, just to the right of the intersection the street was blocked by street construction, cordoned off, and with a small orange sign a foot or so off the ground saying “No Entry”.

The only truly legal choice for me to make was to stop. To go back from where I had come I needed to travel the wrong way on my street, facing either backwards or forwards, and either stopping at my garage, or continuing all the way to the street at the start of my street. Or I could turn left and go the wrong way on the street I had wanted to turn right onto, and after a block turn off onto a side street going in a legal direction.

A Careless might inform its human occupant of the quandary and ask for advice on what to do. That person might be able to do any of the social interactions needed should the Careless meet another car coming in the legal direction under either of these options.

But Carempty will need some extra smarts for this case.  Either hordes of empty cars eventually pile up at this intersection or each one will need to decide to break the law and go the wrong way down one of the two one way streets–that is what I had to do that morning.

The maps that a Carempty has won’t help it a whole lot in this case, beyond letting it know the minimum distance it is going to have to be in a transgressive state.

Hmmm. Is it OK for a Carempty to break the law when it decides it has to? Is it OK for a Careless to break the law when its human occupant tells it to? In the situation I found myself in above, I would certainly have expected my Careless to obey me and go the wrong way down a one way street. But perhaps the Careless shouldn’t do that if it knows that it is transporting a dementia patient.

The Police

How are the police supposed to interact with a Carempty?

While we have both driverful and driverless cars on our roads I think the police are going to assume that as with driverful cars they can interact with them by waving them through an intersection perhaps through a red light, stopping them with a hand signal at a green light, or just to allow someone to cross the road.

But besides being able to understand what an external human hand signaling them is trying to convey, autonomous cars probably should try to certify in some sense whether the person that is giving them those signals is supposed to be doing so with authority, with politeness, or with malice. Certainly police should be obeyed, and police should expect that they will be. So the car needs to recognize when someone is a police officer, no matter what additional weather gear they might be wearing. Likewise they should recognize and obey school crossing monitors. And road construction workers. And pedestrians giving them a break and letting them pass ahead of them. But should they obey all humans at all times? And what if in a Careless situation their human occupant tells them to ignore the taunting teenager?

Sometimes a police officer might direct a car to do something otherwise considered illegal, like drive up on to a sidewalk to get around some road obstacle. In that case a Carempty probably should do it. But if it is just the delivery driver whose truck is blocking the road wanting to get the Carempty to stop tooting at them, then probably the car should not obey, as then it could be in trouble with the actual police. That is a lot of situational awareness for a car to have to have.

Things get more complicated when it is the police and the car is doing something wrong, or there is an extraordinary circumstance which the car has no way of understanding.

In the previous section we just established that autonomous cars will sometimes need to break the law. So police might need to interact with law breaking autonomous cars.

One view of the possible conundrum is this cartoon from the New Yorker. There are two instantly recognizable Waymo style self driving cars, with no steering wheels or other controls, one a police car that has just pulled over the other car. They both had people in them, and the cop is asking the guy in the car that has just been pulled over, “Does your car have any idea why my car pulled it over?”.

If an autonomous car fails to see a temporary local speed sign and gets caught in a speed trap, how is it to be pulled over? Does it need to understand flashing blue lights and a siren, and does it do the pull to the side in a way that we have all done, only to be relieved when we realize that we were not the actual target?

And getting back to when I had to decide to go the wrong way down a one way street, what if a whole bunch of Carempties have accumulated at that intersection and a police officer is dispatched to clear them out? For driverful cars a police officer might give a series of instructions and point out in just a few seconds who goes first, who goes second, third, etc. That is a subtle elongated set of gestures that I am pretty sure no deep learning network has any hope at the moment of interpreting, of fully understanding the range of possibilities that a police officer might choose to use.

Or will it be the case that the police need to learn a whole new gesture language to deal with driverless cars? And will all makes understand the same language?

Or will we first need to develop a communication system that all police officers will have access to and which all autonomous cars will understand so that police can interact with autonomous cars? Who will pay for the training? How long will that take, and what sort of legislation (in how many jurisdictions) will be required?

Getting Towed

A lot of cars get towed in Cambridge. Most streets get cleaned on a regular schedule (different sides of the same street on different days), and if your car is parked there at 7am you will get towed–see the sign in the left image. And during snow emergencies, or without the right sticker/permit you might get towed at any time. And then there are pop-up no parking signs, partially hand written, that are issued by the city on request for places for moving vans, etc. Will our autonomous cars be able to read these? Will they be fooled by fake signs that residents put up to keep pesky autonomous cars from taking up a parking spot right outside their house?

If an unconventional Carempty is parked on the street, one assumes that it might at any time start up upon being summoned by its owner, or if it is a ride-share car when its services are needed. So now imagine that you are the tow truck operator and you are supposed to be towing such a car. Can you be sure it won’t try driving away as you are crawling under it to connect the chains, etc., to tow it? If a human runs out to move their car at the last minute you can see when things are going to start and adjust. How will it work with fully autonomous cars?

And what about a Carempty that has a serious breakdown, perhaps in its driving system, so that it just sits there and can no longer safely move itself? It will most likely need to be towed. Can the tow truck operator have some way to guarantee that it is shut down and will not jump back to life, especially when the owner cannot be contacted to put it in safe mode remotely? What will be the protocols and regulations around this?

And then if the car is towed, and I know this from experience, it is going to be in a muddy lot full of enormous potholes in some nearby town, with no marked parking areas or driving lanes. The cars will have been dumped at all angles, higgledy-piggledy. And the lot is certainly not going to have its instantaneous layout mapped by one of the mapping companies, providing the maps that autonomous cars rely on for navigation. To retrieve such a car a human is likely going to have to go do it (and pay before getting it out), but if it is an unconventional car it is certainly going to require someone in it to talk it through getting out of there without angering the lot owner (and again from experience, that is a really easy thing to do–anger the lot owner). Yes, in some distant future tow lots in Massachusetts will be clean, and flat with no potholes deeper than six inches, and with electronic payment systems, and all will be wonderful for our autonomous cars to find their way out.

Don’t hold your breath.

OTHER TRICKY SITUATIONS

What happens when a Carempty is involved in an accident? We know that many car companies are hoping that their cars will never be involved in an accident, but humans are dumb enough that as long as there are both human drivers and autonomous cars on the same streets, sometimes a human is going to drive right into an autonomous car.

Autonomous cars will need to recognize such a situation and go through some protocol. There is a ritual when a fender bender happens between two driverful cars. Both drivers stop and get out of their cars, perhaps blocking traffic (see above) and go through a process of exchanging insurance information. If one of the cars is an autonomous vehicle the human driver can take a photo on their phone (technology to the rescue!) of the autonomous car’s license plate. But how is a Carempty supposed to find out who hit it? In the distant future when all the automobile stock on the road have transponders (like current airplanes) that will be relatively easy (though we will need to work through horrendous privacy issues to get there), but for the foreseeable future this is going to be something of a problem.

And what about refueling? If a ride-sharing car is gasoline powered and out giving rides all day, how does it get refueled? Does it need to go back to its home base to have a human from its company put in more gasoline? Or will we expect to have auto refueling stations around our cities? The same problem will be there even if we quickly pass beyond gasoline powered cars. Electric Carempties will still need to recharge–will we need to replace all the electric car recharging stations that are starting to pop up with ones that require no human intervention?

Autonomous cars are likely to require lots of infrastructure changes that we are just not quite ready for yet.

Impacts on the Future of Autonomous Cars

I have exposed a whole bunch of quandaries here for both Carempties and Carelesses. None rise to the moral level of the so called trolley problem (do I kill the one nun or seven robbers?) but, unlike the trolley problem, variants of these edge cases are very likely to arise, at least in my neighborhood. There will be many other edge case conundrums in the thousands, perhaps millions, of unique neighborhoods around the world.

One could try to have some general purpose principles that cars could reason from in any circumstances, perhaps like Asimov’s Three Laws^{\big 2}, and perhaps tune the principles to the prevailing local wisdom on what is appropriate or not. In any case there will need to be a lot of codifying of what is required of autonomous cars in the form of new traffic laws and regulations. It will take a lot of trial and error and time to get these laws right.

Even with an appropriate set of guiding principles there are going to be a lot of perceptual challenges for both Carempties and Carelesses that are way beyond those that current developers have solved with deep learning networks, and perhaps a lot more automated reasoning than any AI systems have so far been expected to demonstrate.

I suspect that to get this right we will end up wanting our cars to be as intelligent as a human, in order to handle all the edge cases appropriately.

And then they might not like the wage levels that ride-sharing companies will be willing to pay them.



^{\big 1}But maybe not.  I may have one more essay on how driverless cars are going to cause major infrastructure changes in our cities, just as the original driverful cars did. These changes will be brought on by the need for geofencing–something that I think proponents are underestimating in importance.

^{\big 2}Recall that Isaac Asimov used these laws as a plot device for his science fiction stories, by laying out situations where these seemingly simple and straightforward laws led to logical fallacies that the story protagonists, be they robot or human, had to find a way through.

Is War Now Post Kinetic?

rodneybrooks.com/is-war-now-post-kinetic/

When the world around us changes, often due to technology, we need to change how we interact with it, or we will not do well.

Kodak was well aware of the digital photography tsunami it faced but was not able to transform itself from a film photography company until too late, and is no more. On the other hand, Pitney Bowes started its transformation early from a provider of mail stamping machines to an eCommerce solutions company and remains in the S&P 500.

Governments and politicians are not immune from the challenges that technological change produces on the ground, and former policies and vote getting proclamations may lag current realities^{\big 1}.

I do wonder if war is transforming itself around us to being fought in a non-kinetic way, and which nations are aware of that, and how that will change the world going forward. And, importantly for the United States, what does that say about what its Federal budget priorities should be?

A Brief History of Kinetic War

The technology of war has always been about delivering more kinetic energy, faster, more accurately and with more remote standoff from the recipient of the energy, first to human bodies, and then to infrastructure and supply chains.

New technologies caused changes in tactics and strategies, and many of them eventually made old technologies obsolete, but often a new technology would co-exist with one that it would eventually supplant for long periods, even centuries.

One imagines that the earliest weapons used in conflicts between groups of people were clubs and axes of various sorts. These early wars were fought in close proximity, delivering kinetic blows directly to another’s body.

By about 4,400 years ago the first copper daggers appeared, and by 3,600 years ago, bronze swords appeared, allowing for an attack at a slightly longer distance, perhaps out of direct reach of the victim. Even today our infantries are equipped with bayonets on the ends of guns to deliver direct kinetic violence to another’s body through the use of human muscles. With daggers and swords the kinetic blows could be much more deadly as they needed less human energy to cause bleeding.

Simultaneously the first “stand off” weapons were developed; bows and arrows 12,000 years ago, most likely with a very limited range. The Egyptians had bows with a range of 100 meters a little less than 4,000 years ago. A bow stores the energy from human muscle in a single drawing motion, and then delivers it all in a fraction of a second. These weapons did not eliminate hand to hand combat, but they did allow engagement from a distance. With the introduction of horses and later chariots, there was added the element of speed of closing from too far away to engage to being in engagement range very quickly. These developments were all aimed at getting bleed-producing kinetic impacts on humans from a distance.

A little less than 3,000 years ago war saw a new way to use kinetic energy: thermally. No longer was it just the energy of human muscles that rained down on the enemy, but that from fire. First from burning crops, but soon by delivering burning objects via catapults and other throwing devices. Those throwing devices started out just delivering heavy weights, through the muscle energy of many people stored over many minutes of effort. But once burning objects were being thrown they could deliver the thermal energy stored in the projectile, as well as unleash more thermal energy by setting things on fire in the landing area.

During the 8th to 16th century, hurled anti-personnel weapons, those aimed at individual people, were developed where projectiles full of hot pitch, oil, or resin, were thrown by mechanical devices, again with stored human energy, intended to maim and disable an individual human that they might hit.

The arrival of chemical explosives ultimately changed most things about warfare, but there was a surprisingly long coexistence with older weapons. The earliest form of gunpowder was developed in 9th century China, and it reached Europe courtesy of the Mongols in 1241. The cannon, which provided a way of harnessing that explosive power to deliver high amounts of kinetic energy in the form of metal or stone balls provided both more distant standoff and more destructive kinetics, and was well developed by the 14th century, with the first man portable versions coming of age in the 15th century.

But meanwhile the bow and arrow made a come back, with the English longbow, traditionally made from yew (and prompting a European wide trade network in that wood), having a range of 300 meters in the 14th and 15th centuries. It was contemporary with the cannon, but the agility of it being carried by a single bowman led to it being the major reason for victory in a large scale battle as late as the Battle of Agincourt in 1415.

The cannon changed the nature of naval warfare, and naval warfare itself was about logistics and supply lines, and later being a mobile platform to pound installations on the coast from the safety of the sea. Ships also changed over time due to new technologies for their propulsion, from oars, to sails, to steam, and ultimately to nuclear power, making them faster and more reliable. Meanwhile the mobile cannon was developed into more useful sorts of weapons, and with the invention of bullets (which combined the powder and projectile into a compact pre-manufactured expendable device), guns and then machine guns became the preferred weapon of the ground soldier.

Each of these technological developments improved upon the delivery of kinetic energy to the enemy, over time, in fits and starts making that delivery faster, more accurate, more energetic, and with more distant standoff.

Rarely were the new technologies adopted quickly and universally, but over time they often made older technologies completely obsolete. One wonders how quickly people noticed the new technologies, how they were going to change war completely, and how they responded to those changes.

Latter Day WAR

In the last one hundred or so years, from the beginning of the Great War, also known as World War I, we have seen continued technological change in how kinetic energy is delivered during conflict. In the Great War we saw both the introduction of airplanes, originally as intelligence-gathering machines, but later as deliverers of bullets and bombs, and the introduction of tanks. Even with mechanization, the United States Army still had twelve horse regiments, each of 790 horses, at the beginning of World War II. They were no match for tanks, and hard to integrate with tank units, so eventually they were abolished.

By the end of World War II we had seen both the deployment of missiles (the V1 and V2 by Germany), and nuclear weapons (by the United States). Later married together, nuclear tipped missiles became the defining, but unused, technology that redefined the nature of war between superpowers. Largely that notion is obsolete, but North Korea, a small poor country, is actively flirting with it again these very days.

Another innovation in World War II, practiced by both sides, was massive direct kinetic hits on the civilian populations of the enemy, delivered through the air. For the first time kinetic energy could be delivered far inside territory still held by the enemy, and damage to infrastructure and morale could be wrought without the need to invade on the ground. Kinetically destroying large numbers of civilians was also part of the logic of MAD, or Mutually Assured Destruction, of the United States and the USSR pointing massive numbers of nuclear tipped missiles at each other during the cold war.

Essentially now war is either local engagements between smaller countries, or asymmetric battles between large powers and smaller countries or non-state actors. The dominant approach for the United States is to launch massive ship and air based volleys of Tomahawk Cruise Missiles, with conventional kinetic warheads, to degrade the war fighting infrastructure in the target territory, and then to put boots on the ground. The other side deploys harassing explosives both as booby traps, and to target both the enemy and local civilians through using human suicide bombers as a stand off mechanism for those directing the fight. As part of this asymmetry the non-state actors continually look for new ways to deliver kinetic explosions on board civilian aircraft, which has had the effect of making air travel worldwide more and more unpleasant for the last 16 years.

In slow motion each class of combatant changes their behavior to respond to new, and past, technologies deployed or threatened by the other side.

But over the whole history of war, rulers and governments have had to face the issue of what war to prepare for and where to place their resources. When should a country stop concentrating on sources of yew and instead invest more heavily in portable cannons? When should a country give up on supporting regiments of horses? When should a country turn away from the ruinous expense of yet higher performance fighter planes whose performance is only needed to engage other fighter planes and instead invest more heavily in cruise missiles and drones with targeted kinetic capabilities?

How should a country balance its portfolio of spending between the old technologies of war and putting enough muscle behind the new technologies so that it can ride up the curve of the new technology, defending against it adequately, and perhaps deploying it itself?

BUT HAS A NEW FORM OF WAR ARRIVED?

In the late nineteenth century fortunes were made in chemistry for materials and explosives. In the early part of the twentieth century extraordinary wealth for a few individuals came from coal, oil, automobiles, and airplanes. In the last thirty years that extraordinary wealth has come to the masters of information technology through companies such as Microsoft, Apple, Oracle, Google, and Facebook. Information technology is the cutting edge. And so, based on history, one should expect that technology to be where warfare will change.

Indeed, we saw in WW II the importance of cryptography and the breaking of cryptography, and the machines built at Bletchley Park in service of that gave rise to digital computers.

In the last few years we have seen how our information infrastructure has been attacked again and again for criminal reasons, with great amounts of real money being stolen, solely in cyberspace. Pacifists^{\big 2} might say that war is just crime on an international scale, so one should expect that technologies that start out as part of criminal enterprises will be adopted for purposes of war.

We have seen over the last half dozen years how non-state actors have used social media on the Internet to recruit young fighters from across the world to come and partake in their kinetic wars where those recruiters reside, or to wage kinetic violence inside countries far removed physically from where the recruiters reside. The Internet has been a wonderful new stand off tool, allowing distant ring-masters to burrow in to distant homelands and detonate kinetic weapons constructed locally by people the ring-masters have never met in person. This has been an unexpected and frightening evolution of kinetic warfare.

In the early parts of this decade a malicious computer worm named Stuxnet, most probably developed by the US and Israel, was deployed widely through the Internet. It infected Microsoft operating systems, and sniffed out whether they were talking to Siemens PLCs (Programmable Logic Controllers), and whether they were controlling nuclear centrifuges. Then it slowly degraded those centrifuges while simulating reports that said all was well with them. It is believed that this attack destroyed one fifth of Iran’s centrifuges. Here a completely cyber attack, with standoff all the way back to an office PC, was able to introduce a kinetic (slow though it may have been) attack in the core of an adversary’s secret facilities. And it was aimed at the production of the ultimate kinetic weapon, nuclear bombs. War is indeed evolving rapidly.

But now in the 2016 US presidential election, and again in the 2017 French presidential election we have seen, and all the details are not yet out, a glimpse of a future warfare where kinetic warfare is not used at all. Nevertheless it has been acts of war. US intelligence services announced in 2016 that there had been Russian interference in the US election.  The whole story is still to come out, but in both the US and French elections there were massive dumps of cyber-stolen internal emails from one candidate’s organization, timed exquisitely in both cases down to just a few minutes’ window of maximum impact. This was immediately, minutes later, followed by seemingly unrelated thousands of people looking through those emails claiming clues to often ridiculous malevolence. In both elections the mail dumps included faked emails which had sinister interpretations, uncovered by the armies of people looking through the emails for a smoking gun. These attacks most probably changed the outcome of the US election, but failed in France. This is post kinetic war waged in a murky world where the citizens of the attacked country can never know what to believe.

Let us be clear about the cleverness and monumental nature of these attacks. An adversary stands off, thousands of miles away, with no physical intrusion, and changes the government of its target to be more sympathetic to it than the people of the target country wanted. There are no kinetic weapons. There are layers of deception and layers of deniability. The political system of the attacked country has no way to counteract the outcome desired and produced by the enemy. The target country is dominated by the attacking adversary. That is a successful post kinetic war.

Technology changes how others act and how we need to act. Perhaps the second amendment to the US Constitution, allowing for an armed civilian militia to fight those who would destroy our Republic, is truly obsolete. Perhaps the real need is to equip the general population of the United States with tools of privacy and cyber security, both at a personal level, and in the organizations where they work. Just as WW II showed the obsolescence of physical borders to protect against kinetic devices raining from the sky, so too now we have seen that physical borders no longer protect our fundamental institutions of civil society and of democracy.

We need to learn how to protect ourselves in a new era of post kinetic war.

We see a proposed 2018 US Federal budget building up the weapons of kinetic war way beyond their current levels. Kinetic war will continue to be something we must protect against–it will remain an avenue of attack for a long time. We saw above how the English long bow was still a credible weapon, coexisting with cannon and other uses of gun powder for centuries, though now its utility is well gone.

We must not give up worrying about kinetic war, but we must also start investing in strength and protection against a new sort of post kinetic war that has really only started in the last twelve months. With $639B slated for defense in the proposed 2018 budget, and even $2.6B for a border fence, surely we can spend a few little billions, maybe even just one or two, on figuring out how to protect the general population from this newly experienced form of post kinetic war. I have recommendations^{\big 3}.

We don’t want the United States to have its own Kodak moment.



^{\big 1}For instance, in just six months from this last October to April, more jobs were lost in retail in the US than the total number of US coal jobs. Not only did natural gas, wind, and solar technology decimate coal mining, jobs never to return, but information technology has enabled fulfillment centers, online ordering, and delivery to the home, completely decimating the US retail sector, a sector that is many times bigger than coal.

^{\big 2}I do not count myself as a pacifist.

^{\big 3}Where in the Federal Government should such money be spent? The NSA (National Security Agency) has perhaps the most sophisticated group of computer scientists and mathematicians working on algorithms to wage and protect against cyber war. But it is not an agency that shares that protection with the general population and businesses, just as the US Army does not protect individual citizens or even recommend how they should protect themselves. No, the agency that does this is NIST, the National Institute of Standards and Technology, part of the Department of Commerce.  It provides metrology standards which enable businesses to have a standard connection to the SI units of measurement.  But it also has (with four Nobel prizes under its belt) advanced fundamental physics so that we can measure time accurately (and hence have working GPS); it has been a key contributor, through its measurements of radio wave propagation, to the 3G, 4G, and coming 5G standards for our smart phones; and it is contributing more and more to biological measurements necessary for modern drug making.  But for the purpose of this note its role in cybersecurity is all-important. NIST has provided a Cybersecurity Framework for businesses, now followed by half of US companies, giving them a set of tools and assessments to know whether they are making their IT operations secure. And NIST is now the standards generator and certifier for cryptography methods.  The current Federal budget proposal makes big cuts to NIST’s budget (in the past its total budget has been around $1B per year).  Full disclosure: I am a member of NIST’s Visiting Committee on Advanced Technology (VCAT). That means I see it up close. It is vitally important to the US and to our future. Now is not the time to cut its budget but to support it as we find our way in our future of war that is post kinetic.

Gas Mileage, with a Side of British Units

rodneybrooks.com/gas-mileage-with-a-side-of-british-units/

In the United States we measure how much gasoline an automobile uses in units of “miles per gallon”, often referred to as the car’s “fuel economy”. Elsewhere in the world it is measured in “liters (litres) per 100 kilometers”.

Since both gallons and liters are volume measurements, when we do a dimensional analysis of these quantities (writing L for the dimension of length) we get for the United States L/(L^3) or L^{-2}, and for the rest of the world L^3/L or L^2. In both cases it comes out as an area measurement, inverted in the case of the United States.

For 25mpg in the US (which is 9.4 liters per 100 kilometers), we can calculate the area knowing that a US gallon is defined to be 231 cubic inches, and since a mile is 5,280 feet, or 63,360 inches, the area (un-inverted) in square inches is: 

    \[\frac{231}{25\times 63360} = 0.000145833\]

 or a square 0.0121 inches on a side or a circle that is 0.01363 inches in diameter. In metric units, given that a liter is 1,000 cubic centimeters, or 1,000,000 cubic millimeters, and a kilometer is 1,000,000 millimeters, the area is 0.094 square millimeters, which is a square 0.3066 millimeters on a side, or a circle 0.3460 millimeters in diameter.
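Written out the same way as the calculation in inches, and taking the rounded figure of 9.4 liters per 100 kilometers as given, the metric area in square millimeters is:

    \[\frac{9.4\times 10^{6}}{100\times 10^{6}} = 0.094\]

and the side of the equivalent square and the diameter of the equivalent circle, in millimeters, follow from that area as:

    \[\sqrt{0.094} \approx 0.3066 \qquad\qquad 2\sqrt{\frac{0.094}{\pi}} \approx 0.3460\]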

So that’s the area. But what does it mean physically?  It is the cross section of the volume of gasoline that it takes to drive 25 miles, or in the other units 100 kilometers, stretched out to that length. If we form it into a very long cylinder then the area is the cross section of the cylinder. So a 25mpg automobile has the contents of its gas tank stretched out over the whole length of its journey, into a cylinder of gasoline with diameter 0.346 millimeters, and the car is precisely eating that cylinder as it drives along!

Of course, having grown up in Australia back when we used Imperial British units for everything, I have always preferred expressing gas mileage in acres^{\big 1}. And 25mpg turns out to be 2.325 \times 10^{-11} acres.
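For anyone who wants to check that acreage, here is a minimal sketch in Python. It uses only the unit definitions quoted in this post (231 cubic inches per US gallon, 63,360 inches per mile, 43,560 square feet per acre); the function and constant names are just ones I made up for the illustration.

```python
# Check of the 25 mpg figures using only unit definitions quoted in the post.
# All names here are illustrative, not from any library.
CUBIC_INCHES_PER_US_GALLON = 231
INCHES_PER_MILE = 5280 * 12             # 63,360 inches
SQUARE_INCHES_PER_ACRE = 43_560 * 144   # 43,560 square feet, 144 sq in each


def mpg_to_square_inches(mpg):
    """Cross-sectional area (square inches) of the fuel 'cylinder' at a given mpg."""
    return CUBIC_INCHES_PER_US_GALLON / (mpg * INCHES_PER_MILE)


area_in2 = mpg_to_square_inches(25)
area_acres = area_in2 / SQUARE_INCHES_PER_ACRE

print(f"{area_in2:.9f} square inches")   # area for 25 mpg
print(f"{area_acres:.3e} acres")         # the same area expressed in acres
```

Note that the area is inversely proportional to the mpg figure, so doubling the fuel economy halves the acreage.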

A Boeing 747 burns about 5 gallons of fuel per mile, or 12 liters per kilometer, so it is eating up a cylinder with a cross sectional area of 12 square millimeters, which is a cylinder with a 3.9 millimeter diameter. That is roughly 100 times the cross sectional area of the automobile's cylinder.

The first stage (S-IC) of the Saturn V moon rockets burned out at about 61 kilometers up, having consumed 770,000 liters of RP-1 kerosene. That means it consumed a cylinder of fuel with a cross sectional area of 12,623 square millimeters, i.e., a diameter of 126.8 millimeters, or just about exactly five inches.  Now that is a gas guzzler!
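The three metric examples all follow the same recipe: divide the volume of fuel by the distance covered to get an area, then convert the area to the diameter of a circle. A short Python sketch, with the fuel and distance figures taken from the text above and with function and variable names of my own choosing, looks like this:

```python
import math

MM3_PER_LITER = 1_000_000   # a liter is 1,000,000 cubic millimeters
MM_PER_KM = 1_000_000       # a kilometer is 1,000,000 millimeters


def fuel_cylinder(liters, kilometers):
    """Return (area in mm^2, diameter in mm) of the fuel cylinder consumed."""
    area = liters * MM3_PER_LITER / (kilometers * MM_PER_KM)
    diameter = 2 * math.sqrt(area / math.pi)
    return area, diameter


# (liters of fuel, kilometers traveled), as quoted in the text
examples = {
    "25 mpg automobile": (9.4, 100),
    "Boeing 747": (12, 1),
    "Saturn V first stage": (770_000, 61),
}

for name, (liters, km) in examples.items():
    area, diameter = fuel_cylinder(liters, km)
    print(f"{name}: area {area:.3f} mm^2, diameter {diameter:.3f} mm")
```

The Saturn V comparison treats the 61 kilometers of altitude at burnout as the distance traveled, which is the same simplification the paragraph above makes.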



^{\big 1}What is an acre? It is derived from the amount of land tillable by a yoke of oxen in one day–and a long strip of land is more efficient to till than a more boxy area as you have to change direction less often. So an acre is defined as one “chain” wide, by 10 chains long. And a chain?  It is 100 links, or exactly 22 yards (and also exactly four “rods” long, each of which is sixteen and a half feet long). So the standard tillable plot was 22 yards wide, and 220 yards long, which happens to be one eighth of a mile long, otherwise known to horse racing enthusiasts as a furlong, or “furrowlong”!

An acre plot was one eighth of a mile long and one eightieth of a mile wide, which is why there are 640 acres in a square mile. Of course once an acre is an area it can be any shape, and it is 4,840 square yards, or 43,560 square feet, which itself is precisely 99% of 44,000 square feet. Since a square rod is known as a perch, an acre is 160 perches. And BTW, the playing area of a standard US football field is roughly 0.9 acres.

Don’t even get me started on Imperial British units for weight, including a hundredweight (which is 112 pounds, of course), one 20th of an Imperial ton (2,240 pounds), and itself four quarters (28 pounds each), or 8 stone (14 pounds each). No, I won’t get started… and certainly not on money made up of pounds, shillings, and pence, with 20 shillings to a pound, and 21 shillings to a guinea for fancy stores, with 12 pence to a shilling, and a half crown was two shillings and six pence, or 2/6 (“two and six”, or 2s 6d). I won’t get started there, either…

Patrick Winston Explains Deep Learning

rodneybrooks.com/patrick-winston-explains-deep-learning/

Patrick Winston is one of the greatest teachers at M.I.T., and for 27 years was Director of the Artificial Intelligence Laboratory (which later became part of CSAIL).

Patrick teaches 6.034, the undergraduate introduction to AI at M.I.T. and a recent set of his lectures is available as videos.

I want to point people to lectures 12a and 12b (linked individually below). In these two lectures he goes from zero to a full explanation of deep learning, how it works, how nets are trained, what are the interesting problems, what are the limitations, and what were the key breakthrough ideas that took 25 years of hard thinking by the inventors of deep learning to discover.

The only prerequisite is understanding differential calculus. These lectures are fantastic. They really get at the key technical ideas in a very understandable way. The biggest network analyzed in lecture 12a only has two neurons, and the biggest one drawn only has four neurons. But don’t be disturbed. He is laying the groundwork for 12b, where he explains how deep learning works, shows simulations, and shows results.

This is teaching at its best. Listen to every sentence. They all build the understanding.

I just wish all the people not in AI who talk at length about AI and the future in the press had this level of technical understanding of what they are talking about. Spend two hours on these lectures and you will have that understanding.

At YouTube, 12a Neural Nets, and 12b Deep Neural Nets.

Robot Is A Hijacked Word

rodneybrooks.com/robot-is-a-hijacked-word/

The word “robot” has been hijacked. Twice. (Or thrice, if we want to be pedantic, but I won’t be.)

THE ORIGINAL SPIN

The word “robot” was introduced into the English language by the play R.U.R., written in Czech by Karel Capek, and first performed in Prague on January 25, 1921. R.U.R. stands for Rossumovi Univerzální Roboti. According to the play’s Wikipedia page, though, even the first edition of the play, published by Aventinum in Prague in 1920 with a cover designed by Karel’s brother Josef Capek, had the English version, Rossum’s Universal Robots, as the title on its cover, even though the play within was in Czech.

According to Science Friday, the word robot comes from an old Church Slavonic word, robota, meaning “servitude”, “forced labor”, or “drudgery”. And the more you look the more references indicate that it is not known whether Karel or Josef suggested the word.

But, in any case, in the play the robots were not electro-mechanical devices, in the way I have used the word robot all my life, in agreement with encyclopedias and Wikipedia. Instead they were “living flesh and blood creatures”, made from an artificial protoplasm. They “may be mistaken for humans and can think for themselves”. Both quotations here are from the Wikipedia page about the play, linked to above. According to Science Friday they “lack nothing but a soul”.

This is the common story, more or less, about where the word robot came from.

<aside>

But, but, maybe not.  According to this report the word “robot” first appeared in English in 1839. 1839! It says that robot at that time referred not to an individual, neither machine, nor protoplasm, nor electro-mechanical, but rather to a system, a “central European system of serfdom, by which a tenant’s rent was paid in forced labour or service”. Ultimately that word came from the same Slavonic root.

So perhaps in English the word “robot” changed in meaning between 1839 and 1920. Though realistically perhaps no one who picked it up from the Capek brothers in 1920 had ever heard of it from the old 1839 meaning. And in any case it seems such a different use that I don’t think it really is a hijacking. Just as “field” was not “hijacked” in going from a field of wheat to a field of study.

I am not going to count this as a “robot” hijacking.

</aside>

In 1920 “robot” referred to humans without souls, manufactured from protoplasm. But that meaning changed quickly.

THE FIRST HIJACKING

By the time I was deciding what really interested me in life, the word “robot” had turned into meaning a machine, as given by the online English Oxford Living Dictionaries, where it defines the word as:

A machine capable of carrying out a complex series of actions automatically, especially one programmable by a computer.

Here is the cover of the January 1939 edition of Amazing Stories.

There is a robot, a machine, right on the cover, illustrating a story titled I, Robot. But for me the big news is that the author of that story is Eando Binder (a nom de plume for Earl and Otto Binder), rather than Isaac Asimov–we’ll get back to Dr. Asimov in just a minute. I have found one slightly earlier reference to mechanical robots, but it is only a passing reference. In this Wikipedia list of fictional robots there is a story titled Robots Return, by Robert Moore Williams, dated 1938. Unfortunately I do not have the text of either the 1938 or the 1939 story, so can’t tell whether the authors assume that their readers implicitly understand to what the word “robot” refers.

However, in 1940, only twenty years after the R.U.R. play was first published with the English version of robot on its cover, Isaac Asimov published his story Strange Playfellow in Super Science Stories, an American pulp science fiction magazine (of the Binder story he said “It certainly caught my attention” and that he started work on his story two months later). Later, retitled as Robbie, the story was the first one that appeared in Asimov’s collection of stories published as the book I, Robot, on December 2, 1950.  I have a 1975 reprint of a republished version of that book from 1968. In that reprint, at the bottom of the third page, after describing a little girl, Gloria, playing with a mechanical humanoid Robbie, Asimov uses the word “robot” to refer to Robbie (note the alliteration) in a very casual way, as though the reader should know the word robot.  The word “robot” got hijacked in just 20 years, in the popular culture, or at least in the science fiction popular culture, from meaning a humanoid made of protoplasm in a Czech language play in Prague, to meaning a machine that could walk, play with, and communicate with humans. Note that 1940 was before programmable computers existed, so there was some more evolution to get to the definition involving computers as quoted above.

In the 1920’s such mechanical humanoids seem to have been referred to as “automatons”.

I have no idea what contributed to that transformation of the word “robot”, but I am eager to see any citations that might be offered in the comments section.

So now we have the first real hijacking of “robot”. But there has been a more recent one. I may not care deeply about the earlier hijacking, but I sure do care about this one. I am an old school roboticist in the sense of the definition in italics four paragraphs back. And my meaning has been hijacked!!

THE SECOND HIJACKING

In the more recent hijacking, “robot” has come to mean some sort of mindless software program, that does things that are relentless, or sometimes even cruel, though sometimes amusing and helpful. This new use is getting so bad that it is often hard to tell from a headline with the word “robot” to which form of robot the story refers. And I think it is giving electromechanical robots, my life’s work, a bad name.

I think it starts with this secondary definition which also appears with the English Oxford Living Dictionaries definition from above.

Used to refer to a person who behaves in a mechanical or unemotional manner.

And then it includes an example of usage: ‘public servants are not expected to be mindless robots’.

Now Asimov’s robots have not been mindless, and none of mine have ever been mindless (well, perhaps my insect-based robots were a little mindless, certainly not conscious in any way). But the first industrial robots introduced into a GM automobile plant in 1961 certainly were mindless. They did the same thing over and over, without sensing the world, and did not care whether or not the parts or sheet metal they were operating on was even there. And woe be a person who got in their way.  They had no idea there was someone there, and even had no idea that someone, or anyone for that matter, existed, or even could exist. They did not have computers controlling them.

I may be wrong, but I trace the hijacking of the word “robot” to two things that happened in 1994. I am guessing that there is some earlier history, but that I am just not aware of it. Ultimately it is all the fault of the World Wide Web…

Tim Berners-Lee invented the World Wide Web and put up the first Web page in 1991. Explosive growth in the Web started soon after, and by 1994 there were multiple attempts to automatically index the whole Web. Today we use Google or Bing, but 1994 was well before either of those existed. The way today’s search engines, and those of 1994, know what is out on the Web is that they work in the background building a constantly updated index. Whenever someone searches for something it is compared to the current state of the index, and that is what is actually searched right then and there–not all the Web pages spread all over the world. But the program that does search all those pages in the background is known as a Web Crawler.
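To make the split between background indexing and query-time lookup concrete, here is a toy sketch in Python. The URLs and page texts are invented, the function names are mine, and it is in no way how any real search engine is implemented; it only shows that a query consults the index rather than the pages themselves.

```python
from collections import defaultdict

# Hypothetical pages that a crawler has already fetched (URL -> page text).
CRAWLED_PAGES = {
    "http://example.com/a": "robots build cars",
    "http://example.com/b": "web crawlers index pages",
    "http://example.com/c": "crawlers are not robots",
}


def build_index(pages):
    """Background step: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Query step: look words up in the index only; no pages are fetched."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())   # keep URLs containing every word
    return results


index = build_index(CRAWLED_PAGES)
print(sorted(search(index, "robots")))
print(sorted(search(index, "crawlers robots")))
```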

By 1994 some people wanted to stop Web Crawlers from indexing their site, and a convention came about to put a file named robots.txt in the root directory of a web site. That file would never be noticed by Web browsers, but a politely written Web Crawler would read it and see if it was forbidden from indexing the site, or if it was to stay away from particular parts of the site, or how often the owner of the site thought it reasonable to be crawled. The contents of such a file follow the robots exclusion standard of which an early version was established in February 1994.
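As an illustration of what a polite crawler does with such a file, here is a minimal sketch using Python's standard `urllib.robotparser` module. The file contents, the paths, the delay value, and the crawler names "ExampleBot" and "BadBot" are all made up for the example; a real crawler would fetch the file from the site rather than parse a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: one crawler is banned outright, everyone
# else is kept out of /private/ and asked to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))
print(parser.can_fetch("ExampleBot", "https://example.com/private/notes.html"))
print(parser.can_fetch("BadBot", "https://example.com/index.html"))
print(parser.crawl_delay("ExampleBot"))
```

Nothing enforces any of this. The file is only a request, and a crawler that ignores it can still fetch whatever the Web server is willing to hand out.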

You can see a list of 302 known Web Crawlers (listed as a “Robots Database”!) currently active (I was surprised that there are so many!).  Of the 302 Web Crawlers, 29 include “robot” in their name (including “Robbie the Robot”), 18 include “bot” (including “Googlebot”), three have “robo”, and one has “robi”.  Web Crawlers, which are not physical robots, have certainly taken on “robot” as part of their identity. Web Crawlers became “robots”, I guess, for their mindless search of the Web, indexing it all, and following every link without understanding what was there.

That same year there was another innovation. There had been programs, all the way back to the sixties, that could engage in forms of back and forth typed language with humans. In 1994 they got a new name, chatterbots, or chat bots, and some of them had a more or less permanent existence on particular Web pages. Probably they attracted the suffix “bot”, as they could seem rather mindless and repetitive, again harking back to the dictionary example above of mindlessness of robots.

Now we had both Web Crawlers and programs that could converse (mostly badly) in English that had taken parts of their class names from the word “robot”. Neither was an independent machine. They were just software. Robots had gone from protoplasm, to electromechanical, to purely software.  While no one is really building machines from protoplasm, some of us are building electromechanical devices that are quite useful in the world. Our robots are not just programs.

Since 1994 the situation has only gotten worse! “Robo”, “bot” and “robot” have been used for more and more sorts of programs.  In a 2011 article, Erin McKean pointed out that there were “robo” prefixes, and “bot” suffixes, and that at that time, in general, robo had a slightly more sinister meaning than bot. There was “Robocop”, definitely sinister; there were annoying “robocalls” to our phones; “robo-trading” in stocks caused the 2010 “Flash Crash” of the markets; and “robo-signers” were people signing foreclosure documents in the mortgage crisis. Chat bots, twitter bots, etc., could be annoying, but were not sinister.

Now both sides are bad. We see malicious chat bots filling chat rooms intended for humans, and we see “botnets” of zombie computers, taken over by hackers to launch massive denial of service attacks on people or companies or governments all over the world.

Here is a list of varieties of bots, all software entities. Some good, some bad.

At some web sites, such as topbots.com, it seems to be all about “bots”, but it is hard to tell which ones are software or if any have a hardware component at all. Now “bots” seems to have become a generalized word for all aspects of A.I., deep learning, big data, and IoT. It is sucking up everything before it.

The word “robot” and its components have been taken to new meanings in the last twenty or so years.

A NEW WORD FOR ELECTROMECHANICAL ROBOTS?

Here’s the bottom line. My version of the word “robot” has been hijacked. And since that is how I define myself, a guy who builds robots (according to my definition) this is of great concern. I don’t think we can ever reclaim the word robot (no more so than reclaim the word “hacker”, which used to be only pristine goodness forty years ago, and I was honored when anyone ever referred to me as a hacker, even more so as a “robot hacker”). I think the only thing to do is to replace it.

How about a new word? What should we call good old fashioned robots? GOFR perhaps?

One that comes to mind is “droid”, a shortened version of “android”, which before it was a phone software system meant a robot with a human appearance. Droid distinguishes itself from the hijacked version of android which refers to phone software, and is generally understood by people to mean an electromechanical entity, an old style robot. Star Wars is largely responsible for that general perception. But that is also the problem in trying to use the word more generally. The Star Wars franchise has the word “droid” completely bottled up and trademarked. I know of three robot start up companies that wanted to use the word “droid”, and all three gave up in the face of legal problems in trying to do that.

No droids. Unfortunately.

So…what should the new word be? Put your suggestions in the comment section^{\big 1}, and let’s see what we can come up with!



^{\big 1}I manually filter all comments, as the majority of comments posted are actually advertisements for male erectile dysfunction drugs…

Megatrend: The Demographic Inversion

rodneybrooks.com/megatrend-the-demographic-inversion/

Megatrends are inexorable, at least for some decades, and drive major changes in our world. They may change our planet permanently, they change the flow of money within our society, they drive people to move where they live, they kill some people and cause others to be born, they change the fortunes of individuals, and they change the shape of our technology. When a megatrend is in full swing it is quite visible for all to see, and with a little thought we can make predictions about many of the things it will cause to change in our world. Having accurate predictions in hand empowers individuals to make canny decisions.

A few of today’s megatrends are global warming, urbanization, increased use and abuse of antibiotics, the spread of invasive species, increased plastic in the seas, rapid species extinctions, and changing demographics of the world’s human population. We see all these happening, but we can feel powerless to change them, and certainly they are impossible to change in the short term.

In this post I will talk about the demographic inversion that is happening in many parts of the world, and how that will drive much of our technology, and in particular our robotics technology, for the next thirty to fifty years.

We know how many 25 to 29 year olds there are in China today, and we know an upper bound on how many there will be in 20 years–less than two thirds of the number today. No political directives, government coercion, or technological break throughs are going to change that. This population trend is real and now unavoidable, and it is man and woman made. We can’t change this fact about the world twenty years hence. Given the proportion of the world’s population that is in China, and that more than half the other countries have similar trends, the aging of human society is a megatrend.
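The upper bound comes from simple cohort arithmetic: everyone who will be 25 to 29 in twenty years is 5 to 9 today, so, ignoring immigration, the future cohort cannot be larger than today's 5 to 9 cohort. The sketch below spells that out. The head counts are hypothetical placeholders chosen only to show the mechanics; they are not taken from the CIA World Factbook, and the function name is illustrative.

```python
def future_cohort_upper_bound(cohorts, age_band, years_ahead, band_width=5):
    """Upper bound on the size of `age_band` after `years_ahead` years.

    `cohorts` maps the starting age of each band to today's head count.
    People only age and die, so (ignoring immigration) the future band
    can be no larger than the band that is `years_ahead` younger today.
    """
    if years_ahead % band_width != 0:
        raise ValueError("years_ahead must be a multiple of band_width")
    source_band = age_band - years_ahead
    if source_band < 0:
        raise ValueError("that cohort has not been born yet")
    return cohorts[source_band]

# Hypothetical head counts in millions, keyed by the lower age of each band.
cohorts_today = {0: 80, 5: 78, 10: 76, 15: 85, 20: 100, 25: 120}

bound = future_cohort_upper_bound(cohorts_today, age_band=25, years_ahead=20)
print(bound, bound / cohorts_today[25])
```

With these made-up numbers the bound is 65% of today's 25 to 29 cohort; deaths in the intervening years can only lower the real figure.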

Magnitude of the inversion

Here is the data on which I based my comments above, in a diagram I got from the CIA World Factbook, yes, that CIA.

This is a pretty standard format for showing population distributions. Male and female are separated to the left and right, and then their age at the date of the data is histogrammed into five year intervals. This is a snapshot of the age distribution of the Chinese population in 2016. One can see the impact on the birth rate of the hardships of the cultural revolution, followed by a population boom a dozen years later. And then we see the impact of the one child policy as it was enforced more and more strongly following its introduction in 1979, with an uptick in the echo of the earlier population boom. The one child policy was phased out in 2015, but there is some evidence that the culture has been changed enough that one child couples might continue at a higher rate than in many other countries with equally strong economies. We also see here the impact of the cultural desire for male children over female children when there is a restriction to only one child. Not all the extra female children were necessarily subject to abortion or infanticide however, as it is strongly believed that there is a large ghost population of female children in China, girls whose existence is hidden from authorities.

Here is the same data for Japan, and now we see a truly scary trend.

There are just fewer and fewer young people in Japan in an unbroken trend lasting for forty years. Given the fertility age profile for women, forty years of decrease is really hard to turn around to an upward trend without immigration, which Japan very much eschews.

The real impact is on age distribution. Again, from the CIA World Factbook, currently 27.28% of the Japanese population is 65 or older, compared with only 15.25% of the US population, or 6.93% of the Mexican population. In Japan in 2016 there were for every 1,000 people only 7.8 births while there were 9.6 deaths. Looked at another way, the average number of births per woman in Japan is now only 1.41, compared to the obvious number of 2.0 needed for population replacement, but more like 2.1 to cover early deaths. While the population of Japan is shrinking, the ratio of older people to younger people is going to get larger and larger.  There is detailed coverage of the aging of Japan in Wikipedia, and here is a critical graph (under Creative Commons License) from that article:

By 2050 predictions are that over 35% of the population of Japan will be 65 or older. All those people are already in existence. Only a truly stupendously radical change in birth rate can possibly change the percentage. There are no indications that such a change is coming. Here is another way of looking at how the population is changing, this one from 2005 from the Statistics Bureau of the Japan Ministry of Health, Labor, and Welfare.

This is the typical, if somewhat more extreme in this case, change of shape of the population of developed countries in the world. Population is changing from being bottom heavy in age to top heavy. This is true of Europe, North America, Japan, Korea, China, and much of South America.
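The Japanese figures quoted above can be turned into two back-of-the-envelope numbers. The sketch below is a rough illustration only: it ignores migration and changes in mortality, and it treats 2.1 births per woman as the replacement rate, as the text does. The function names are illustrative.

```python
def natural_change_per_1000(births_per_1000, deaths_per_1000):
    """Yearly natural population change per 1,000 people, ignoring migration."""
    return births_per_1000 - deaths_per_1000

def generation_size_ratio(births_per_woman, replacement_rate=2.1):
    """Rough size of the next generation relative to the current one."""
    return births_per_woman / replacement_rate

change = natural_change_per_1000(7.8, 9.6)
ratio = generation_size_ratio(1.41)
print(f"natural change: {change:.1f} per 1,000 per year ({change / 10:.2f}%)")
print(f"next generation is about {ratio:.0%} the size of the current one")
```

A loss of a little under two people per thousand per year sounds small, but a generation that is only about two thirds the size of its parents' generation is what tips the age distribution from bottom heavy to top heavy.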

The current population of the world is about 7.5 billion people.  Predicting how that number will change over the next century still has a lot of uncertainty, largely driven by uncertainty over whether the fertility rate (how many children each woman has on average) will drop as quickly in sub-Saharan Africa as it has in the rest of the world whenever the standard of living increases. But the growth in the world population does seem to be slowing down. On average, then, the pattern of an increasing ratio of older to younger people that we have seen above for China and Japan will be the overall pattern for the world over the next thirty to forty years. India is lagging Japan in this regard by about 50 years, so it will soon start happening there too, even as India becomes the most populous country in the world, surpassing China.

Consequences of the inversion

Thirty years ago in much of the developed world we saw schools built to handle the post war baby boomers close after that bubble of children worked its way through the system. Today we are starting to see the consequences of the demographic inversion. It is showing up in two ways: (1) fewer young people filling certain job categories, creating a pull on automation in those arenas, and (2) uncertainty about how care can be provided for the huge overhang of elderly and very old people we are about to see.

(1) Fewer workers. Some jobs that seemed attractive in earlier times no longer seem quite as attractive as younger people have more educational opportunities, and aspirations for more interesting jobs. Two categories for which this is particularly true are farming and factory work. Farming has the additional negative aspect of often requiring people to live away from particular urban areas that they might otherwise choose with more geographically mobile professions. These jobs both require physical labor, often in unpleasant conditions. Neither is a job that many people would take up at a later stage in life.

Food supply

The average age of a Japanese farmer is now 67, and in all developed nations the average age is 60.  Agriculture ministers from the G7 last year were worried about how this high age could lead to issues over food security. And as the world population is still increasing, the need for food also increases.

The Japanese government is increasing its support for more robots to be developed to help with farming. Japanese farms tend to be small and intensely farmed–rice paddies, often on terraced slopes, and greenhouses for vegetables. They are looking at very small robotic tractors to mechanize formerly manual processes in rice paddies and wearable devices, exoskeletons of sorts, to help elderly people, now that their strength is waning, continue to do the same lifting tasks with fruits and vegetables that they have done for a lifetime.

In the US farms tend to be larger, and for things like wheat farming a lot of large farm equipment is already roboticized.  Production versions of many large pieces of farm equipment, such as those made by John Deere^{\big 1} (see this story from the Washington Post for an example) have been capable of level 3 autonomous driving (see my blog post for a definition) for many years, and can even be used at level 4 with no one in the cab (see this 2013 YouTube video for an example).

There is now robotics research around the world for robots to help with fruits and vegetables. At robotics conferences one can see prototype machines for weeding, for precision application of herbicides and insecticides, and for picking fruits and vegetables. All these parts of farming currently require lots of labor. In the US and Europe only immigrants are willing to do this labor, and with backlashes against immigration it leaves the land owners with no choice but to look for robotic workers, despite the political rhetoric that immigrants are taking jobs that citizens want–it is just not true.

Tied into this are completely new ways to do food production. We are starting to see more and more computer controlled indoor farming systems both in research labs in Universities and in companies, and as turn key solutions from small suppliers such as Indoor Farms of America and Cubic Farms, to name just two. The key idea is to put computation in the loop, carefully monitoring and controlling temperature, humidity, lighting, water delivery, and nutrient delivery. These solutions use tiny amounts of water compared to conventional outdoor farming. More advanced research solutions use computer vision to monitor crop growth and put that information into the controlling algorithms. So far we have not seen plays in this space from large established companies, but I have seen research experiments in the labs of major IT suppliers in both Taiwan and mainland China. We now have enough computation in the cloud to monitor every single plant that will eventually be consumed by humans. Farming still requires clouds, just entirely different ones than historically. Indoor farms promise much more reliable sources of food than those that rely on outside weather cooperating.
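To give a feel for what "computation in the loop" means, here is a minimal sketch of one pass of a monitoring and control loop, using simple on/off control with a deadband. It is not how any particular vendor's system works: the set points, the sensor names, and the "raise"/"lower"/"hold" actions are all assumptions made for illustration, and a real system would more likely use smoother control and many more inputs.

```python
def control_step(reading, target, deadband):
    """On/off control with a deadband: decide what to do for one sensor.

    Returns "raise", "lower", or "hold". The deadband keeps the actuator
    from switching on and off rapidly when the reading is near the target.
    """
    if reading < target - deadband:
        return "raise"
    if reading > target + deadband:
        return "lower"
    return "hold"

# Hypothetical set points: (target, deadband) per monitored quantity.
SET_POINTS = {
    "temperature_c": (22.0, 1.0),
    "humidity_pct": (65.0, 5.0),
}

def control_cycle(readings):
    """Run one pass of the loop over all sensor readings."""
    return {name: control_step(value, *SET_POINTS[name])
            for name, value in readings.items()}

print(control_cycle({"temperature_c": 24.5, "humidity_pct": 63.0}))
```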

Once food is grown it requires processing, and that too is labor intensive, especially for meat or fish of any sort. We are still a few years away from bionically grown meat that is practical, so in the meantime, again driven by lack of immigrants and a shortage of young workers, food processing is turning more and more to automation and robots. This includes both red meat cutting and poultry processing. These jobs are hard and unpleasant, and lead to many repetitive stress injuries. There are now many industrial robots in both the US and Australia being used to do some of these tasks. Reliance on robots will continue to grow as the population ages.

Manufacturing

Manufacturing is an area where there has been great fear of job loss due to automation.  But my own experience in manufacturing in both China and the United States over the last 20 years is that there is a shortage of manufacturing labor and that the labor force is aging.

I have been involved in manufacturing in mainland China since 1997. In the early years it was with iRobot, and the companies I worked with were all based in Taipei or Hong Kong, but had plants in Guangdong Province. Around 2004 I started to notice that we were losing a lot of workers over Golden Week, the biggest holiday period in China where most people travel in unbelievably crowded trains over incredibly long periods to go “home” to visit parents and other family. We started seeing that our production lines were suffering after Golden Week as so many workers would not return, and it might take a couple of months to make up for those losses. This sometimes had real impact on our business with a drop in deliverable product.

By around 2005 in my role as Director of CSAIL (Computer Science and Artificial Intelligence Laboratory) at M.I.T., I was working with a number of very big manufacturing companies based in Taipei, with plants spread over a much wider area of mainland China. These companies did not have their own brands then, but many now do. They built many of the IT products that we in North America use on a daily basis. They were working with M.I.T. in order to move up the value chain from being OEMs (Original Equipment Manufacturers) for US brands to having the technology and internal R&D to develop their own unique and branded products. The message I got over and over again, often from the original founders who had started the companies in the 1970’s, but were now enabling a new generation of management to take the companies forward, was that it was getting harder and harder to get sufficient labor in China. I remember one particular discussion (and I most likely don’t have the exact wording correct here, but this is how I remember it): “in the old days we would put up a single sign, 3 inches by 5 inches, and the next morning we would have a line of prospective workers around the block–now we employ advertising agencies in Shenzhen and run ads on TV, and still we can’t get enough workers”. They told me that their two biggest problems in China were worker recruiting and worker retention. That was in 2005.

Today as I talk to manufacturers in China I find that very well run companies, with lots of retention strategies in place will have a labor turn over rate of 15% per month (per month!!). Less well run companies will have up to 30% per month. Imagine trying to run a business with that level of turnover.
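To see what those monthly rates imply over a year, the small sketch below compounds them. It rests on a simplifying assumption that is not from the text: that the same fraction of the remaining original workers leaves each month. Companies define turnover in different ways, so treat the result as an order of magnitude rather than a measurement.

```python
def fraction_remaining(monthly_turnover, months):
    """Fraction of an initial workforce still employed after `months`,
    assuming the same share of the remaining workers leaves every month."""
    return (1.0 - monthly_turnover) ** months

for rate in (0.15, 0.30):
    left = fraction_remaining(rate, 12)
    print(f"{rate:.0%} monthly turnover: {left:.1%} of the original workers remain after a year")
```

Under that assumption, roughly one in seven of the original workers is still there after a year at 15% per month, and almost none at 30%.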

The reasons for the drop in eager manufacturing labor in China are complex. The demographic charts above tell a big part of the story. But another part is that the general standard of living has risen in China, people have more access to education, and they have higher aspirations. They don’t want to work in a factory at repetitive jobs–they want more meaningful work. All humans want to do meaningful things once they are beyond desperately trying to survive.

At the same time as this, I was working as an advisor to John Deere, visiting many of their factories in the US, and seeing how they were suffering from an aging workforce with no prospects for younger replacement labor to come along, in towns in the mid-west where the young would leave for bigger cities as soon as they had a chance. It wasn’t that there were not jobs in those plants and smaller cities, but that the youth was heading for bigger cities.

Those trends that I was seeing back then have been borne out in the decade since.

In this 2013 story at Bloomberg it is reported that the median age of a highly skilled US manufacturing worker was 56. And this 2013 report from the Manufacturing Institute compares how the median age of all manufacturing workers, skilled or unskilled, is rising compared to the age of other workers.

We can see that the median age of a manufacturing worker is going up by a year every two or three years, and that it is going up faster than for other non-farm jobs. In the 12 years shown here, manufacturing workers have gone from being 1.1 years older than other workers, to 2.7 years older.

My observations of a decade ago alerted me to the fact that worldwide we would be having a shortage of labor for manufacturing. This has indeed come to pass. Naturally I thought about whether robots could make up the shortfall. But it seemed to me that industrial robots of the time were not up to the task.

While there were a lot of robots working around the world in automobile factories they existed in a complete apartheid from humans. In the body and paint shops it was all robots and no humans, and in the assembly lines themselves it was all humans and no robots. The fundamental reason for this was that industrial robots were not safe to be around. They had no sensors to detect humans and therefore no way to avoid hitting them, with tremendous force, should humans stray into their workspace.

For a set of tasks which robots could do by themselves, repeatedly and reliably, humans were banished from that part of the factory, both for the safety of the humans, and so that humans wouldn’t mess up the totally controlled order needed for robots with hardly any sensors to see changes in the world. Where there  were tasks that humans but not robots could do, the solution was to have humans do all the tasks, and to banish robots.

There were many consequences of this dichotomy. First, it meant that the installation of robots required a well thought out restructuring of a factory floor, and turned into a process that could easily take a year, and involve much peripheral equipment, such as precision feeders, to support the robots. Second, because the robots and people were segregated there was no thought put into ease of use of robots, and use of modern user interfaces–there were no human users! Third, the robots had to be 100% successful at everything they were tasked with doing, or otherwise a human would have to enter the robot domain to patch things up, and that meant that all the robots, perhaps hundreds of them, would have to be stopped from operating. Fourth, the human factory workers had no direct experience of robots, no understanding of them as tools (as they did of electric drills, say), and so the alienness of them contributed to the “us” vs. “them” narrative that the press loves to propagate.

So in 2008 I founded Rethink Robotics (née Heartland Robotics) developing smart collaborative robots to address these shortcomings of industrial robots. The idea, and reality, is that the robots are safe to work side by side with humans so there is no longer any segregation. The robots come with force sensing and built in cameras so that out of the box they are useful and require much less peripheral equipment to control the environment for them. The robots have modern user interfaces, which, like the user interface of a smart phone, teach a user how to use the system as the user explores, so that ordinary factory workers and technicians can use the robots as tools. The robots get regular software upgrades, just like our phones and computers, so more value is added to them during their lifetime. And lastly, the robots are easily able to control other equipment around them, again reducing the set up and integration time.

These sorts of robots are growing in popularity, and are leading a revolution in automation of the 90% of factories in the world that are not automobile factories. They are beginning to answer the severe labor shortage, world wide, that the demographic inversion is causing.

Fulfillment

Much of technology over the last 50 years has been used as a way of shifting service labor to the end user, thus reducing the number of workers needed in order to provide a service. Examples of this are bank ATMs; self service gas stations; supermarkets rather than separate butcher shops, fruit and vegetable stores, and dry goods stores with all the goods behind a counter; vending machines; automated checkout lines; check-in kiosks at airports; on line travel reservation web sites; and word processors, cheap printers, and electronic calendars replacing administrative assistants.

One technological development that is different in this regard is fulfillment services for online shopping. No longer do we need to travel to a physical location for a particular sort of purchase, walk into the store, take physical possession of the object, take it out to our car, transport it to our house, and then unpack it. Now we choose an object, or a wide variety of objects that would previously be located at geographically separated locations, and they arrive at our homes with no more personal effort from us.

This has shifted and restructured the labor involved in getting goods to our houses. Two things are mixed together here: it is a challenge to get enough workers for the chain of steps that runs from online order to the goods being in the home, and this convenience will let the elderly stay in their homes longer, beyond the point when they are up for all the active shopping that yesteryear would have required.

The chain of events in fulfilling an order is roughly as follows:

  1. Go to many different locations in a warehouse where each of the ordered objects is stored.
  2. Pick up each object.
  3. Pack them in a box.
  4. Move the box from the warehouse to some sort of long distance transportation node.
  5. Transport the box a long distance to a terminal node.
  6. Take the box from the terminal node and deliver it to the customer’s house.

This set of steps is stereotyped and may not be exactly what happens for every order. For instance sometimes there is only one item in the order. Sometimes steps four and five might involve a combination of steps from a shipping depot for a particular carrier (e.g., FedEx or UPS), to their airport operations, then two flights to a final airport, then a truck to their distribution node. The final delivery may be via multiple steps, first to the post office and then the regular postal service, or a special postal service, or perhaps a direct truck delivery from the carrier’s shipping node.

Fulfillment companies have to hire lots of temporary labor at different times of the year to meet demand, and even at the best of times it can be hard to get labor willing to do some of these jobs. So those companies are trying to automate many of the steps, and that often involves robots.

Let’s look at Amazon, for instance.

In steps 1 and 2, above, people have to move around all over enormous warehouses in order to get the items together to fulfill an order. This step is called picking, as the person needs to pick up all the items for the order. One of the first improvements in efficiency was to have a single person be picking for multiple orders at the same time. A program would group together a set of orders so that the person could carry all the items in their cart, and it chose the orders so that if a person was picking up items A and B for one order, it would find a second order where perhaps there was an item C that the person would pass by going from item A to item B. When the person got back to the packing station for step 3, it might be that they packed all the items for their orders, or it could be that a different person there would do all the packing, so that pickers and packers were doing specialized jobs.
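The grouping idea can be sketched as a simple greedy batching routine. This is not Amazon's algorithm, whose details are not given here; it is a toy version under strong assumptions: the warehouse is a single aisle, item locations are positions along it, every order has at least one item, and the cart capacity is counted in items. All names and numbers are hypothetical.

```python
def batch_orders(orders, cart_capacity):
    """Greedily group orders so that added orders lie along the seed's route.

    `orders` maps an order id to a non-empty list of item positions along
    one aisle (a deliberately simplified, one-dimensional warehouse).
    """
    remaining = dict(orders)
    batches = []
    while remaining:
        # Seed each batch with the first remaining order (insertion order).
        seed_id = next(iter(remaining))
        seed_items = remaining.pop(seed_id)
        low, high = min(seed_items), max(seed_items)
        batch, load = [seed_id], len(seed_items)
        for order_id, items in list(remaining.items()):
            # An order is "on the route" if the picker passes all its items
            # anyway while walking between the seed order's items.
            on_route = all(low <= pos <= high for pos in items)
            if on_route and load + len(items) <= cart_capacity:
                batch.append(order_id)
                load += len(items)
                del remaining[order_id]
        batches.append(batch)
    return batches

# Hypothetical orders: positions are metres along the aisle.
orders = {
    "order-1": [10, 90],   # items A and B
    "order-2": [40],       # item C sits between A and B
    "order-3": [150, 160],
    "order-4": [155],
}
print(batch_orders(orders, cart_capacity=4))
```

Here order-2 rides along with order-1 because its single item lies between the two items the picker was already walking between.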

Most of what a picker did then was to move around from place to place. That is something that robots can be made to do rather easily these days. But the actual picking up of one of maybe hundreds of thousands of different sorts of objects is something that our robots are not at all good at. See, for instance, my quick take on research needed on robot hands.

A start up company in Boston, Kiva Systems, tackled this in a brilliant way. Kiva asked its customers to store all their fulfillment items on standardized shelving modules. The pickers all had fixed workstations to one side of the warehouse. Then small flat-ish robots would go out, drive under a shelving unit, lift it up, and bring it to a picker, arriving just as their hands were free ready to pick a new item. A screen would tell the person what to pick, an LED on the relevant shelf would light up, and miraculously the item would be in easy reach at just the right moment. By the time the picker had scanned the item with a bar code reader to confirm they had the right object the next shelving unit would be right there ready for their next pick.

Humans are still way better at picking than robots, but it is easy to automate moving around. The brilliance here was to change what was moving. In the old way the human picker moved from shelf to shelf. In the new way, the shelves moved around to the picker.

This approach turned out to be such an increase in efficiency that Amazon bought the company, and has since expanded it greatly into Amazon Robotics based in Boston. All new Amazon distribution centers are using this technology.

As a side effect, that left other fulfillment companies without the option of using Kiva Systems, so now there are at least five start up robotics companies in Boston attacking the same problem–it will be interesting to see when each of them comes out of stealth mode how they manage to work around the intellectual property covered by Kiva/Amazon patents.

Amazon now runs an annual “pick challenge” at a major robotics conference that moves from continent to continent on a yearly basis. They are encouraging academic researchers to compete on how well their robots (it is really their algorithms–over half the competitors use robots from my company, Rethink Robotics) are able to do the picking task.

Amazon regularly announces large hiring goals for workers in fulfillment centers but they just can not get enough. Only through automation will they be able to meet the challenges.

Amazon, and others, are also looking at step 6 from above. In Amazon’s case they are talking about using drones to deliver products to the home. Just three weeks ago I was present in Palm Springs for what turned out to be Amazon’s first public delivery by drone in North America. The delivery was to Jeff Bezos himself and the box contained many bottles of sunscreen. If only I had realized it was the first such delivery I would have taken a sunscreen bottle when Jeff offered to hand one to me from the box he had just opened!

Other companies are looking at other solutions for that last leg of delivery, and there are regularly press stories about small robots meant to autonomously drive on sidewalks to get things to houses inside lock boxes that only the recipient will be able to open. Other solutions offered involve using the trunk space of on-demand car services, and have the drivers drop the goods at people’s houses between taking paying customers on journeys.

All these step 6 solutions are in their early days, and it remains to be seen whether any of the current proposals really work out, or whether others will be needed.

But in any case there is a clear demand for these current steps, and if researchers can make progress there is plenty of room for robots to work in the actual picking and the packing, besides where current systems are being developed.

(2) Assistance for the elderly. When we look at the ratio of working age people to those sixty five and older we see remarkable changes over half a century or so. In Japan the ratio is going from about nine to one to two to one. In the US it is not quite as extreme, but still extreme. This means that there can not possibly be as large a pool of workers to provide care services for the elderly as there was in the past, and even worse, from that point of view, modern medicine is extending the lives of the elderly so that they are able to survive when they are much older and much frailer.

Well, you say, we should go back to the old way, where the families looked after the elderly. But think what that means for China with the effect of its one child policy. After only two generations of it, a youngish couple will have one child of their own, but four parents and eight grand parents to look after, with no help from siblings or cousins. A modern Chinese couple is raising one child and is solely responsible for 12 older people. Yikes!
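The count of twelve follows from doubling at each generation, and the short sketch below makes the arithmetic explicit. It assumes, for illustration only, that every ancestor in the counted generations is still alive and that every family along the way had exactly one child, so no sibling or cousin shares the load. The function name is illustrative.

```python
def elders_per_couple(generations_back):
    """Count living ancestors a couple supports when every family has one child.

    Each partner has 2 parents, 4 grandparents, and so on; with no siblings
    or cousins, none of that responsibility is shared.
    """
    per_person = sum(2 ** g for g in range(1, generations_back + 1))
    return 2 * per_person

print(elders_per_couple(1))  # parents only
print(elders_per_couple(2))  # parents and grandparents
```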

In both China and the US, and probably other places, there is the additional problem that people move vast distances for work, and so they are not geographically coupled to where their parents are.

I think this means that ultimately most people will have to end up in managed care facilities, but that there will be much smaller pools of human workers to provide services.

Care services for the elderly

Most people resist moving into a managed care facility for as long as possible. This resistance is going to be part of the solution. If care services can come to people’s homes, in whatever form, then people will be able to stay in their own homes much longer.

I think that the elderly want independence and dignity. Those are two very important words, independence and dignity.

At the same time as people get older and more frail they face many challenges. There is danger that they might fall and not be able to get up. They may forget to take their medicines. They may have trouble getting into and out of bed. They may have trouble carrying packages into their house from wherever outside a delivery system leaves them. They may have trouble reaching high shelves that they have used all their lives. They may have trouble keeping their house clean, putting things into and out of dish washing machines and clothes washing machines. They may have trouble folding their laundry and putting it away. Ultimately they may have difficulty dressing and undressing themselves. They may have trouble getting enough exercise without the risk of breaking fragile bones. They may have trouble using the bath or shower. They may have trouble using the toilet.

The longer they can get assistance in doing all these things with independence and dignity, the longer they can stay in their own homes.

I think that this is where a lot of robotics technology is going to be applied over the next thirty years, as the baby boomers slide into older and older age.

It is a little too early for actual robots for most of these challenges at this point. We need lots more research in robotics labs. There are a few labs in the US that are looking at these problems, but already in Japan it is a priority and one sees many demonstration robotic systems at big robot conferences in Japan, where research institutes and Universities show off their early ideas on what robots might do to help the elderly with some of these challenges.

I want to stress that this is not research into robot companions for the elderly. Rather it is research into machines that the elderly will be able to use and control, machines that will give them both independence and dignity. I think we all want to stave off the day when we will need a person to wipe our bum, and various sorts of machines can preserve that dignity for longer.

It is early days yet in research on these topics. It is only just now starting to appear on people’s radars that these sorts of robots will be necessary, and that there will be an incredibly large market for them. Today’s research prototypes are too early for commercialization. But the demographic megatrend is going to put a tremendous pull on them. Before too long VCs are going to see a long line of people wanting to give pitches for funding for various companies to develop such robots. And large established companies in adjacent non-robotic markets are going to be faced with how to transform themselves into home elder care robotics companies.

Driver assist

In my January essay on self driving cars I referenced the levels of autonomy that are usually used to describe just how much self driving is done by the car. Levels 4 and 5 involve no human input at all, the car does indeed drive itself. I am becoming more skeptical by the day that we will have level 4 or level 5 autonomy in our cars any time soon, except in very restricted and special geographies. Level 3 which lets the driver take their hands off the wheel and their attention off of driving is also going to be harder than many people think as the switch back to the human taking over in tricky circumstances is going to be hard to pull off quickly enough when it is needed. Levels 1 and 2 however, where the person is observing what happens and takes over when they need to, will, I think, be fairly commonplace in just a few years.

Levels 1 and 2 for autonomy in cars are going to make them so much safer, even for bad drivers. And the elderly usually get progressively worse at driving. These levels will take over parking, lane keeping, and eventually almost all of braking, accelerating, and steering, but with the driver’s hands still on the steering wheel.

The technology for all of levels 1 through 5 is really robotics technology. As we go up through the autonomy levels the cars are progressively becoming more and more like robots.

This, for now, is my final arena where the aging population is going to be a pull on new robotic technology. The longer that an elderly person can drive, the longer they can have their independence, the longer they will be able to stay in their own homes, and the longer they can get by without relying on individual services provided to them by someone younger.

Car companies already recognize the need to cater more to an elderly driving population, and many driver assist features that they are introducing will indeed extend the time that many drivers will be able to drive.

Conclusion

The demographic inversion in our population, a megatrend that is happening whether we like it or not, is becoming a significant pull on the need for new automation, IT, and robotics. It is pulling us to more robotic solutions in all stages of food production, in manufacturing, in fulfillment and delivery to the home, soon in care services for the elderly, and even now in driver assist features in cars.



^{\big 1}Full disclosure.  For many years I was a member of John Deere’s Global Innovation and Technology Advisory Council.

 

What Is It Like to Be a Robot?

rodneybrooks.com/what-is-it-like-to-be-a-robot/

This is the first post in an intended series on the current state of Artificial Intelligence capabilities, and what we can expect in the relatively short term. I will be at odds with the more outlandish claims that are circulating in the press, and amongst what I consider an alarmist group that includes people in the AI field and outside of it. In this post I start to introduce some of the key components of my future arguments, as well as show how different any AI system might be from us humans.

Some may recognize the title of this post as an homage to the 1974 paper by Thomas Nagel^{\big 1}, “What Is It Like to Be a Bat?”. Two more recent books, one from 2009 by Alexandra Horowitz^{\big 2} on dogs, and one from 2016 by Peter Godfrey-Smith^{\big 3} on octopuses also pay homage to Nagel’s paper each with a section of a chapter titled “What it is like”, and “What It’s Like”, respectively, giving affirmative responses to their own questions about what is it like to be a dog, or an octopus.

All three authors assume some level of consciousness for the subject animals.

Nagel was interested in the mind-body problem, an old, old, philosophical problem that has perplexed many people since ancient Greek times.  Our minds seem to be something very different from the physical objects in our world.  How can we understand what a mind is in physical terms (and some deny that there is a physical instantiation), and how is it that this thing so different from normal objects interacts with our body and our environment, both of which seem to be composed of objects?

Nagel says:

Without consciousness the mind-body problem would be much less interesting.  With consciousness it seems hopeless.

He then goes on to equate consciousness with there being some way of saying what it is like to have a particular mental state right now. In particular he says:

But fundamentally an organism has conscious mental states if and only if there is something that it is like to be that organism–something it is like for the organism.

He then tries to imagine what it is like to be a bat. He chooses a bat as its perceptual world is so different and alien to our own. In particular at night they “perceive the external world primarily by sonar, or echolocation”, and “bat sonar, though clearly a form of perception, is not similar in its operation and there is no reason to suppose that it is subjectively like anything we can experience or imagine”. Nagel thereby concedes defeat in trying to imagine just what it is like to be a bat. The objective way of looking at a bat externally does not lead to the subjective understanding that we all have of ourselves. I think that if we ever get to robots with mental states and the robots know about their mental states we are unlikely to be able to imagine what it is like to be them. But I think we can imagine some lesser aspects of being a robot, purely objectively, which will be the main point of this post.

First, however, let’s talk about the same question for dogs and octopuses very briefly, as discussed by the authors of the two books referenced above, Alexandra Horowitz, and Peter Godfrey-Smith.

Godfrey-Smith is fascinated by octopuses and their evolutionarily related cousins cuttlefish and squid. The last common ancestor that their branch of the tree of life had with us, or other mammals, or birds, or fish, or dinosaurs, was about six hundred million years ago. It was a small flat worm, just a few millimeters in size, with perhaps a couple of thousand neurons at most. After the split, our side went on to become vertebrates, i.e., animals with a spine. The other side produced the arthropods, including lobsters, ants, beetles, etc., and a side tree from them turned into the mollusks, which we mostly know as worms, slugs, snails, clams, oysters, scallops and mussels. But a suborder of mollusks is the cephalopods, which include the three animals of interest to Godfrey-Smith. His interest is based on how intelligent these animals are.

Clearly these three animals evolved their intelligence separately from the intelligence of birds, lizards, and mammals. Although we can see evolutionarily related features of our brains in brains of very different vertebrates there is no correspondence at all with the brains of these three cephalopods. They have a distributed brain where only a small portion of the neurons are in a central location; the majority of the neurons are out in the tentacles. The tentacles appear to autonomously explore things out of sight of the creature, sensing things chemically and by touch, grasping and manipulating small objects or food samples. But if an octopus, in an experimental setting, is required to guide one of its tentacles (through a complex maze, devoid of the chemical signals a tentacle would use on its own) using the vision in its eyes, it takes a very long time to do so. At best it seems that the central brain has only supervisory control over the tentacles. They are very different from us, but as we will see later, we can relate to much of the intelligent behavior that they produce, even though on the inside they must indeed be very different. Just as our robots will be very different from us on the inside.

Horowitz studies dogs, and not only imagines what the world seems like to them, but also invokes an idea from Jacob von Uexküll^{\big 4}, a Baltic German biologist who lived from 1864 to 1944. He talked about the Umwelt of an animal, a German word for environment or milieu (literally “around world”) which is the environment in which the animal senses, exists, and acts. An external observer, us humans say, may see a very different embedding of the animal in the world than the animal itself sees. This is going to be true of our robots too. We will see them as sort of like us, anthropomorphizing them, but their sensing, action, and intelligence will make them very different, and very different in the way that they interact with the world.

I have always preferred another concept from von Uexküll, namely Merkwelt^{\big 5}, literally “note world”, which when applied to a creature refers to the world that can be sensed by that creature. It is a part of the Umwelt of a particular creature. The Umwelt of the creature includes the creature itself, the world in which it is, its own Wirkwelt (active world), the effects it can have on the world, and its Merkwelt which is its capability to sense the world. By divorcing the Merkwelt from the creature itself we can talk about it without getting (too) tied up with consciousness.

On the other hand, talking precisely about the Merkwelt of an animal is technically hard to get exactly right. We can observe the Umwelt of an animal, at least using our own Merkwelt. But to understand the Merkwelt of an animal we need to understand what its sensors tell it–and even that is not enough as some sensory values may get to some parts of the brain but not to others. We can look at the physiology of an animal and guess at what sort of things its sensors are telling it, and then set up experiments in the lab where we test the limits of what it might be sensing in those domains. But it is hard to know sometimes whether the animal does not care (in a behavioral sense if we want to leave consciousness out of the picture) about certain sensory stimuli, even if it is aware of them. Then we need to know how the creature processes that sensory data. We will see some examples in humans below, where our own intuitions about how we sense the world, our Merkwelt, are often very wrong.

For robots it is a different story. We build the robots! So we know what sensors we have put in the robot, and we can measure the limits of those sensors and what will make a digital bit flip in the output of the sensors, or not. And we can trace in the wiring and the code where those sensor values are sent throughout the whole mechanical, electrical, and computational robot. In theory we can have complete access to the Merkwelt of a robot. Something we can hardly do at all for the Merkwelt of an animal (or even, really, for ourselves). However, some of the AI techniques we might use in robots introduce new levels of uncertainty about the interpretation of the Merkwelt by the robots–they are not as transparent as we might think, and soon we will be sliding back to having to worry about the complete Umwelt.

In this post I am going to talk about the possible Merkwelts (or Merkwelten) of robots, first by talking about the Merkwelts of some animals, and how different they are from our own human Merkwelt, and then extending the description to robots. And taking Nagel’s advice I am going to largely remove the notion of consciousness from this discussion, in contrast to Nagel, Horowitz, and Godfrey-Smith. But. But. Consciousness keeps slipping its way in, as it is the only way we have of thinking about thinking. And so first I will talk about consciousness a little, trying to avoid toppling over the edge to the grease covered philosophical slope that slides away unendingly for most who dare go there at all.

Some Comments on Consciousness in Animals

What consciousness is is mysterious. Some have claimed that it might be a fundamental type in our universe, just as mass, energy and time are fundamental types. I find this highly unlikely as for all other fundamental types we see ways in which they interact. But we do not see any sign of psychokinesis events (outside of movies), where a consciousness has some impact directly on some other thing, outside of the body containing the consciousness. I remain a solid skeptic about consciousness as a fundamental type. I think it is an emergent property of the way a particular animal is put together, including its body and nervous system.

So let’s talk about who or what has it. We can only be absolutely sure that we ourselves have it. We are each sure we do, but it is hard to be sure that other humans do, let alone animals. However it seems highly unlikely that you, the reader, would be the one human being in the world with consciousness, so most likely you will grudgingly admit that I must have it too. Once we assume all humans have consciousness we are confronted with a gradualism argument if we want to believe that only humans have it. Could it really be that humans have had consciousness and that Neanderthals had none? Humans interbred with Neanderthals and most of us carry some of their DNA today. Were they just Stepford wives and husbands to humans? And if Neanderthals had it then probably many other hominoid species had it. And if them, how about the great apes? Look at this orangutan’s reaction to a magic trick and ask could it be that this being has no sense of self, no subjective experiences, nothing at all like what us humans feel? So I think we have to admit it for orangutans. Then how far does it extend, and in what form?

Could octopuses have anything like consciousness? They evolved separately from us and have a very different brain structure.

Octopuses can be trained to do many things that vertebrate animals in laboratories can be trained to do, to solve mazes, to push levers for food, to visually recognize which of two known environments it has been placed in and respond appropriately, etc.  More interesting, perhaps, are the very many anecdotal stories of observed behaviors, often in many different laboratories around the world. Octopuses are known to squirt water from their tanks at specific people, drenching them with up to half a gallon of water in a single squirt. In some instances it has been a particular octopus regularly attacking a particular person, and in another instance, a cuttlefish attacking all new visitors to the lab. Thus octopuses react to people as individuals, unlike the way ants react to us, if at all. It seems that octopuses are able to recognize individual humans over extended time periods. But more than that, the octopuses are masters of escape, they can often climb out of their tanks, and steal food not meant for them. And in at least one lab the octopuses learned to squirt water at light bulbs, shorting them out. Lab workers repeatedly report that the escape, theft, or destruction of lighting, happens without warning just as the person’s attention wanders from the octopus and they are looking in another direction–the octopus can apparently estimate the direction of a person’s gaze both when the person is wearing a scuba mask underwater or is unencumbered on land. People who work with octopuses see intelligence at work. In the wild, Godfrey-Smith reports following individual octopuses for 15 minutes as they wander far and wide in search of food, and then when they are done, taking a direct route home to their lair. 
And he reports on yet another experimenter saying that unlike fish, which just exist in a tank in captivity, the behavior of captive octopuses is all about testing the limits of the tank, and they certainly appear to the humans around them to well know they are in captivity.

But are all these observations of octopuses an indication of consciousness?

Nagel struggles, of course, with defining consciousness. In his bat paper he tries to finesse the issue by inventing the term subjective experience as a key ingredient of consciousness, and then talking about those sorts of experiences. By subjective experience Nagel means what it feels like to the creature which has that experience. An outsider may well observe the objective reality of the creature having that experience, but not do so well at understanding how it feels to the creature.

We humans can mostly appreciate and empathize with how an experience that another human has is felt. We know what it feels like for ourselves to walk through a large room cluttered with furniture, and so we know what it feels like to another person. Mostly. If we are sighted and the other person is blind, but they are clicking their tongue to do echolocation, it is much harder for us to know how that must feel and how they experience the room. As we get further away from ourselves it becomes very hard for us to even imagine, let alone know, what some experience feels like to another creature. It eventually becomes hard to make an objective determination whether it feels like anything at all to the creature.

Each of Nagel, Horowitz, and Godfrey-Smith argues for some form of subjective experience in their subject animals.

Godfrey-Smith makes a distinction between animals that seem to be aware of pain and those that do not. Insects can suffer severe injuries and will continue trying to do whatever task they are involved in with whatever parts of their bodies are still functional and without appearing to notice the injury at all. In contrast, an octopus jumps and flails when bitten by a fish, and for more severe injuries pays attention to and is protective of the injured region. To Godfrey-Smith this is a threshold for awareness of pain. He points out that zebrafish who have been injected with a chemical that is suspected of causing pain will change their favored choice of environment to one that contains painkiller in the water. The zebrafish makes a choice of action in order to reduce its pain.

But is it also the ability to remember episodes (called episodic memory), past experiences as sequences of events, involving self, and to deduce what might happen next from those? Or is it the ability to build mental maps of the environment, and know where we are placed in them, and plan how to get to places we want to go? Is it a whole collection of little things that come together to give us subjective experiences? If so, it does seem that many animals share some of those capabilities with us humans.

Dogs seem to have some form of episodic memory, as they go to play patterns from the past with people or dogs that they re-encounter after many years. They pick up on when they are being walked or driven to some place that they do not like (such as the vet’s), and they pick up on behaviors of their human friends and seem to predict what their near term future emotional state is going to be as the human continues their established sequence of behaviors in the world. E.g., a dog might notice when their human friend is engaging in an activity that we would label as packing their bag for a trip, and go to the emotional state of being alone and uncomforted by that human’s presence.

On the other hand, dogs do not realize that it is them when they see themselves in a mirror, unlike all great apes (including us humans!) and dolphins. So they do not seem to have the same sense of self that those animals have. But do they realize that they have agency in the world? While they are not very good at hiding their emotions, which clearly they do have, they do have a sort of sense of decorum about what behaviors to display in front of people. A dear dog of mine would never ever jump up on to a table. She would get up on to chairs and couches and beds, but not tables. When our gardener was around mowing the lawn in a circle around the house she would run from glass door to glass door, very upset and barking, watching him make progress on each loop around the house, anticipating where he would appear next (more episodic memory perhaps!). And then one day Steve told us that when we were not home and he was mowing the lawn she would always jump up on the kitchen table, barking like crazy at him, watching as he moved from the east side to the south side of the house. She knew not to do that in front of us, but when we were not around her emotions were no longer suppressed and they got the better of her. That seems like a subjective experience to me.

Overall dogs don’t seem to exhibit the level of consciousness that we do, which is somehow wrapped up with subjective experiences, a sense of self, an ability to recall episodes in our lives, and to perhaps predict how we might feel, or experience, should a certain thing happen in the future.

I am pretty sure that no AI system, and no robot, that has been built by humans to date, possesses even a rudimentary form of consciousness, or even has any subjective experience, let alone any sense of self. One of the reasons for this is that hardly anyone is working on this problem! There is no funding for building conscious robots, or even conscious AI systems in a box, for two reasons. First, no one has elucidated a solid argument that it would be beneficial in any way in terms of performance for robots to have consciousness (I tend to cautiously disagree–I think it might be a big breakthrough), and second, no one has any idea how to proceed to make it so.

Later in this post I refer to the SLAM problem (about building maps) which has been a major success in robotics over the last thirty years. That was not solved by one person or one research group. Real progress began when hundreds of researchers all over the globe started working on a commonly defined problem, and worked on it as a major focus for almost 20 years. At the moment for conscious robots we only have a small handful of researchers, no common definition or purpose, and no real path forward. It won’t just happen by itself.

Some Comments on Unconsciousness in Humans

As we have discussed above, while animals have some aspects of consciousness we have seen no real evidence that they experience it as vividly as do we. But before we get too high horsey about how great we are, we need to remember that even for humans, consciousness may be a blatant luxury (or curse) that we may not really need for who we are.

Godfrey-Smith relates the case of a woman who can see, but who has no conscious experience of being able to see. This situation is known as blindsight although Godfrey-Smith does not seem to be aware that this phenomenon had previously been discussed with earlier patients, in a very scholarly book titled “Blindsight”^{\big 6}. We will stick with Godfrey-Smith’s case as he has produced a succinct summary.

The patient is a woman referred to as DF who had damage to her brain from carbon monoxide poisoning in 1988.

As a result of the accident, DF felt almost blind. She lost all experience of the shapes and layout of objects in her visual field. Only vague patches of color remained. Despite this, it turned out that she could still act quite effectively toward the objects in space around her. For example, she could post letters through a slot that was placed at various different angles. But she could not describe the angle of the slot, or indicate it by pointing. As far as subjective experience goes, she couldn’t see the slot at all, but the letter reliably went in.

Godfrey-Smith goes on to report that even though she feels as though she is blind she can walk around obstacles placed in front of her.

The human brain seems to have two streams of visual processing going through two different parts of the brain. There is one that is used for real-time bodily adjustments as one navigates through space (walking, or getting the letter through the slot). That area of the brain was undamaged in DF. The other area, where vision is used for categorization, recognition, and description of objects was the part that was damaged in DF, and with that gone so went the subjective experience of seeing.

We all know that we are able to do many things unconsciously, perhaps more than we are conscious of! Perhaps we are only conscious of some things, and make up stories that explain what we are doing unconsciously. There is a whole class of patients with brains split at the corpus callosum, where this making stuff up can be readily observed, but I will save that for another post.

A Short Aside

Apart from the main content of this post, I want to point out one, to me, miraculous piece of data. We now can see that intelligence evolved not once, but at least twice, on Earth.

The fact that a bilateral flatworm is the common ancestor of two branches of life, the vertebrate branch and another that in turn split into arthropods and mollusks, is very telling. Today’s closest relatives to those old flatworms from 600 million years ago, such as Notoplana acticola, have about 2,000 neurons. That is far fewer than many of the arthropods have (100,000 is not uncommon for insects), and it seems that insects have no subjective experience. It is extremely unlikely, therefore, that the flatworms were anything more than insect like.

That means that the intelligence of octopuses, and other cephalopods, evolved completely independently from the evolution of the intelligence of birds and mammals. And yet, we are able to recognize that intelligence in their behavior, even though they have very alien Merkwelts compared to us.

Intelligence evolved at least twice on Earth!

So far we have observed 100 billion galaxies with the Hubble Space Telescope. It is expected that when the Webb Space Telescope gets into orbit we will double that number, but let’s be conservative for now with what we have seen. Our own galaxy, which is very typical, has about 100 billion stars. Our Sun is just one of them. So we know that there are 10,000,000,000,000,000,000,000 stars out there, at least. That is 10^{22} stars. It is estimated that there are 7.5\times 10^{18} grains of sand on Earth. So for every grain of sand, on every beach, and every reef in the whole Earth, there are more than 1,000 stars in our Universe.
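
For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the round figures quoted above; the variable and function names are made up for this illustration, and the inputs are order-of-magnitude estimates rather than measurements.

```python
# Round figures quoted in the text (order-of-magnitude estimates only).
GALAXIES_OBSERVED = 100 * 10**9   # about 100 billion galaxies
STARS_PER_GALAXY = 100 * 10**9    # about 100 billion stars in a typical galaxy
GRAINS_OF_SAND = 7.5e18           # estimated grains of sand on Earth


def stars_per_grain(galaxies, stars_per_galaxy, grains):
    """Return how many stars there are for each grain of sand."""
    return (galaxies * stars_per_galaxy) / grains


total_stars = GALAXIES_OBSERVED * STARS_PER_GALAXY
ratio = stars_per_grain(GALAXIES_OBSERVED, STARS_PER_GALAXY, GRAINS_OF_SAND)

print(f"total stars: {total_stars:.0e}")
print(f"stars per grain of sand: {ratio:.0f}")
```

With these inputs the ratio comes out a little over 1,300, which is where "more than 1,000 stars" per grain comes from. Doubling the galaxy count, as expected from the Webb telescope, would simply double the ratio.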

In the last 20 years we have started looking for planets around stars in our very tiny local neighborhood of our galaxy. We have found thousands of planets, and it seems that every star has at least one planet. One in six of them has an Earth sized planet. In February of this year NASA’s Spitzer Space Telescope found that a star only 40 light years from Earth (our galaxy is 100,000 light years across, so that is really just down the street, relatively speaking) has seven Earth sized planets, and three of them are about the same temperature as Earth, which is good for Earth-like life forms.

See my blog post on how flexible it turns out even Earth-based DNA is and how we now know that living systems do not need to have exactly our sort of DNA. Enormous numbers of planets in the Universe, and just our local galaxy, with still enormous numbers of Earth like planets, and lots of routes to life existing on lots of those planets. And in the one instance we know about where life arose we have gotten intelligence from non-intelligence at least twice, totally independently. This all increases my confidence that intelligent life is abundant in the Universe. At least as intelligent as octopuses, and birds, and mammals, and probably as intelligent as us. With consciousness, with religions, with Gods, with saviors, with territorialism, with hate, with fear, with war, with grace, with humor, with love, with history, with aspirations, with wonder.

There is a lot of exciting stuff out there in the Universe if we can figure out how to observe it!!!!

The Merkwelten of Other Animals

Let’s now talk about the Merkwelten (plural of Merkwelt) of octopuses, dogs, and humans. Much, but certainly not all, of the material below on octopuses and dogs is drawn from the books of Godfrey-Smith and Horowitz. By comparing these three species we’ll be attuned to what to think about for the Merkwelt of a robot.

Octopuses are able to respond to visual stimuli quite well, and it seems from the anecdotes that they can see well enough to know when a person is looking at them or looking away. There is good evolutionary sense in understanding whether potential predators are looking at you or not and even some snakes can exhibit this visual skill. But octopuses can be trained on many tasks that require vision. Unlike humans, however, the training seems to be largely eye specific. If an octopus is trained on a task with only one of its eyes being able to see the relevant features, it is unable to do the task using its other eye. Humans, on the other hand, unless their corpus callosum has been severed, are able to effortlessly transfer tasks between eyes.

Godfrey-Smith recounts a 1956 experiment with training octopuses to visually distinguish between large squares and small squares, and behave differently to get a reward (or to avoid a shock…it was 1956, well before octopuses were declared honorary vertebrates for the purpose of lab animal rules) for the two cases. The octopuses were able to make the right distinctions independent of how far away the squares were placed from them–a small square placed close to an octopus would appear large in its visual field, but the octopuses were able to correct for this and recognize it correctly as a small square. This tends to suggest that the squares are more than just visual stimuli. Instead they are treated as objects independent of the location in the world, and the octopus’s point of view. This feels like the square is a subjective experience to the octopus. This capability is known as having perceptual constancy, something not seen in insects.
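
The geometry behind that experiment can be sketched in a few lines of Python. This is only an illustration of why raw size in the visual field is ambiguous, not a claim about how an octopus (or anyone) actually computes size. The function names and the particular sizes and distances are hypothetical, chosen so that the two squares subtend exactly the same angle.

```python
import math


def visual_angle_deg(side, distance):
    """Angle in degrees subtended at the eye by a square of the given side
    length, viewed face-on from the given distance (same units for both)."""
    return math.degrees(2 * math.atan(side / (2 * distance)))


def inferred_side(angle_deg, distance):
    """Invert visual_angle_deg: recover physical size from angle and distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)


# Hypothetical numbers: a small square up close and a large square far away.
small_near = visual_angle_deg(side=4.0, distance=20.0)
large_far = visual_angle_deg(side=12.0, distance=60.0)

print("same visual angle:", math.isclose(small_near, large_far))
print("recovered small side:", inferred_side(small_near, 20.0))
print("recovered large side:", inferred_side(large_far, 60.0))
```

The two squares look identical if all you have is the angle they fill. Only by also taking distance into account can they be told apart, which is the sense in which the octopuses are treating the squares as objects in the world rather than as patches in the visual field.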

Octopuses are also very tactile. They reach out their tentacles to feel new objects, and often touch people with a single tentacle. Besides touch it is thought that they are able to taste, or smell, with the tentacles, or at least sense certain molecules. Eight touchy feely tasting sensors going in all directions at once. This is a very different Merkwelt from us humans.

Another astonishing feature of octopuses, and even more so of cuttlefish, is their ability to rapidly change their skin color in very fine detail. They have a number of different mechanisms embedded in their skin which are all used in synchrony to get different colors and patterns, sometimes rippling across their skin much like a movie on a screen. They are able to blend in to their surroundings by mimicking the color and texture around them, becoming invisible to predators and prey alike.

But here is the kicker. Both octopuses and cuttlefish are color blind! They only have one sort of visual receptor in their eyes which would predict color blindness. And there have been no tests found where they respond to different colors by seeing them with their eyes. Their skin however seems to be color sensitive and that programs what is shown on the other side of a tentacle or the body, in order to camouflage the octopus. So octopuses have blindsight color vision, but not via their eyes. They can make no decisions at a high level based on color, but their bodies are attuned to it.

Is this so very different to us, in principle? Our gut busily digests our food, responding to what sort of food is there, and no matter how hard we think, we can neither be aware of what details have been detected in our food (except in the rare case that it causes some sort of breakdown and we vomit, or some such), nor can we control through our thoughts the rate at which it is digested. Like the octopus, we too have autonomous systems that do not rise to the level of our cognition.

So let’s bring all this together and see if we can imagine the Merkwelt of an octopus. Let’s be an octopus. We know the different feeling of being in water and out of water, and when we see places that are out of water we know they are non-watery. We see things with one eye, and different things with another eye, but we don’t necessarily put those two views together into a complete world view. Though if we are seeing things that have permanence, rather than being episodic, we can put them into a mental map, and then recognize them with either eye, and into our sense of place as we wander around. We rarely see things upside down, as our eyes rotate to always have the same orientation with respect to up and down, almost no matter what our bodies are doing, so we see the world from a consistent point of view. Unfortunately what we see is without color, even in the middle of the day, and near the surface. We can only hear fairly low frequency sounds, but that is enough to hear some movements about us. In contrast to our dull sight and hearing, the world mediated via our tentacles is a rich one indeed. We have eight of them and we can make them do things on command, but if we are not telling them exactly what to do they wander around a bit, and we can smell and feel what they are touching. We can smell the water with our tentacles, smell the currents going by, and smell the things our tentacles touch. It is as rich a sense of being in the world as seeing it is for many other animals.

Let’s now turn to dogs. We all know that their sense of smell is acute, but first let’s look at their eyesight.

The eyes of mammals have two different sorts of light receptors. Cones are color sensitive, and rods, which work in much dimmer light are not, and that is why we lose our color sensitivity at night–we are using different sensor elements to get much of our visual information. Humans have three sorts of cones, each sensitive to a different part of the color spectrum. The following is a typical human sensitivity profile, taken from Wikimedia.

The longer wavelength version is sensitive to colors centered in the red area of the spectrum (the bottom axis is giving the wavelength of light in nanometers), the middle wavelength around green, and the shorter one is centered around blue. They all overlap and there is quite a bit of processing to get to a perceived color–we’ll come back to that below. Just as individual humans have physical variations in their bone lengths, muscle mass, etc., so too do individuals have slightly different shaped sensitivity curves for their cones, and different values for peak sensitivity. So we probably all perceive colors slightly differently. Also it turns out that about 12% of women have a fourth type of cone, one centered on yellow, between red and green. However, for very few of those women is the cone’s peak sensitivity far enough away from their red peak to make much difference–for the few where it is much displaced, a much richer color palette can be perceived than for most people. Those few have a quite different color Merkwelt than the rest of us.

Dogs, on the other hand have only two sorts of cones, but many more rods. Their night vision is much better than ours, though their acuity, the ability to distinguish fine detail, is less than ours. Their color cones are centered on yellow and blue, and recent evidence suggests that they can see down into the ultraviolet with their blue cones, at much shorter wavelengths than we can see at all. So they are certainly not color blind but they see different colors than we do. Since the color that we see for an object is often the blend of many different materials in the object, reflecting light at different wavelengths, there are surely materials in objects that we see as some particular color, that a dog sees as a very different color. So our beautiful beige bathroom where the floor tiles blend with the shower stall tiles, and blend with the color of the paint on the walls, may instead be a cacophony of different colors to a dog. We won’t know if our dog is seeing our bathroom or living room more like this:

where we might be seeing it as a blend of soothing colors. Beyond the direct sensory experience we don’t know if dogs have any developed taste for what is a calm set of colors and what is a jarring set of colors.

What we do know about dogs is that they are not able to distinguish green and red, and that different electric light spectra will alter what they see in the world in ways different to how it alters things for us. If you have seen blue LED street lights you will know just how much different lighting can change our perceptions. The same is true for dogs, but in different ways, and it is hard for us to appreciate how it might feel to them. But, I am pretty sure that dogs whose owners dress them up in red and green for Christmas are not having the festive nature of their clothes seep into their doggy consciousness…to them it is all gray and grey.

The big part of a dog’s Merkwelt, to which we are largely not privy at all, is through its sense of smell. We have all seen how dogs sniff, and push their nose into things, lift their nose into the air to catch scents on the breeze, and generally get their nose into everything, including butts, crotches, and the ground.

Horowitz estimates that a dog’s sense of smell is a million times more sensitive than ours–it is hard for us to imagine that, and feels overwhelming compared to our rather little used noses. She points out that to a dog every petal of a single flower smells different, and the history of what insects have been there is all there in the dog’s Merkwelt. More than just smelling things as we might, dogs can tell how the smell is changing in just a few seconds, giving them time information that we have never experienced.

To a dog the world is full of smells, laid down weeks and seconds ago, distinguishable by time, and with information about the health and activities of people, animals, and plants. Dogs may not react differently to us when they see us naked or dressed, but they do react differently by detecting where we have been, what we have been doing, and whether we are ill–all through smell.

And just as our eyesight may reveal things that dogs may not notice, dogs notice things that we may be oblivious to. Horowitz points out that the new dog bed we buy might smell terrible to a dog, just as the purple and yellow living room above looks atrocious to me. A dog’s Merkwelt is very different from ours.

The Merkwelt of Humans

We will get to robots soon. But we really need to talk about the Merkwelt of humans first. Our own subjective experiences tell us that we perceive the world “as it is”. But since bats, octopuses, and dogs perceive the world as it is for them, perhaps we are not perceiving any particular ground truth either!

Donald Hoffman at the University of California at Irvine (previously he was at the AI Lab in 1983 just as he finished up his Ph.D. at M.I.T.), has argued for a long time that our perceptual world, our Merkwelt is not what the real world is at all. His 1998 book “Visual Intelligence: How We Create What We See”, makes this point of view clear in its title. A 2016 article and interview with him in The Atlantic is a very concise summarization of his views.  And within that I think this quote from him gets at the real essence of what he is saying:

Suppose there’s a blue rectangular icon on the lower right corner of your computer’s desktop — does that mean that the file itself is blue and rectangular and lives in the lower right corner of your computer? Of course not. But those are the only things that can be asserted about anything on the desktop — it has color, position, and shape. Those are the only categories available to you, and yet none of them are true about the file itself or anything in the computer. They couldn’t possibly be true. That’s an interesting thing. You could not form a true description of the innards of the computer if your entire view of reality was confined to the desktop. And yet the desktop is useful. That blue rectangular icon guides my behavior, and it hides a complex reality that I don’t need to know. That’s the key idea. Evolution has shaped us with perceptions that allow us to survive. They guide adaptive behaviors. But part of that involves hiding from us the stuff we don’t need to know. And that’s pretty much all of reality, whatever reality might be.

By going from the natural world to the world of the computer (this back and forth is going to get really interesting and circular when we do finally get to robots) he is able to give a very convincing argument about what he means. And then we humans need to extend that argument to the world we normally inhabit by analogy, and we start to see how this might be true.

I, and many others, find the world of so-called “optical illusions” to be very helpful in convincing me about the truth of the disconnect between human perception, our Merkwelt, and the reality of the Universe. Perhaps the greatest creator of images that show us this disconnect is Akiyoshi Kitaoka from Ritsumeikan University in Kyoto, Japan.

If you go to his illusion page you will see static images that appear to move, straight lines that bulge, waves made from 2D squares, and concentric circles that appear to be spirals, and links to almost 60 pages of other illusions he has created.

But the one of strawberries reproduced here is perhaps the best:

Our perceptual system sees red strawberries. But there are none in the image. There are no red pixels in the image. Here is the tip of the bottom center strawberry expanded.

If you think you see redness there cover up the bottom half with your hand, and it looks grey. Or scroll it to the top of the screen so that the strawberry image is out of view. For me, at least, as I scroll quickly past the strawberry image, the redness of the strawberries temporarily bleeds out into the grey background on which the text is set!

A pixel on a screen is made up of three different colors, red, green, and blue, usually where each of R, G, and B, are represented by a number in the range 0 to 255, to describe how bright the red, green, or blue should light up at a particular point of the screen or image, where 0 means none of that color and 255 means as much as possible. In looking through the raw 734,449 pixels in the strawberry image, only 122, or 0.017%, have more red than each of green and blue at a pixel, and the biggest margin is only 6 out of 255. Here are the three such colors with the largest excess of red over both green and blue, each just duplicated to make a square. All grey!

The values of those RGB pixels, left to right are (192, 186, 186) = 0xc0baba, (193, 187, 187) = 0xc1bbbb, and (189, 183, 183) = 0xbdb7b7. All the pixels that are more red than both green and blue look grey in isolation. The right hand square represents the most red pixel of all in the picture, and perhaps it looks a tinge of red, nothing like the redness we see in the strawberries, and only a tiny handful of pixels are this red in the image. In fact there are only 1,156 pixels, or 0.16%, where red is not the smallest component of the color.

The most red in any of the pixels has a value of 195, but in all of them there is more of one of the other colors, green or blue. Here are three of the pixel colors, from 18 total colors, in the image that have the most red in them (values 0xc3ffff, 0xc3e5ef, and 0xc3c4be): But the image looks to us humans like it is a picture of red strawberries, with some weird lighting perhaps, even though the pixels are not red in isolation.
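The counting described above is simple enough to sketch in a few lines. What follows is a minimal illustration, not the script that produced the numbers in this post. It assumes the image has already been decoded into a list of (R, G, B) tuples, the function name `red_dominance_stats` is made up for this example, and it treats red as "not the smallest" only when it is strictly larger than at least one of the other two components, which is one possible reading of the text.

```python
def red_dominance_stats(pixels):
    """Summarize how red a list of (r, g, b) pixels is.

    Returns (red_dominant, largest_margin, red_not_smallest) where
      red_dominant     = number of pixels with r > g and r > b,
      largest_margin   = biggest r - max(g, b) among those pixels (0 if none),
      red_not_smallest = number of pixels where r is strictly larger than
                         at least one of g and b (an assumed tie rule).
    """
    red_dominant = 0
    largest_margin = 0
    red_not_smallest = 0
    for r, g, b in pixels:
        if r > g and r > b:
            red_dominant += 1
            largest_margin = max(largest_margin, r - max(g, b))
        if r > min(g, b):
            red_not_smallest += 1
    return red_dominant, largest_margin, red_not_smallest


# The three greyish colors quoted above, then the three with the most red.
sample = [
    (0xC0, 0xBA, 0xBA), (0xC1, 0xBB, 0xBB), (0xBD, 0xB7, 0xB7),
    (0xC3, 0xFF, 0xFF), (0xC3, 0xE5, 0xEF), (0xC3, 0xC4, 0xBE),
]
print(red_dominance_stats(sample))

# The shares quoted in the text, recomputed from the raw counts.
total_pixels = 734449
print(f"{100 * 122 / total_pixels:.3f}%  {100 * 1156 / total_pixels:.2f}%")
```

Running the same function over every pixel of a decoded image, rather than over this six-color sample, is all that the tallies in the text require.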

We have our Merkwelt which is our sensor world, what we can sense, but then our brain interprets things based on our experience. Because we might come across strawberries under all sorts of different lighting conditions it makes sense for us to not simply rely on the raw color that we perceive. Instead we take lots of different clues, and map a whole lot of greys and cyans, with some contrast structure in them, into rich red strawberries. In this case our perceptual system gets it right. Some of Professor Kitaoka’s images show us how we can get it wrong.

So… bats and dogs probably do some of this too, perhaps in echo space or in smell space. Octopuses most likely also do it. Will our robots need to do it to be truly effective in the world?

The Merkwelt of Today’s Domestic Robots

The only mass deployed robots in the home today are robot vacuum cleaners. A company I cofounded, iRobot, though not the first to market, led the way in getting them out into the world, and it alone has sold over 20 million Roombas. Other companies, both original technology developers, and illegal copiers (sometimes on iRobot tooling) in China have also joined the market, so there may well be 50 million of these robots worldwide now.

The early version robot vacuum cleaners had very few sensors, and a very simple Merkwelt. They had bump sensors, redundant cliff sensors, a sensor for infrared beams, and a voltage sensor on their recharging contacts. The bump sensor simply tells the robot it hit something. The cliff sensors, a downward looking infrared “radar”, and a wheel drop sensor, tell the robot when it is at a drop in the floor. Two types of infrared beams were sensed, one a virtual wall that a home owner could set up as a no go line in their house. The early Roombas also always had two internal sensors, one for the robot’s own battery voltage so that it could know when recharging was needed, or complete. If the battery voltage was down the robot would stop cleaning and wander in search of a special infrared beacon that it could line up on and go park itself on the recharging unit. The other internal sensor was for when the dirt bin was full, and that too would trigger the go-park-at-home behavior, though in the very earliest version before we had recharging stations it would just stop in its tracks, and play a little tune indicating that the bin was full and shut down^{\big 7}.
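To make the smallness of that Merkwelt concrete, here is a toy sketch of a controller that can see only the sensors listed above. It is an illustration, not iRobot's code: the field names, the voltage numbers, the action names, and the priority ordering are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Merkwelt:
    """Everything a hypothetical early robot vacuum senses at one instant."""
    bump: bool = False            # hit something
    cliff: bool = False           # downward infrared sees a drop in the floor
    wheel_drop: bool = False      # a wheel is hanging in the air
    virtual_wall: bool = False    # crossed a no-go infrared beam
    home_beacon: bool = False     # sees the recharging station's beam
    battery_volts: float = 16.0   # assumed nominal pack voltage
    bin_full: bool = False


LOW_BATTERY_VOLTS = 13.5  # assumed threshold, purely illustrative


def choose_action(s: Merkwelt) -> str:
    """Pick an action from the current readings only: no memory, no map."""
    if s.cliff or s.wheel_drop:
        return "back_away_from_drop"
    if s.bump or s.virtual_wall:
        return "turn_away"
    if s.bin_full or s.battery_volts < LOW_BATTERY_VOLTS:
        return "dock" if s.home_beacon else "seek_home_beacon"
    return "clean"


print(choose_action(Merkwelt()))
print(choose_action(Merkwelt(battery_volts=12.0)))
print(choose_action(Merkwelt(bin_full=True, home_beacon=True)))
```

The function is a pure mapping from the present readings to an action, so nothing the robot did or sensed a moment ago can change what it does next.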

These Roombas were rather like Godfrey-Smith’s discussion of insects. If some part of them failed, if one of the drive wheels failed for instance, they just continued on blissfully unaware of their failure, and in this case ran around in a circle. They had no awareness of self, and no awareness that they were cleaning, or “feeding”, in any sense; no sense or internal representation of agency. If they fed a lot, their bin full sensor would say they were “satiated”, but there was no connection between those two events inside the robot in any way, whether physically or in software. No adaptation of behavior based on past experience. Even the 2,000 neuron flatworms that exist today modify their behaviors based on experience.

Physics often comes into play when talking about robots, and there was one key issue of physics which has an enormous impact on how Roombas clean. They are small and light, and need to be for a host of reasons, including being safe should they run into a sleeping dog or baby on the floor, and not stressing a person bending down to put them on the floor. But that means they can not have much onboard battery, and if they were designed to vacuum like a typical vacuum cleaner the 300 Watts of power needed to suck up dirt from carpets would limit the battery life to just a very few minutes. The solution was to have a two stage cleaning system. The first stage was two counter rotating brushes running laterally across the body, perpendicular to the direction of travel. That pulled up dirt near the surface and managed to collect the larger particles that it threw up. Following that was a very thin linear slit, parallel to the brushes, sucking air. By being very thin it could achieve suction levels similar to those of a 300 Watt vacuum cleaner with only 30 Watts. But it could not actually suck any but very small particles through the orifice. That is how it cleans up the fine dust but leaves larger particles on the top of the carpet. And that is why the Roomba needs to randomly pass over a surface many times. The next time around those larger particles will already be on the surface and the brushes will have a better chance of getting them up.

A side effect of those powerful brushes is that tassels of rugs, or power cords lying on the floor would get dragged into the brush mechanism and soon would be wrapped around and around, leaving the robot straining in place, with an eventual cut off due to perceived high current drain from the motors. But the early versions were very much like insects, not noticing that something bad was happening, and not modifying their behavior. Later versions of the Roomba solved this problem by noticing the load on the brushes as something tangled, and reversing the direction of the brushes backing the tangler out. But still no subjective experience, still no noting of the location of a tassel and avoiding it next time through that area–even very simple animals have that level of awareness.

Another problem for the first decade of Roombas was that brushes would eventually get full of knotted hair, reducing their efficiency in picking up dirt. Despite the instructions that came with the robot most people rarely cleaned the brushes (who would want to?–that’s why you have a robot vacuum cleaner, to do the dirty work!). Later versions clean the brushes automatically, but again there is no connection between that behavior and other observations the robot might make from its Merkwelt.

Well before the brush cleaning problem was solved there was a sensor added internally to measure how much dirt was being picked up. If there was a lot, then the robot would go into a circling mode going over the same small area again and again as there was most likely a whole lot of dirt in that area. Again, no awareness of this was connected to any other behavior of the robot.

More recently Roombas have added a sense of place, through the use of an upward looking camera. As the robot navigates it remembers visual features, and along with a crude estimate of how far it has travelled (the direction of the nap of a carpet can induce 10% differences in straight line odometry estimate in one direction or the other, and turns are much worse), it recognizes when it is seeing the same feature again. There have been thirty years of sustained academic work on this problem, known as SLAM (Simultaneous Localization And Mapping), and Roomba has a vision based version of this–known as VSLAM. The Roomba uses the map that it creates to check off areas that it has spent time in so that it is able to spread out its limited cleaning time. It also allows it to go in straight lines as customers seem to want that, as though they are not aware of the way they, humans, vacuum a room, with lots of back and forth motions, and certainly no straight line driving the vacuum cleaner up and back along a room.
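The "checking off" of areas can be pictured as a coarse grid laid over the floor. The sketch below is a simplified illustration under assumed parameters: a 0.5 meter cell size, and a position estimate that some localization system has already produced. It is not the Roomba's actual VSLAM or coverage code, and the class name `CoverageGrid` is invented here.

```python
import math


class CoverageGrid:
    """Counts how many time steps the robot has spent in each floor cell."""

    def __init__(self, cell_size_m=0.5):
        self.cell_size_m = cell_size_m
        self.visits = {}  # (col, row) -> number of time steps spent there

    def cell_of(self, x_m, y_m):
        """Map a position in meters to integer grid coordinates."""
        return (math.floor(x_m / self.cell_size_m),
                math.floor(y_m / self.cell_size_m))

    def mark(self, x_m, y_m):
        """Record one time step spent at the estimated position."""
        cell = self.cell_of(x_m, y_m)
        self.visits[cell] = self.visits.get(cell, 0) + 1

    def least_visited(self, candidates):
        """Among candidate cells, pick the one cleaned the least so far."""
        return min(candidates, key=lambda c: self.visits.get(c, 0))


grid = CoverageGrid()
for x, y in [(0.1, 0.1), (0.2, 0.3), (0.7, 0.1)]:
    grid.mark(x, y)
print(grid.least_visited([(0, 0), (1, 0), (2, 0)]))
```

A record like this holds only where time was spent. Nothing else that happened at a location is attached to it.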

VSLAM is as close as a Roomba gets to episodic memory, but it is not really generalized episodes, it doesn’t know about other things and incidents that happened relative to that map. It seems a much weaker connection to the geometry of the world than birds or mammals or octopuses have.

It is plausible given the Merkwelt of recent Roombas, those with a camera for VSLAM, that some sort of sense of self could be programmed, that some sort of very tenuous subjective experience could be installed, but really there is none. Will people building the next generation of domestic robots, with more complex Merkwelts, be tempted, able, or desirous of, building robots with subjective experiences? I think that remains to be seen, but let’s now look at where technology market forces are going to drive the Merkwelt of domestic robots, and so drive what it will be like to be such a robot.

The Merkwelt of Tomorrow’s Domestic Robots

There are already more robot vacuum cleaners in people’s homes than many species of intelligent animals have in absolute numbers. Their Merkwelt is not so interesting. But there are lots and lots of startups and established companies which want to build domestic robots, as there is a belief that this is going to be a big growth area. I believe that too, and will be talking about how the world-wide demographic inversion will drive that, in a soon to be finished blog post. This next generation of domestic robots is going to have the possibility of a much richer raw Merkwelt than previous ones.

The new Merkwelt will be largely driven by the success of smart phones over the last ten years, and what that has done to the price and performance of certain sensors, and to low power, physically small computation units. And this Merkwelt can be very different from that of us, or dogs, or octopuses, or bats. It is going to be very hard, over the next few paragraphs, for us to think about what it is like to be one of these robots–it will stretch us more than thinking about our relatively close relatives, the octopuses.

The innards of a modern smart phone, without the actual cameras, without the expensive touch screen, without the speakers and microphone, without the high performance batteries, and without the beautifully machined metal case, are worth at retail about $100 to $200. Both Samsung and Qualcomm, two of the biggest manufacturers of chips for phones, sell boards at retail, in quantity of one, which have most of the rest of a modern cell phone for about $100. This includes eight high performance 64 bit processors, driver circuitry for two high definition cameras, the GPU and drivers for the screen, special purpose silicon that finds faces so that the focus in photos can be on them, special purpose speech processing and sound generating silicon, vast quantities of computer memory, cryptographic hardware to protect code and data from external attacks, and WiFi and Bluetooth drivers and antennas. Missing is the GPS system, the motion sensors, NFC (near field communication) for things like Apple Pay, and the radio frequency hardware to connect to cellular networks. Rounding up to $200 to include all those is a safe bet.

Anyone considering building a domestic robot in the next five years would be crazy not to take advantage of all this phone capability. $200 in quantity of one gets a vast leg up on building the guts of a domestic robot. So this is what will largely drive the Merkwelt of our coming domestic robots over the next decade or so, and phones may continue to drive it, as phones themselves change, for many decades. Don’t be surprised to see more silicon in phones over the next few  years that is dedicated to the runtime evaluation of deep learning (see my recent post on the end of Moore’s Law).

There may well be other sensors added to domestic robots that are not in phones, and they will be connected to the processors and have their data handled there, so that will be added to the Merkwelt. But a lot of Merkwelt is going to come from the cameras and microphones, and radio spectra of mobile phones.

In the forthcoming 5G chip sets for mobile phones there will be a total of nine different radio systems on high end smart phones.

Even if only equipped with the $100 stripped down phone systems our domestic robots will be able to “smell” our Bluetooth devices and our WiFi access points, and any devices that use WiFi. As I look around my house I see WiFi printers, laptops, tablets, smart phones, and a Roku device attached to my TV (more recent “smart TVs” have WiFi connections directly). As active Bluetooth devices I have computer mice, keyboards, scales, Fitbits, watches, speakers, and many more. The Merkwelt of our domestic robots will include all these devices. Some will be in fixed locations, some will move around with people. Some will be useful for associating with a particular person who might also be located by cameras on the robot. But the robot will have eyes in the back of its head–without pointing its cameras in some direction it may well be able to know when a particular person is approaching just from their Bluetooth signature. With this Merkwelt, and just a little bit of processing, our domestic robots can have a totally different view of the world than we have–a technological view provided by our communication devices. To us humans they are just that, communication devices, but to our robots they will be geographic and personal tags, understood by viewing just a few identifying bits of information within the signals. But depending on who builds the robots and who programs them they might be able to extract much more information about us than just distinguishing us as one that has been seen before. Perhaps they will have the ability to listen in on the content of our communications, be it how many steps we have taken today, or the words we are typing to our computer, or our emails coming in over the WiFi in our house.
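As a rough picture of what "geographic and personal tags" could mean in software, here is a small sketch. It assumes the robot's radio stack hands it a list of sightings, each one a device identifier paired with a received signal strength in dBm. The identifiers, the owner table, and the -70 dBm "nearby" threshold are all hypothetical, chosen only for illustration.

```python
# Hypothetical table a robot might have learned: device identifier -> person.
DEVICE_OWNER = {
    "watch-01": "Alice",
    "fitbit-02": "Bob",
    "speaker-03": None,  # a fixed device, not tied to any one person
}

NEAR_DBM = -70  # assumed signal strength above which a device counts as nearby


def people_nearby(sightings):
    """Return the names of people whose personal devices are heard strongly.

    sightings is an iterable of (device_id, rssi_dbm) pairs.
    """
    names = set()
    for device_id, rssi_dbm in sightings:
        owner = DEVICE_OWNER.get(device_id)
        if owner is not None and rssi_dbm >= NEAR_DBM:
            names.add(owner)
    return sorted(names)


print(people_nearby([("watch-01", -55), ("fitbit-02", -90), ("speaker-03", -40)]))
```

No camera appears anywhere in this sketch: the robot "sees" who is near it without looking in any direction at all.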

In recent years Dina Katabi and her students at M.I.T. CSAIL (Computer Science and Artificial Intelligence Lab) have been experimenting with processing ordinary WiFi signals down at the radio level, rather than at the bit packet level. Every phone has to do some of that, but mostly they just want to get to the bit packet, and perhaps have a little bit of quality of service information. Katabi and her students look at how timing varies very subtly and have used that to detect people, and even their emotions through detecting their breathing and heart rate, and how they are changing. Note that this does not require any sensors attached to a person–it is just detecting how the person’s physical presence changes the behavior of WiFi signals. Our future domestic robots may be able to get a direct read on us in ways that people are never able to. Whether this will count as a subjective experience for a robot will depend on how narrowly the output of the processing is drawn, and is passed on to other processing within the robot.

Those were the minimal chip sets that any future domestic robot is likely to use. If the chip sets used instead consist of the full complement of communications channels that a smart phone has there will be much more richness to the Merkwelt of the robot. Using GPS, even indoors, it will roughly know its global coordinates. And it will have access to all sorts of services any smart phone uses, including the time of day, the date, the current weather and weather forecast both locally and elsewhere in the world. It will know that it is dark outside because of the time of day and year, rather than know the time of day because of how long it has been dark or light. We humans get that sort of information in other ways. We know whether we are inside or outside not from GPS coordinates and detailed maps, but because of how it feels with light, the way the air is circulating, and how sounds reverberate. Those ways are part of our subjective experiences.

Aside: If our robots bypass the way that we experience the world with direct technological access to information it will be hard for them to understand our limitations. If they are ever conscious they may not have much empathy for us. And should there be a horrible infrastructure destroying event (e.g., attacks on satellites, nuclear war, etc.) our robots will be left incapacitated just when we need them the most. Something to ponder as we design them.

Going back to the radio level below the bit packet level, with these additional radio frequencies, by comparing the arrival time of different signals in the many wireless domains of 5G it is possible to move from a direct Merkwelt of delays in signals to starting to understand the built environment, whether it is concrete, or wood, or whether there are large metal moving objects (trucks or buses) nearby. If the builders of our domestic robots decide to program all this in, they will have yet a weirder, to us, super-sense of the world around them.

The cameras on our robots could be chosen, easily enough, to extend the range of light that they see to ultraviolet, and/or infra red. In the latter case they will be able to see where we have recently been sitting, and perhaps which way we have walked from a heat trail. That will be another weird super human sense for them to have that we don’t. But they might also tap into all the devices that are starting to populate our homes, our Amazon Echos and our Google Homes, and even our smart TVs, that can listen to us at all times. A home robot, in its terms of service which we will all agree to without reading, and through some corporate agreements might well have access to what those devices are hearing. And access to what the cameras on our smart smoke detectors are seeing. And what went on in our car right before we got home. And where we just used our electronic wallet to buy something, and perhaps what it is that we bought. These domestic robots may have a breadth of Merkwelt directly into our lives that the most controlling parent of teenage children could only imagine.

So the sensor world, the Merkwelt, may be very different for our domestic robots than for us. They will know things we can’t know, but they also may lack an understanding of our subjective experiences. How well they are able to relate will depend on how well their perceptual processing aligns with ours. And this is where thinking about the Merkwelt of our robots eventually gets very murky indeed.

Deep learning has been a very successful technology over the last five years. It comes from 30 years of hard work by people such as Geoff Hinton (University of Toronto and Google) and Yann LeCun (New York University and Facebook). Their work has revolutionized how well speech understanding works, and why we now have Amazon Echo, Google Home, and Apple Siri. It has also been used with very large training sets of images with labels of what is in them to train systems to themselves label images.

Here are some examples  from a paper by Andrej Karpathy and Li Fei-Fei. Two networks, one trained on images and one trained on producing naturalistic English description, combined to very successfully label these three images.

I chose these as they are particularly good examples. At their project page the authors give many more examples, some of which are not as spot on accurate as these. But just as human vision systems produce weird results sometimes, as we pointed out with the Kitaoka illusions above, so too do the deep learning vision systems.

In a paper by Anh Nguyen, Jason Yosinski, and Jeff Clune, the authors use a deep learning trained network, trained on the same image set as the network above, and a language network that generates a label of the primary object in an image. It does well on all sorts of images in which a human easily sees the same object that the network claims. Their network gives a percentage certainty on its labels and the good examples all come out at over 99%.

But then they get devious and start to stress the trained network. First they use randomly generated equations to generate images when the variables are given a range of values. Some of those equations might randomly trigger the “guitar” label, say with slightly more than 0.0% likelihood–so really not something that the network believes is a guitar, just a tiny bit more than zero chance. Many images will generate tiny chances. Now they apply evolutionary computing ideas. They take two such equations and “mate” them to form lots of children equations by crossing over subtrees, much as in biological systems the offspring have some DNA from each of their parents.

For instance the two expressions (x + y) \times (x^3 - y^2) and \sin^2 x + \cos y might generate, among others, children such as (x + \sin^2 x) \times (x^3 - y^2) and (\cos y + y) + (x^3 - (\cos y)^2). Most of these children equations will get a worse score for “guitar”, but sometimes a part of each parent which tickled guitarness in the trained network might combine to give a bigger score. The evolutionary algorithm chooses better offspring like that to be parents of the next generation. Hundreds or thousands of generations later it might have evolved an equation which gets a really good score of “guitar”. The question is do those images look like guitars to us?
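To make the "mating" step concrete, here is a minimal sketch of subtree crossover on expression trees. It is only an illustration under my own assumptions: the nested-tuple representation and the helper names (`paths`, `get_at`, `replace_at`, `crossover`, `to_text`) are hypothetical and are not taken from the paper, and the scoring by the trained network is left out entirely.

```python
import random

# Expressions as nested tuples: (operator, child, ...) or a leaf (variable name or constant).
PARENT_A = ("mul", ("add", "x", "y"), ("sub", ("pow", "x", 3), ("pow", "y", 2)))
PARENT_B = ("add", ("sin2", "x"), ("cos", "y"))

def paths(tree, prefix=()):
    """Yield the index path to every subtree, including the root and the leaves."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_at(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(a, b, rng):
    """Make one child: a random subtree of `b` grafted onto a random spot in `a`."""
    donor = get_at(b, rng.choice(list(paths(b))))
    return replace_at(a, rng.choice(list(paths(a))), donor)

def to_text(tree):
    """Render a tree as a readable infix string."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, *args = tree
    parts = [to_text(arg) for arg in args]
    if op == "sin2":
        return f"sin^2({parts[0]})"
    if op == "cos":
        return f"cos({parts[0]})"
    symbol = {"add": "+", "sub": "-", "mul": "*", "pow": "^"}[op]
    return f"({parts[0]} {symbol} {parts[1]})"

# Graft sin^2 x from parent B in place of the y in (x + y) of parent A.
child = replace_at(PARENT_A, (1, 2), PARENT_B[1])
print(to_text(child))

# A random child, as an evolutionary loop would produce many times per generation.
print(to_text(crossover(PARENT_A, PARENT_B, random.Random(0))))
```

The hand-picked graft reproduces the first child expression mentioned above. A full evolutionary run would score each child with the classifier and keep the best scorers as parents of the next generation.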

Here are some labels of generated images that all score better than 99.6%.

Remember, these are not purely random images that happen to trigger the labels. Vast amounts of computer time were used to breed these images to have that effect.

The top eight look like random noise to us, but these synthetic, and carefully selected images, trigger particular labels in the image classifier. The bottom ones have a little more geometric form and for some of them (e.g., “baseball”) we might see some essence of the label (i.e., some “baseballness” in things that look a bit like baseball seams to us). Below is another set of labels of generated images from their paper. The mean score across this set of labels is 99.12%, and the authors selected these results out of many, many such results as they have some essence of the label for a human vision system.

My favorite of these is “school bus”. I think all people used to seeing American school buses can get the essence of this one. In fact all of them have some essence of the label. But no competent human is going to make the mistake of saying that with 99% probability these are images of the object in the label. We immediately recognize all of these as synthetic images, and know that it would be a very poor Pixar movie that tried to pass off things that looked like these images as objects that are instances of their labels.

So…all our robots are going to be subject to optical illusions that are very different from ours. They may see subtle things that we do not see, and we will see subtle things that they do not see.

The relationship between Humans and Robots

We and our domestic robots will have different Merkwelts in our shared homes. The better we understand the Merkwelt of our robot helpers, the better we will be able to have realistic expectations of what they should be able to do, and what we should delegate and entrust to them, and what aspects of our privacy we are giving up by having them around. If we build in to them a reasonable model of the human Merkwelt, the better they will be able to anticipate what we can know and will do, and so smooth our interactions.

Our robots may or may not (probably the former in the longer term) have a sense of self, subjective experiences, episodic memory blended into those subjective experiences, and even some form of consciousness, in the way that we might think our dogs have some form of consciousness. The first few generations of our domestic robots (our robot vacuum cleaners are the first generation) will be much more insect-like than mammal-like in their behavior and in their interactions with us.

Perhaps as more people start researching how to imbue robots with subjective experiences and conscious-like experiences we will begin to have empathy with each other. Or perhaps we and our robots will always be more like an octopus and a scuba diver; one is not completely comfortable in the other’s world, not as agile and aware of the details of that world, and the two of them are aware of each other, but not really engaged with each other.



^{\big 1}Thomas Nagel, “What Is It Like to Be a Bat?”, Philosophical Review, Vol 83, No. 4, (Oct 1974), pp. 435-450.

This paper has been reprinted in a number of collections and my own copy is a 30+ year old Xeroxed version from such a collection. The original is available at JSTOR, available through university libraries, or by paying $15. However it appears that many universities around the world make it available for courses that are taught there and it is quite out in the open, from JSTOR, at many locations you can find with a search engine, e.g., here at the University of Warwick or here at the University of British Columbia.

^{\big 2}Alexandra Horowitz, “Inside of a Dog: What Dogs See, Smell, and Know”, Scribner, New York, 2009.

^{\big 3}Peter Godfrey-Smith, “Other Minds: the Octopus, the Sea, and the Deep Origins of Consciousness”, Farrar, Straus and Giroux, New York, 2016.

^{\big 4}Jakob von Uexküll, “Umwelt und Innenwelt der Tiere”,  Springer, Berlin, 1921.

^{\big 5}While checking out the origin of Merkwelt for this blog post I was surprised to see that according to Wikipedia I am one of the people responsible for its use in English–I first used it in AI Lab Memo 899 at M.I.T. in 1986 and then in the open literature in a paper finally published in the AI Journal in 1991 (it was written in 1987).

^{\big 6}Lawrence Weiskrantz, “Blindsight”, Oxford University Press, Oxford, United Kingdom, 1986.

^{\big 7}The dirt bin of the original Roomba was much too small. We had built a model two level apartment in a high bay at iRobot, and tested prototype Roombas extensively in it. We would spread dirt on different floor surfaces and make sure that Roomba could clean the floor. With the stair case we could test that it would never fall down them. When the dirt bin got full we just had the Roomba stop. But it never got close to full in a single cleaning, so we figured that people would empty the dirt bin before they left it to clean their house, and it would get all the way through a cleaning without getting full. Our tests showed it. We thought we had an appropriate model Umwelt for the Roomba. But as soon as we let Roombas out into the wild, into real people’s real apartments and houses things went wrong. The Roombas started stopping under people’s beds, and the people had to crawl under there too to retrieve them! Then it dawned on us. Most people were not cleaning under their beds before they got a Roomba. So it was incredibly dirty under there with hair balls full of dirt. The Roomba would merrily wander under a bed and get totally full of dirt! After a few cleanings this problem self corrected. There were no longer any of these incredibly dirty islands left to choke on.

The End of Moore’s Law

rodneybrooks.com/the-end-of-moores-law/

I have been working on an upcoming post about megatrends and how they drive tech.  I had included the end of Moore’s Law to illustrate how the end of a megatrend might also have a big influence on tech, but that section got away from me, becoming much larger than the sections on each individual current megatrend. So I decided to break it out into a separate post and publish it first.  Here it is.



Moore’s Law, concerning what we put on silicon wafers, is over after a solid fifty year run that completely reshaped our world. But that end unleashes lots of new opportunities.

WHERE DID MOORE’S LAW COME FROM?

Moore, Gordon E., Cramming more components onto integrated circuits, Electronics, Vol 38, No. 8, April 19, 1965.

Electronics was a trade journal that published monthly, mostly, from 1930 to 1995. Gordon Moore’s four and a half page contribution in 1965 was perhaps its most influential article ever. That article not only articulated the beginnings, and it was the very beginnings, of a trend, but the existence of that articulation became a goal/law that has run the silicon based circuit industry (which is the basis of every digital device in our world) for fifty years. Moore was a Cal Tech PhD, cofounder in 1957 of Fairchild Semiconductor, and head of its research and development laboratory from 1959. Fairchild had been founded to make transistors from silicon at a time when they were usually made from much slower germanium.

One can find many files on the Web that claim to be copies of the original paper, but I have noticed that some of them have the graphs redrawn and that they are sometimes slightly different from the ones that I have always taken to be the originals. Below I reproduce two figures from the original that as far as I can tell have only been copied from an original paper version of the magazine, with no manual/human cleanup.

The first one that I reproduce here is the money shot for the origin of Moore’s Law. There was however an equally important earlier graph in the paper which was predictive of the future yield over time of functional circuits that could be made from silicon. It had less actual data than this one, and as we’ll see, that is really saying something.

This graph is about the number of components on an integrated circuit. An integrated circuit is made through a process that is like printing. Light is projected onto a thin wafer of silicon in a number of different patterns, while different gases fill the chamber in which it is held. The different gases cause different light activated chemical processes to happen on the surface of the wafer, sometimes depositing some types of material, and sometimes etching material away. With precise masks to pattern the light, and precise control over temperature and duration of exposures, a physical two dimensional electronic circuit can be printed. The circuit has transistors, resistors, and other components. Lots of them might be made on a single wafer at once, just as lots of letters are printed on a single page at once. The yield is how many of those circuits are functional–small alignment or timing errors in production can screw up some of the circuits in any given print. Then the silicon wafer is cut up into pieces, each containing one of the circuits and each is put inside its own plastic package with little “legs” sticking out as the connectors–if you have looked at a circuit board made in the last forty years you have seen it populated with lots of integrated circuits.

The number of components in a single integrated circuit is important. Since the circuit is printed it involves no manual labor, unlike earlier electronics where every single component had to be placed and attached by hand. Now a complex circuit which involves multiple integrated circuits only requires hand construction (later this too was largely automated) to connect up a much smaller number of components. And as long as one has a process which gets good yield, it is constant time to build a single integrated circuit, regardless of how many components are in it. That means fewer total integrated circuits that need to be connected by hand or machine. So, as Moore’s paper’s title references, cramming more components into a single integrated circuit is a really good idea.

The graph plots the logarithm base two of the number of components in an integrated circuit on the vertical axis against calendar years on the horizontal axis. Every notch upwards on the left doubles the number of components. So while 3 means 2^3 = 8 components, 13 means 2^{13} = 8,192 components. That is a thousand fold increase from 1962 to 1972.

There are two important things to note here.

The first is that he is talking about components on an integrated circuit, not just the number of transistors. Generally there are many more components than transistors, though the ratio did drop over time as different fundamental sorts of transistors were used. But in later years Moore’s Law was often turned into purely a count of transistors.

The other thing is that there are only four real data points here in this graph which he published in 1965. In 1959 the number of components is 2^0 = 1, i.e., that is not about an integrated circuit at all, just about single circuit elements–integrated circuits had not yet been invented. So this is a null data point.  Then he plots four actual data points, which we assume were taken from what Fairchild could produce, for 1962, 1963, 1964, and 1965, having 8, 16, 32, and 64 components. That is a doubling every year. It is an exponential increase in the true sense of exponential^{\big 1}.

What is the mechanism for this, how can this work? It works because it is in the digital domain, the domain of yes or no, the domain of 0 or 1.

In the last half page of the four and a half page article Moore explains the limitations of his prediction, saying that for some things, like energy storage, we will not see his predicted trend. Energy takes up a certain number of atoms and their electrons to store a given amount, so you can not just arbitrarily change the number of atoms and still store the same amount of energy. Likewise if you have a half gallon milk container you can not put a gallon of milk in it.

But the fundamental digital abstraction is yes or no. A circuit element in an integrated circuit just needs to know whether a previous element said yes or no, whether there is a voltage or current there or not. In the design phase one decides above how many volts or amps, or whatever, means yes, and below how many means no. And there needs to be a good separation between those numbers, a significant no mans land compared to the maximum and minimum possible. But, the magnitudes do not matter.

I like to think of it like piles of sand. Is there a pile of sand on the table or not? We might have a convention about how big a typical pile of sand is. But we can make it work if we halve the normal size of a pile of sand. We can still answer whether or not there is a pile of sand there using just half as many grains of sand in a pile.

And then we can halve the number again. And the digital abstraction of yes or no still works. And we can halve it again, and it still works. And again, and again, and again.
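A small sketch can make the pile of sand argument concrete. The thresholds here (below 30% of full scale means no, above 70% means yes) are numbers I made up for illustration, as is the function name `read_bit`; real logic families choose their own levels. The point is only that the yes/no answer survives when every magnitude is halved, again and again.

```python
def read_bit(level, full_scale):
    """Classify a signal level as 1 (yes), 0 (no), or None (in the no man's land)."""
    no_below = 0.3 * full_scale    # assumed threshold, chosen for illustration
    yes_above = 0.7 * full_scale   # assumed threshold, chosen for illustration
    if level >= yes_above:
        return 1
    if level <= no_below:
        return 0
    return None

# Halve the full scale repeatedly: a "big pile" still reads as yes, a "small pile" as no.
for full_scale in (8.0, 4.0, 2.0, 1.0):
    big_pile = 0.9 * full_scale
    small_pile = 0.1 * full_scale
    print(full_scale, read_bit(big_pile, full_scale), read_bit(small_pile, full_scale))
```

Only the ratio of a level to the full scale matters to `read_bit`, which is the sense in which the magnitudes do not matter.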

This is what drives Moore’s Law, which in its original form said that we could expect to double the number of components on an integrated circuit every year for 10 years, from 1965 to 1975.  That held up!
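As a quick check of what a yearly doubling implies, the sketch below starts from the 64 components plotted for 1965 and doubles once per year through 1975. The function name `projected_components` is mine, and this is just the arithmetic of the doubling rule, not a reproduction of Moore's own figures.

```python
BASE_YEAR = 1965
BASE_COUNT = 64   # components on the 1965 data point, i.e. 2**6

def projected_components(year):
    """Components per integrated circuit if the count doubles every year from 1965."""
    return BASE_COUNT * 2 ** (year - BASE_YEAR)

for year in range(BASE_YEAR, 1976):
    print(year, projected_components(year))
```

Ten doublings take 64 components to 65,536, which is 2^16, and the 1972 value of 8,192 matches the 13 on the vertical axis of the graph.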

Variations of Moore’s Law followed; they were all about doubling, but sometimes doubling different things, and usually with slightly longer time constants for the doubling. The most popular versions were doubling of the number of transistors, doubling of the switching speed of those transistors (so a computer could run twice as fast), doubling of the amount of memory on a single chip, and doubling of the secondary memory of a computer–originally on mechanically spinning disks, but for the last five years in solid state flash memory. And there were many others.

Let’s get back to Moore’s original law for a moment. The components on an integrated circuit are laid out on a two dimensional wafer of silicon. So to double the number of components for the same amount of silicon you need to double the number of components per unit area. That means that the size of a component, in each linear dimension of the wafer needs to go down by a factor of \frac{1}{\sqrt{2}}. In turn, that means that Moore was seeing the linear dimension of each component go down to 71\% of what it was in a year, year over year.

But why was it limited to just a measly factor of two per year? Given the pile of sand analogy from above, why not just go to a quarter of the size of a pile of sand each year, or one sixteenth? It gets back to the yield one gets, the number of working integrated circuits, as you reduce the component size (most commonly called feature size). As the feature size gets smaller, the alignment of the projected patterns of light for each step of the process needs to get more accurate. Since \sqrt{2} = 1.41, approximately, it needs to get better by {{\sqrt{2}-1}\over{\sqrt{2}}}= 29\% as you halve the area of a feature. And because of impurities in the materials that are printed on the circuit, the material from the gases that are circulating and that are activated by light, the gas needs to get more pure, so that there are fewer bad atoms in each component, now half the area of before. Implicit in Moore’s Law, in its original form, was the idea that we could expect the production equipment to get better by about 29\% per year, for 10 years.

For various forms of Moore’s Law that came later, the time constant stretched out to 2 years, or even a little longer, for a doubling, but nevertheless the processing equipment has gotten that 29\% better time period over time period, again and again.

To see the magic of how this works, let’s just look at 25 doublings. The equipment has to operate with things \sqrt{2}^{25} times smaller, i.e., roughly 5,793 times smaller. But we can fit 2^{25} more components in a single circuit, which is 33,554,432 times more. The accuracy of our equipment has improved 5,793 times, but that has gotten a further acceleration of 5,793 on top of the original 5,793 times due to the linear to area impact. That is where the payoff of Moore’s Law has come from.
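The numbers in this argument are easy to check. The sketch below recomputes the per-step figures (71% and 29%) and the totals for 25 doublings; the variable names are mine.

```python
import math

# One doubling of components per unit area.
linear_factor = 1 / math.sqrt(2)        # each linear dimension shrinks to about 71%
improvement = 1 - linear_factor         # equipment must get about 29% more accurate
print(f"{linear_factor:.1%} {improvement:.1%}")

# Twenty five doublings.
doublings = 25
linear_shrink = math.sqrt(2) ** doublings   # how many times smaller each feature is
density_gain = 2 ** doublings               # how many times more components fit
print(round(linear_shrink), density_gain)

# The area payoff is the linear improvement squared.
print(math.isclose(linear_shrink ** 2, density_gain))
```

The last line is the linear to area effect: improving linear accuracy by a factor of about 5,793 buys about 5,793 times 5,793 more components.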

In his original paper Moore only dared project out, and only implicitly, that the equipment would get 29\% better every year for ten years. In reality, with somewhat slowing time constants, that has continued to happen for 50 years.

Now it is coming to an end. But not because the accuracy of the equipment needed to give good yields has stopped improving. No. Rather it is because those piles of sand we referred to above have gotten so small that they only contain a single metaphorical grain of sand. We can’t split the minimal quantum of a pile into two any more.

GORDON MOORE’S INCREDIBLE INSIGHT

Perhaps the most remarkable thing is Moore’s foresight into how this would have an incredible impact upon the world.  Here is the first sentence of his second paragraph:

Integrated circuits will lead to such wonders as home computers–or at least terminals connected to a central computer–automatic controls for automobiles, and personal portable communications equipment.

This was radical stuff in 1965. So called “mini computers” were still the size of a desk, and to be useful usually had a few peripherals such as tape units, card readers, or printers, that meant they would be hard to fit into a home kitchen of the day, even with the refrigerator, oven, and sink removed. Most people had never seen a computer and even fewer had interacted with one, and those who had, had mostly done it by dropping off a deck of punched cards, and a day later picking up a printout from what the computer had done when humans had fed the cards to the machine.

The electrical systems of cars were unbelievably simple by today’s standards, with perhaps half a dozen on off switches, and simple electromechanical devices to drive the turn indicators, windshield wipers, and the “distributor” which timed the firing of the spark plugs–every single function producing piece of mechanism in auto electronics was big enough to be seen with the naked eye. And personal communications devices were rotary dial phones, one per household, firmly plugged into the wall at all times. Or handwritten letters that needed to be dropped into the mail box.

That sentence quoted above, given when it was made, is to me the bravest and most insightful prediction of technology future that we have ever seen.

By the way, the first computer made from integrated circuits was the guidance computer for the Apollo missions, one in the Command Module, and one in the Lunar Lander. The integrated circuits were made by Fairchild, Gordon Moore’s company. The first version had 4,100 integrated circuits, each implementing a single 3 input NOR gate. The more capable manned flight versions, which first flew in 1968, had only 2,800 integrated circuits, each implementing two 3 input NOR gates. Moore’s Law had its impact on getting to the Moon, even in the Law’s infancy.

A LITTLE ASIDE

In the original magazine article this cartoon appears:

At a fortieth anniversary of Moore’s Law at the Chemical Heritage Foundation^{\big 2} in Philadelphia I asked Dr. Moore whether this cartoon had been his idea. He replied that he had nothing to do with it, and it was just there in the magazine in the middle of his article, to his surprise.

Without any evidence at all on this, my guess is that the cartoonist was reacting somewhat skeptically to the sentence quoted above. The cartoon is set in a department store, as back then US department stores often had a “Notions” department, although this was not something of which I have any personal experience as they are long gone (and I first set foot in the US in 1977). It seems that notions is another word for haberdashery, i.e., pins, cotton, ribbons, and generally things used for sewing. As still today, there is also a Cosmetics department. And plop in the middle of them is the Handy Home Computers department, with the salesman holding a computer in his hand.

I am guessing that the cartoonist was making fun of this idea, trying to point out the ridiculousness of it. It all came to pass in only 25 years, including being sold in department stores. Not too far from the cosmetics department. But the notions departments had all disappeared. The cartoonist was right in the short term, but blew it in the slightly longer term^{\big 3}.

WHAT WAS THE IMPACT OF MOORE’S LAW?

There were many variations on Moore’s Law, not just his original about the number of components on a single chip.

Amongst the many there was a version of the law about how fast circuits could operate, as the smaller the transistors were the faster they could switch on and off. There were versions of the law for how much RAM memory, main memory for running computer programs, there would be and when. And there were versions of the law for how big and fast disk drives, for file storage, would be.

This tangle of versions of Moore’s Law had a big impact on how technology developed. I will discuss three modes of that impact; competition, coordination, and herd mentality in computer design.

Competition

Memory chips are where data and programs are stored as they are run on a computer. Moore’s Law applied to the number of bits of memory that a single chip could store, and a natural rhythm developed of that number of bits going up by a multiple of four on a regular but slightly slowing basis. By jumping over just a doubling, the cost of the silicon foundries could be depreciated over long enough time to keep things profitable (today a silicon foundry is about a $7B capital cost!), and furthermore it made sense to double the number of memory cells in each dimension to keep the designs balanced, again pointing to a step factor of four.

In the very early days of desktop PCs memory chips had 2^{14} = 16384 bits. The memory chips were called RAM (Random Access Memory–i.e., any location in memory took equally long to access, there were no slower or faster places), and a chip of this size was called a 16K chip, where K means not exactly 1,000, but instead 1,024 (which is 2^{10}). Many companies produced 16K RAM chips. But they all knew from Moore’s Law when the market would be expecting 64K RAM chips to appear. So they knew what they had to do to not get left behind, and they knew when they had to have samples ready for engineers designing new machines so that just as the machines came out their chips would be ready to be used having been designed in. And they could judge when it was worth getting just a little ahead of the competition at what price. Everyone knew the game (and in fact all came to a consensus agreement on when the Moore’s Law clock should slow down just a little), and they all competed on operational efficiency.
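The step factor of four is easy to see with a little arithmetic. In the sketch below I assume, purely for illustration, that the memory cells sit in a square grid; real chips were not necessarily laid out that way, and the names `K`, `next_generation`, and `side` are mine.

```python
K = 1024                         # "K" for memory chips means 2**10, not 1,000

def next_generation(bits):
    """Bits on the next chip generation, using the step factor of four."""
    return bits * 4

bits = 16 * K                    # the 16K chip: 2**14 = 16,384 bits
side = 128                       # assumed square grid: 128 x 128 cells = 16,384
print(bits, side * side)

# Doubling the cells in each dimension gives four times as many bits.
print((2 * side) ** 2, next_generation(bits), 64 * K)

# A few generations of the rhythm: 16K, 64K, 256K, 1024K.
generation = bits
for _ in range(4):
    print(generation // K, "K")
    generation = next_generation(generation)
```

Doubling both sides of the grid takes 128 x 128 to 256 x 256, which is exactly the jump from a 16K chip to a 64K chip.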

Coordination

Technology Review talks about this in their story on the end of Moore’s Law. If you were the designer of a new computer box for a desktop machine, or any other digital machine for that matter, you could look at when you planned to hit the market and know what amount of RAM memory would take up what board space because you knew how many bits per chip would be available at that time.  And you knew how much disk space would be available at what price and what physical volume (disks got smaller and smaller diameters just as they increased the total amount of storage). And you knew how fast the latest processor chip would run. And you knew what resolution display screen would be available at what price. So a couple of years ahead you could put all these numbers together and come up with what options and configurations would make sense by the exact time when you were going to bring your new computer to market.

The company that sold the computers might make one or two of the critical chips for their products but mostly they bought other components from other suppliers. The clockwork certainty of Moore’s Law let them design a new product without having horrible surprises disrupt their flow and plans. This really let the digital revolution proceed. Everything was orderly and predictable so there were fewer blind alleys to follow. We had probably the single most sustained continuous and predictable improvement in any technology over the history of mankind.

Herd mentality in computer design

But with this good came some things that might be viewed negatively (though I’m sure there are some who would argue that they were all unalloyed good).  I’ll take up one of these as the third thing to talk about that Moore’s Law had a major impact upon.

A particular form of general purpose computer design had arisen by the time that central processors could be put on a single chip (see the Intel 4004 below), and soon those processors on a chip, microprocessors as they came to be known, supported that general architecture.  That architecture is known as the von Neumann architecture.

A distinguishing feature of this architecture is that there is a large RAM memory which holds both instructions and data–made from the RAM chips we talked about above under competition. The memory is organized into consecutive indexable (or addressable) locations, each containing the same number of binary bits, or digits. The microprocessor itself has a few specialized memory cells, known as registers, and an arithmetic unit that can do additions, multiplications, divisions (more recently), etc. One of those specialized registers is called the program counter (PC), and it holds an address in RAM for the current instruction. The CPU looks at the pattern of bits in that current instruction location and decodes them into what actions it should perform. That might be an action to fetch another location in RAM and put it into one of the specialized registers (this is called a LOAD), or to send the contents in the other direction (STORE), or to take the contents of two of the specialized registers, feed them to the arithmetic unit, and take their sum from the output of that unit and store it in another of the specialized registers. Then the central processing unit increments its PC and looks at the next consecutive addressable instruction. Some specialized instructions can alter the PC and make the machine go to some other part of the program and this is known as branching. For instance if one of the specialized registers is being used to count down how many elements of an array of consecutive values stored in RAM have been added together, right after the addition instruction there might be an instruction to decrement that counting register, and then branch back earlier in the program to do another LOAD and add if the counting register is still more than zero.
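Here is a toy version of such a machine, summing an array with a count-down register as in the example above. Everything about it is an assumption made for illustration: the instruction names beyond LOAD and STORE (`LOADX`, `ADD`, `DEC`, `BRPOS`, `HALT`), the four registers, and the use of Python tuples for instructions where a real machine would use bit patterns. What it does preserve is the key property that one memory holds both the program and its data.

```python
def run(memory, max_steps=10_000):
    """Run a toy von Neumann machine; instructions and data share `memory`."""
    regs = [0, 0, 0, 0]              # a few specialized registers
    pc = 0                           # program counter: address of the current instruction
    for _ in range(max_steps):
        op, *args = memory[pc]       # fetch and decode the current instruction
        pc += 1                      # by default, go on to the next consecutive address
        if op == "LOAD":             # register <- memory[address]
            r, addr = args
            regs[r] = memory[addr]
        elif op == "LOADX":          # register <- memory[base + index register]
            r, base, idx = args
            regs[r] = memory[base + regs[idx]]
        elif op == "STORE":          # memory[address] <- register
            r, addr = args
            memory[addr] = regs[r]
        elif op == "ADD":            # dest <- a + b, via the arithmetic unit
            dest, a, b = args
            regs[dest] = regs[a] + regs[b]
        elif op == "DEC":            # count a register down by one
            (r,) = args
            regs[r] -= 1
        elif op == "BRPOS":          # branch: alter the PC if the register is above zero
            r, target = args
            if regs[r] > 0:
                pc = target
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")
    raise RuntimeError("step limit reached")

memory = [
    ("LOAD", 0, 10),      # 0: r0 <- count of elements
    ("LOAD", 1, 11),      # 1: r1 <- running sum, initially 0
    ("LOADX", 2, 11, 0),  # 2: r2 <- element at address 11 + r0
    ("ADD", 1, 1, 2),     # 3: r1 <- r1 + r2
    ("DEC", 0),           # 4: one fewer element left to add
    ("BRPOS", 0, 2),      # 5: loop back while the counter is above zero
    ("STORE", 1, 11),     # 6: write the sum back to RAM
    ("HALT",),            # 7
    0, 0,                 # 8-9: unused
    4,                    # 10: number of elements
    0,                    # 11: the sum ends up here
    3, 5, 7, 11,          # 12-15: the array
]

result = run(memory)
print(result[11])
```

Because addresses 0 to 7 and 10 to 15 live in the same list, nothing but convention stops the program from loading or overwriting its own instructions.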

That’s pretty much all there is to most digital computers.  The rest is just hacks to make them go faster, while still looking essentially like this model. But note that the RAM is used in two ways by a von Neumann computer–to contain data for a program and to contain the program itself. We’ll come back to this point later.

With all the versions of Moore’s Law firmly operating in support of this basic model it became very hard to break out of it. The human brain certainly doesn’t work that way, so it seems that there could be powerful other ways to organize computation. But trying to change the basic organization was a dangerous thing to do, as the inexorable march of Moore’s Law based existing architecture was going to continue anyway. Trying something new would most probably set things back a few years. So brave big scale experiments like the Lisp Machine^{\big 4} or Connection Machine which both grew out of the MIT Artificial Intelligence Lab (and turned into at least three different companies) and Japan’s fifth generation computer project (which played with two unconventional ideas, data flow and logical inference) all failed, as before long the Moore’s Law doubling conventional computers overtook the advanced capabilities of the new machines, and software could better emulate the new ideas.

Most computer architects were locked into the conventional organizations of computers that had been around for decades. They competed on changing the coding of the instructions to make execution of programs slightly more efficient per square millimeter of silicon. They competed on strategies to cache copies of  larger and larger amounts of RAM memory right on the main processor chip. They competed on how to put multiple processors on a single chip and how to share the cached information from RAM across multiple processor units running at once on a single piece of silicon. And they competed on how to make the hardware more predictive of what future decisions would be in a running program so that they could precompute the right next computations before it was clear whether they would be needed or not. But, they were all locked in to fundamentally the same way of doing computation. Thirty years ago there were dozens of different detailed processor designs, but now they fall into only a small handful of families, the X86, the ARM, and the PowerPC. The X86’s are mostly desktops, laptops, and cloud servers. The ARM is what we find in phones and tablets.  And you probably have a PowerPC adjusting all the parameters of your car’s engine.

The one glaring exception to the lock in caused by Moore’s Law is that of Graphical Processing Units, or GPUs. These are different from von Neumann machines. They were driven by demand for better video and graphics performance, in particular for gaming: the main processor getting better and better under Moore’s Law was just not enough to make real time rendering perform well as the underlying simulations got better and better. In this case a new sort of processor was developed. It was not particularly useful for general purpose computations but it was optimized very well to do additions and multiplications on streams of data which is what is needed to render something graphically on a screen. Here was a case where a new sort of chip got added into the Moore’s Law pool much later than conventional microprocessors, RAM, and disk. The new GPUs did not replace existing processors, but instead got added as partners where graphics rendering was needed. I mention GPUs here because it turns out that they are useful for another type of computation that has become very popular over the last three years, and that is being used as an argument that Moore’s Law is not over. I still think it is and will return to GPUs in the next section.

ARE WE SURE IT IS ENDING?

As I pointed out earlier we can not halve a pile of sand once we are down to piles that are only a single grain of sand. That is where we are now; we have gotten down to just about one grain piles of sand. Gordon Moore’s Law in its classical sense is over. See The Economist from March of last year for a typically thorough, accessible, and thoughtful report.

I earlier talked about the feature size of an integrated circuit and how with every doubling that size is divided by \sqrt{2}.  By 1971 Gordon Moore was at Intel, and they released their first microprocessor on a single chip, the 4004 with 2,300 transistors on 12 square millimeters of silicon, with a feature size of 10 micrometers, written 10μm. That means that the smallest distinguishable aspect of any component on the chip was 1/100th of a millimeter.

Since then the feature size has regularly been scaled by a factor of \frac{1}{\sqrt{2}}, that is, reduced to 71\% of its previous size, doubling the number of components in a given area, on a clockwork schedule.  The schedule clock has however slowed down. Back in the era of Moore’s original publication the clock period was a year.  Now it is a little over 2 years.  In the first quarter of 2017 we are expecting to see the first commercial chips in mass market products with a feature size of 10 nanometers, written 10nm. That is 1,000 times smaller than the feature size of 1971, or 20 applications of the 71\% rule over 46 years.  Sometimes the jump has been a little better than 71\%, and so we have actually seen only 17 jumps from 10μm down to 10nm. You can see them listed in Wikipedia. In 2012 the feature size was 22nm, in 2014 it was 14nm, now in the first quarter of 2017 we are about to see 10nm shipped to end users, and it is expected that we will see 7nm in 2019 or so. There are still active areas of research working on problems that are yet to be solved to make 7nm a reality, but industry is confident that it will happen. There are predictions of 5nm by 2021, but a year ago there was still much uncertainty over whether the engineering problems necessary to do this could be solved and whether they would be economically viable in any case.
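
The arithmetic in that paragraph can be checked in a few lines. This is only a sketch: the function name `shrink_steps` is made up for illustration, and the only inputs are the 10μm and 10nm figures from the text.

```python
import math

def shrink_steps(start_nm, end_nm, factor=1 / math.sqrt(2)):
    """How many times must `factor` be applied to get from start_nm to end_nm?"""
    return math.log(end_nm / start_nm) / math.log(factor)

steps = shrink_steps(10_000, 10)    # 10 micrometers is 10,000 nanometers
density_gain = 2 ** round(steps)    # each step doubles the components per unit area

print(f"{steps:.2f} steps, about {density_gain:,}x more components per unit area")
```

The result is just under 20 steps, and 20 doublings of density is roughly a million times more components in the same area.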

Once you get down to 5nm features they are only about 20 silicon atoms wide.  If you go much below this the material starts to be dominated by quantum effects and classical physical properties really start to break down. That is what I mean by only one grain of sand left in the pile.

Today’s microprocessors have a few hundred square millimeters of silicon, and 5 to 10 billion transistors. They have a lot of extra circuitry these days to cache RAM, predict branches, etc., all to improve performance. But getting bigger comes with many costs as they get faster too. There is heat to be dissipated from all the energy used in switching so many signals in such a small amount of time, and the time for a signal to travel from one side of the chip to the other, ultimately limited by the speed of light (in reality, in copper it is about 5\% less), starts to be significant. The speed of light is approximately 300,000 kilometers per second, or 300,000,000,000 millimeters per second. So light, or a signal, can travel 30 millimeters (just over an inch, about the size of a very large chip today) in no less than one over 10,000,000,000 seconds, i.e., no less than one ten billionth of a second.

Today’s fastest processors have a clock speed of 8.760GigaHertz, which means by the time the signal is getting to the other side of the chip, the place it came from has moved on to the next thing to do. This makes synchronization across a single microprocessor something of a nightmare, and at best a designer can know ahead of time how late different signals from different parts of the processor will be, and try to design accordingly. So rather than push clock speed further (which is also hard) and rather than make a single microprocessor bigger with more transistors to do more stuff at every clock cycle, for the last few years we have seen large chips go to “multicore”, with two, four, or eight independent microprocessors on a single piece of silicon.
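
To put numbers on that, here is a small sketch comparing the time a signal needs to cross a 30 millimeter chip with the length of one clock cycle at 8.760GigaHertz. The constant and function names are made up for illustration; the figures are the ones quoted above, including the assumption that signals in copper travel about 5\% slower than light.

```python
SPEED_OF_LIGHT_MM_PER_S = 300_000_000_000   # approximate figure from the text
CHIP_WIDTH_MM = 30                          # a very large chip
CLOCK_HZ = 8.76e9                           # 8.760 GigaHertz

def crossing_time_s(distance_mm, speed_fraction=1.0):
    """Seconds for a signal to cover distance_mm at a fraction of light speed."""
    return distance_mm / (SPEED_OF_LIGHT_MM_PER_S * speed_fraction)

clock_period_s = 1 / CLOCK_HZ
cycles_at_light_speed = crossing_time_s(CHIP_WIDTH_MM) / clock_period_s
cycles_in_copper = crossing_time_s(CHIP_WIDTH_MM, 0.95) / clock_period_s

print(f"at light speed: {cycles_at_light_speed:.2f} clock cycles to cross the chip")
print(f"in copper:      {cycles_in_copper:.2f} clock cycles to cross the chip")
```

Even in the best case a straight trip across the chip eats up roughly nine tenths of a clock cycle, and real signal paths are neither straight nor that fast.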

Multicore has preserved the “number of operations done per second” version of Moore’s Law, but at the cost of a simple program not being sped up by that amount–one cannot simply smear a single program across multiple processing units. For a laptop or a smart phone that is trying to do many things at once that doesn’t really matter, as there are usually enough different tasks that need to be done at once, that farming them out to different cores on the same chip leads to pretty full utilization.  But that will not hold, except for specialized computations, when the number of cores doubles a few more times.  The speed up starts to disappear as silicon is left idle because there just aren’t enough different things to do.
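
A toy model makes the idle silicon visible. It assumes, purely for illustration, that there are six independent tasks of equal length and that each core runs one task at a time; real workloads are messier, and the function name `utilization` is made up.

```python
import math

def utilization(tasks, cores):
    """Fraction of total core time spent on useful work."""
    rounds = math.ceil(tasks / cores)    # each core runs one task per round
    return tasks / (cores * rounds)

for cores in (2, 4, 8, 16):
    print(f"{cores:2d} cores: {utilization(6, cores):.0%} busy")
```

With six things to do, two cores are fully busy, but by sixteen cores well over half of the silicon sits idle.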

Despite the arguments that I presented a few paragraphs ago about why Moore’s Law is coming to a silicon end, many people argue that it is not, because we are finding ways around those constraints of small numbers of atoms by going to multicore and GPUs.  But I think that is changing the definitions too much.

Here is a recent chart that Steve Jurvetson, cofounder of the VC firm DFJ^{\big 5} (Draper Fisher Jurvetson), posted on his FaceBook page.  He said it is an update of an earlier chart compiled by Ray Kurzweil.

 

In this case the left axis is a logarithmically scaled count of the number of calculations per second per constant dollar. So this expresses how much cheaper computation has gotten over time. In the 1940’s there are specialized computers, such as the electromagnetic computers built to break codes at Bletchley Park. By the 1950’s they become general purpose, von Neumann style computers and stay that way until the last few points.

The last two points are both GPUs, the GTX 450 and the NVIDIA Titan X.  Steve doesn’t label the few points before that, but in every earlier version of a diagram that I can find on the Web (and there are plenty of them), the points beyond 2010 are all multicore.  First dual cores, and then quad cores, such as Intel’s quad core i7 (and I am typing these words on a 2.9GHz version of that chip, powering my laptop).

That GPUs are there and that people are excited about them is because besides graphics they happen to be very good at another very fashionable computation.  Deep learning, a form of something known originally as back propagation neural networks, has had a big technological impact recently. It is what has made speech recognition so fantastically better in the last three years that Apple’s Siri, Amazon’s Echo, and Google Home are useful and practical programs and devices. It has also made image labeling so much better than what we had five years ago, and there is much experimentation with using networks trained on lots of road scenes as part of situational awareness for self driving cars. For deep learning there is a training phase, usually done in the cloud, on millions of examples. That produces a few million numbers which represent the network that is learned. Then when it is time to recognize a word or label an image that input is fed into a program simulating the network by doing millions of multiplications and additions. Coincidentally GPUs just happen to be perfect for the way these networks are structured, and so we can expect more and more of them to be built into our automobiles. Lucky break for GPU manufacturers! While GPUs can do lots of computations they don’t work well on just any problem. But they are great for deep learning networks and those are quickly becoming the flavor of the decade.
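
To see why this is nothing but multiplications and additions, here is a minimal sketch of one layer of such a network in plain Python. The weights and biases are made-up numbers standing in for the "few million numbers" that training produces, and clipping negative sums to zero is one common choice rather than the only one.

```python
def layer(inputs, weights, biases):
    """One layer: each output is a weighted sum of the inputs, clipped at zero."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = bias
        for w, x in zip(row, inputs):
            total += w * x               # one multiplication and one addition
        outputs.append(max(0.0, total))
    return outputs

weights = [[1.0, -1.0], [0.5, 0.5]]      # made-up "learned" numbers
biases = [0.0, -1.0]
result = layer([2.0, 1.0], weights, biases)
print(result)
```

Every output can be computed independently of the others, and the inner loop is the same multiply and add over and over on a stream of numbers, which is the kind of work a GPU was built for.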

Those who claim that we continue to see exponential growth, as in the chart above, are right, but exactly what is being measured has changed. That is a bit of a sleight of hand.

And I think that change will have big implications.

WHAT DOES THE END MEAN?

I think the end of Moore’s Law, as I have defined the end, will bring about a golden new era of computer architecture.  No longer will architects need to cower at the relentless improvements that they know others will get due to Moore’s Law. They will be able to take the time to try new ideas out in silicon, now safe in the knowledge that a conventional computer architecture will not be able to do the same thing in just two or four years in software. And the new things they do may not be about speed. They might be about making computation better in other ways.

Machine learning runtime

We are seeing this with GPUs as runtime engines for deep learning networks.  But we are also seeing some more specific architectures. For instance, for about a year Google has had their own chips called Tensor Processing Units (or TPUs) that save power for deep learning networks by effectively reducing the number of significant digits that are kept around, as neural networks work quite well at low precision. Google has placed many of these chips in the computers in their server farms, or cloud, and is able to use learned networks in various search queries, at higher speed for lower electrical power consumption.
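
The effect of keeping fewer significant digits can be illustrated with a small sketch. The scheme below, which squeezes each weight into an integer between -127 and 127 with one shared scale factor, is an assumption chosen for simplicity and not a description of what Google's chips actually do; the weights and inputs are made-up numbers.

```python
def quantize(values, levels=127):
    """Map floats to small integers in [-levels, levels] plus one shared scale."""
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) for v in values], scale

def dot(a, b):
    """Sum of pairwise products."""
    return sum(x * y for x, y in zip(a, b))

weights = [0.12, -0.5, 0.33, 0.9]        # made-up numbers
inputs = [1.0, 2.0, -1.0, 0.5]

exact = dot(weights, inputs)
q_weights, scale = quantize(weights)
approx = scale * dot(q_weights, inputs)

print(f"full precision: {exact:.4f}")
print(f"low precision:  {approx:.4f}")
```

The two answers differ by about one percent here, while each stored weight needs only eight bits instead of the 32 or 64 of a conventional floating point number.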

Special purpose silicon

Typical mobile phone chips now have four ARM processor cores on a single piece of silicon, plus some highly optimized special purpose processors on that same piece of silicon. The special purpose processors manage data flowing from cameras and optimize speech quality, and on some chips there is even a special highly optimized processor for detecting human faces. That is used in the camera application (you’ve probably noticed little rectangular boxes around people’s faces as you are about to take a photograph) to decide what regions in an image should be most in focus and with the best exposure timing–the faces!

New general purpose approaches

We are already seeing the rise of special purpose architectures for very specific computations. But perhaps we will also see more general purpose architectures, with a different style of computation, making a comeback.

Conceivably the dataflow and logic models of the Japanese fifth generation computer project might now be worth exploring again. But as we digitalize the world the cost of bad computer security will threaten our very existence. So perhaps if things work out, the unleashed computer architects can slowly start to dig us out of our current deplorable situation.

Secure computing

We all hear about cyber hackers breaking into computers, often half a world away, or sometimes now in a computer controlling the engine, and soon everything else, of a car as it drives by. How can this happen?

Cyber hackers are creative but many ways that they get into systems are fundamentally through common programming errors in programs built on top of the von Neumann architectures we talked about before.

A common case is exploiting something known as “buffer overrun”. A fixed size piece of memory is reserved to hold, say, the web address that one can type into a browser, or the Google query box. If all programmers wrote very careful code and someone typed in way too many characters those past the limit would not get stored in RAM at all. But all too often a programmer has used a coding trick that is simple, and quick to produce, that does not check for overrun and the typed characters get put into memory way past the end of the buffer, perhaps overwriting some code that the program might jump to later. This relies on the feature of von Neumann architectures that data and programs are stored in the same memory. So, if the hacker chooses some characters whose binary codes correspond to instructions that do something malicious to the computer, say setting up an account for them with a particular password, then later as if by magic the hacker will have a remotely accessible account on the computer, just as many other human and program services may. Programmers shouldn’t oughta make this mistake but history shows that it happens again and again.
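
Python checks every index for you, so the sketch below simulates the situation rather than reproducing it: a `bytearray` plays the part of raw memory, with an 8 byte buffer followed immediately by 8 bytes standing in for program code. The layout and all of the names are made up for illustration.

```python
BUFFER_START, BUFFER_SIZE = 0, 8         # where typed characters are meant to go
CODE_START = 8                           # "program code" sits right after the buffer

def make_memory():
    """A flat 16 byte memory: an empty buffer followed by some code."""
    memory = bytearray(16)
    memory[CODE_START:] = b"SAFECODE"
    return memory

def unchecked_write(memory, data):
    """The quick trick: copy everything, never comparing against BUFFER_SIZE."""
    for i, byte in enumerate(data):
        memory[BUFFER_START + i] = byte

def checked_write(memory, data):
    """The careful version: characters past the limit are never stored."""
    data = data[:BUFFER_SIZE]
    memory[BUFFER_START:BUFFER_START + len(data)] = data

typed = b"AAAAAAAAEVIL"                  # 12 characters into an 8 character buffer

careless = make_memory()
unchecked_write(careless, typed)
careful = make_memory()
checked_write(careful, typed)

print(bytes(careless[CODE_START:]))      # b'EVILCODE'
print(bytes(careful[CODE_START:]))       # b'SAFECODE'
```

In the careless version the last four typed characters have landed on top of the code, which is exactly the opening the hacker is looking for.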

Another common way in is that in modern web services sometimes the browser on a laptop, tablet, or smart phone, and the computers in the cloud need to pass really complex things between them. Rather than the programmer having to know in advance all those complex possible things and handle messages for them, it is set up so that one or both sides can pass little bits of source code of programs back and forth and execute them on the other computer. In this way capabilities that were never originally conceived of can start working later on in an existing system without having to update the applications. It is impossible to be sure that a piece of code won’t do certain things, so if the programmer decided to give a fully general capability through this mechanism there is no way for the receiving machine to know ahead of time that the code is safe and won’t do something malicious (this is a generalization of the halting problem; I could go on and on… but I won’t here). So sometimes a cyber hacker can exploit this weakness and send a little bit of malicious code directly to some service that accepts code.
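
A deliberately naive sketch of such a service follows. The function `handle_message` and the `server_state` dictionary are hypothetical; the only real mechanism used is Python's built in `exec`, which runs whatever source text it is handed.

```python
def handle_message(source, state):
    """Run source code received from the other side against our state."""
    exec(source, {"state": state})       # fully general, and fully trusting

server_state = {"accounts": {"alice": "s3cret"}}

# The intended use: a small extension nobody planned for in advance.
handle_message("state['motd'] = 'hello'", server_state)

# The same mechanism accepts this just as happily.
handle_message("state['accounts']['intruder'] = 'letmein'", server_state)

print(server_state)
```

Nothing in `handle_message` can tell the two messages apart, and there is no general test it could apply to do so.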

Beyond that cyber hackers are always coming up with new inventive ways in–these have just been two examples to illustrate a couple of ways of how it is currently done.

It is possible to write code that protects against many of these problems, but code writing is still a very human activity, and there are just too many human-created holes that can leak, from too many code writers. One way to combat this is to have extra silicon that hides some of the low level possibilities of a von Neumann architecture from programmers, by only giving the instructions in memory a more limited set of possible actions.

This is not a new idea. Most microprocessors have some version of “protection rings” which let more and more untrusted code only have access to more and more limited areas of memory, even if they try to access it with normal instructions. This idea has been around a long time but it has suffered from not having a standard way to use or implement it, so most software, in an attempt to be able to run on most machines, usually only specifies two or at most three rings of protection. That is a very coarse tool and lets too much through.  Perhaps now the idea will be thought about more seriously in an attempt to get better security when just making things faster is no longer practical.

Another idea, that has mostly only been implemented in software, with perhaps one or two exceptions, is called capability based security, through capability based addressing. Programs are not given direct access to regions of memory they need to use, but instead are given unforgeable cryptographically sound reference handles, along with a defined subset of things they are allowed to do with the memory. Hardware architects might now have the time to push through on making this approach completely enforceable, getting it right once in hardware so that mere human programmers pushed to get new software out on a promised release date can not screw things up.

From one point of view the Lisp Machines that I talked about earlier were built on a very specific and limited version of a capability based architecture. Underneath it all, those machines were von Neumann machines, but the instructions they could execute were deliberately limited. Through the use of something called “typed pointers”, at the hardware level, every reference to every piece of memory came with restrictions on what instructions could do with that memory, based on the type encoded in the pointer. And memory could only be referenced by a pointer to the start of a chunk of memory of a fixed size at the time the memory was reserved. So in the buffer overrun case, a buffer for a string of characters would not allow data to be written to or read from beyond the end of it. And instructions could only be referenced from another type of pointer, a code pointer. The hardware kept the general purpose memory partitioned at a very fine grain by the type of pointers granted to it when reserved. And to a first approximation the type of a pointer could never be changed, nor could the actual address in RAM be seen by any instructions that had access to a pointer.
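
The flavor of those restrictions can be sketched in software. The `TypedPointer` class below is a hypothetical stand in, not a model of any real Lisp Machine: it hands out a reference to a fixed size chunk, never reveals an address, and refuses operations that do not match the pointer's type. Python cannot truly enforce any of this; in the machines described, the hardware did.

```python
class TypedPointer:
    """Hypothetical stand in for a hardware typed pointer to a fixed size chunk."""

    def __init__(self, kind, size):
        self._kind = kind                # "data" or "code", fixed at reservation
        self._cells = [0] * size         # the chunk itself; no raw address is exposed

    def write(self, index, value):
        if self._kind != "data":
            raise PermissionError("only data pointers may be written through")
        if not 0 <= index < len(self._cells):
            raise IndexError("access beyond the reserved chunk")
        self._cells[index] = value

    def read(self, index):
        if not 0 <= index < len(self._cells):
            raise IndexError("access beyond the reserved chunk")
        return self._cells[index]

    def jump(self):
        if self._kind != "code":
            raise PermissionError("only code pointers may be jumped to")
        return "executing"

def attempt(action):
    """Run an action and report whether the 'hardware' allowed it."""
    try:
        action()
        return "allowed"
    except (IndexError, PermissionError) as err:
        return f"refused: {err}"

buffer = TypedPointer("data", 8)
in_bounds = attempt(lambda: buffer.write(7, ord("A")))
overrun = attempt(lambda: buffer.write(8, ord("B")))
run_data = attempt(buffer.jump)

print(in_bounds, overrun, run_data, sep="\n")
```

Writing the last cell of the buffer is allowed, writing one past the end is refused, and trying to treat the buffer's contents as instructions is refused as well, which closes off both halves of the buffer overrun attack.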

There have been ideas out there for a long time on how to improve security through this use of hardware restrictions on the general purpose von Neumann architecture.  I have talked about a few of them here. Now I think we can expect this to become a much more compelling place for hardware architects to spend their time, as security of our computational systems becomes a major Achilles heel for the smooth running of our businesses, our lives, and our society.

Quantum computers

Quantum computers are, at this time, a largely experimental and very expensive technology. Between the need to cool them to physics experiment level ultra cold, with the expense that entails, and the confusion over how much speed up they might give over conventional silicon based computers and for what class of problem, they are a large investment, high risk research topic. I won’t go into all the arguments (I haven’t read them all, and frankly I do not have the expertise that would make me confident in any opinion I might form) but Scott Aaronson’s blog on computational complexity and quantum computation is probably the best source for those interested. Claims on speedups either achieved or hoped to be achieved on practical problems range from a factor of 1 to thousands (and I might have that upper bound wrong). In the old days just waiting 10 or 20 years would let Moore’s Law get you there. Instead we have seen well over a decade of sustained investment in a technology that people are still arguing about, unsure whether it can ever work. To me this is yet more evidence that the end of Moore’s Law is encouraging new investment and new explorations.

Unimaginable stuff

Even with these various innovations around, triggered by the end of Moore’s Law, the best things we might see may not yet be in the common consciousness. I think the freedom to innovate, without the overhang of Moore’s Law, the freedom to take time to investigate curious corners, may well lead to a new garden of Eden in computational models. Five to ten years from now we may see a completely new form of computer arrangement, in traditional silicon (not quantum), that is doing things and doing them faster than we can today imagine. And with a further thirty years of development those chips might be doing things that would today be indistinguishable from magic, just as today’s smart phone would have seemed like utter magic to 50 year ago me.



FOOTNOTES

^{\big 1}Many times the popular press, or people who should know better, refer to something that is increasing a lot as exponential.  Something is only truly exponential if there is a constant ratio in size between any two points in time separated by the same amount.  Here the ratio is 2, for any two points a year apart.  The misuse of the term exponential growth is widespread and makes me cranky.

^{\big 2}Why the Chemical Heritage Foundation for this celebration? Both of Gordon Moore’s degrees (BS and PhD) were in physical chemistry!

^{\big 3}For those who read my first blog, once again see Roy Amara‘s Law.

^{\big 4}I had been a post-doc at the MIT AI Lab and loved using Lisp Machines there, but when I left and joined the faculty at Stanford in 1983 I realized that the more conventional SUN workstations being developed there and at spin-off company Sun Microsystems would win out in performance very quickly. So I built a software based Lisp system (which I called TAIL (Toy AI Language) in a nod to the naming conventions of most software at the Stanford Artificial Intelligence Lab, e.g., BAIL, FAIL, SAIL, MAIL)  that ran on the early Sun workstations, which themselves used completely generic microprocessors. By mid 1984 Richard Gabriel, I, and others had started a company called Lucid in Palo Alto to compete on conventional machines with the Lisp Machine companies. We used my Lisp compiler as a stop gap, but as is often the case with software, that was still the compiler used by Lucid eight years later when it ran on 19 different makes of machines. I had moved back to MIT to join the faculty in late 1984, and eventually became the director of the Artificial Intelligence Lab there (and then CSAIL). But for eight years, while teaching computer science and developing robots by day, I also at night developed and maintained my original compiler as the work horse of Lucid Lisp. Just as the Lisp Machine companies got swept away so too eventually did Lucid. Whereas the Lisp Machine companies got swept away by Moore’s Law, Lucid got swept away as the fashion in computer languages shifted to a winner take all world, for many years, of C.

^{\big 5}Full disclosure. DFJ is one of the VC’s who have invested in my company Rethink Robotics.