Rodney Brooks

Robots, AI, and other stuff

[FoR&AI] Steps Toward Super Intelligence II, Beyond the Turing Test

[This is the second part of a four part essay–here is Part I.]

As we (eventually…) start to plot out how to build Artificial General Intelligence there is going to be a bifurcation in the path ahead. Some will say that we must choose which direction to take. I argue that the question is complex and it may be felicitous to make a Schrödingerian compromise.

I get people making two arguments to me about analogies between AGI and heavier than air flight.  And the arguments are made to me with about equal frequency. As I write these sentences I can recall at least one of the two arguments being made to me in each of the last six weeks.

One argument is that we should not need to take into account how humans come to be intelligent, nor try to emulate them, as heavier than air flight does not emulate birds. That is only partially true as there were multiple influences on the Wright brothers from bird flight [1]. Certainly today the appearance of heavier than air flight is very different from that of birds or insects, though the study of the flight of those creatures continues to inform airplane design. This is why over the last twenty years or so jet aircraft have sprouted winglets at the ends of primary wings.

Airplanes can fly us faster and further than something that more resembled birds would. On the other hand our airplanes have not solved the problem of personal flight. We can no more fly up from the ground and perch in a tall tree than we could before the Wright brothers. And we are not able to take off and land wherever we want without large and extremely noisy machines. A little more bird would not be all bad.

I accept the point that a human level intelligence, once built, may well not need to be much at all like humans in how it achieves that. However, for now at least, human intelligence is the only model we have and there is most likely a lot to learn still from studying how it is that people are intelligent. Furthermore, as we will see below, having a lot of commonality between humans and intelligent agents will let them be much more understandable partners.

This is the compromise that I am willing to make. I am willing to believe that we do not need to do everything like humans do, but I am also convinced that we can learn a lot from humans and human intelligence.

The second argument that I get is largely trying to cast me as a grumpy old guy, and that may well be fair!

People like to point out that very late in the nineteenth century, less than a decade before the Wright brothers’ first flight, Lord Kelvin and many others had declared that heavier than air flight was impossible. Kelvin, and others, could not have meant that literally as they knew that birds and insects were both heavier than air and could fly. So they were clearly referring to something more specific. If they were talking about human powered heavier than air flight, they were sort of right as it was more than sixty years away [2], and then only for super trained athletes.

More likely, they were discouraged by the failures of so many attempts with what had seemed like enough aerodynamic performance and engine power to fly. What they failed to understand was that it was the control of flight which needed to be developed, and that is what made the Wright brothers successful.

So… are we, by analogy, just one change in perspective away from getting to AGI? And really, is deep learning that change in perspective? And do I just not know that it is already solved, and that the train is accelerating down the track?

I, old(ish) guy for sure, grumpy or not, think that the analogy breaks down. I think it is not just one thing analogous to flight’s “control” that we are missing, but that in fact there are at least a hundred such things we are currently missing. Intelligence is a much bigger deal than flight. A bird brain does not cut it.

So now, with the caveat of the second argument perhaps identifying a personal failing, I want to proclaim that there is no need to come down on one side or the other, airplane vs bird, doing it a completely different way AI vs emulating how a human does it.

Instead I think we should use the strengths wherever we can find them, in engineering, or in biology. BUT, if we do not make our AI systems understandable to us, if they can not relate to the quirks that make us human, and if they are so foreign that we can not treat them with empathy, then they will not be adopted into our lives.

Underneath the shiny exterior they may not be flesh and blood, but instead may be silicon chips and wires, and yes, even the occasional deep learning network, but if they are to be successful in our world we are going to have to like them.


Just what an AGI agent or robot must be able to do to qualify as being generally intelligent is, I think, a murky question. And my goal of endowing them with human level intelligence is even murkier to define. To try to give it a little coherence I am going to choose two specific applications for AGI agents, and then discuss what this means in terms of research that awaits in order to achieve that goal. I could just as well have chosen different applications for them, but these two will make things concrete. And since we are talking about Artificial GENERAL Intelligence, I will push these application agents to be as good as one could expect a person to be in similar circumstances.

The first AGI agent will be a physically embodied robot that works in a person’s home providing them care as they age. I am not talking about companionship but rather physical assistance that will enable someone to live with dignity and independence as they age in place in their own home. For brevity’s sake we will refer to this robot, an eldercare worker, as ECW for the rest of this post.

ECW may come with lots of pre-knowledge about the general task of helping an elderly person in their home, and a lot of fundamental knowledge of what a home is, the sorts of things that will be found there, and all sorts of knowledge about people, both the elderly in general, and a good range of what to expect of family members and family dynamics, along with all sorts of knowledge about the sorts of people that might come into the home be it to deliver goods, or to carry out maintenance on the home. But ECW will also need to quickly adapt to the peculiarities of the particular elderly person and their extended social universe, the particular details of the house, and be able to adapt over time as the person ages.

The second AGI agent we will consider need not be embodied for the purposes of this discussion.  It will be an agent that can be assigned a complex planning task that will involve a workplace, humans as workers, machines that have critical roles, and humans as customers to get services at this workplace. For instance, and this is the example we will work through here, it might one particular day be asked to plan out all the details of a dialysis ward, specifying the physical layout, the machines that will be needed, the skillsets and the work flow for the people or robots to be employed there, and to consider how to make the experience for patients one that they will rank highly. We will refer to this services logistics planner as SLP.

Again SLP may come to this task with lots of prior knowledge about all sorts of things in the world, but there will be peculiarities about the particular hospital building, its geographical location and connection to transportation networks, the insurance profiles of the expected patient cohort, and dozens of other details. Although there have been many dialysis wards already designed throughout the world, for the purpose of this blog post we are going to assume that SLP has to design the very first one. Thus a dialysis ward, as used here, is a proxy for some task entirely new to humanity. This sort of logistical planning is not at all uncommon, and senior military officers, below the flag level, can assume that they will be handed assignments of this magnitude in times of unexpected conflicts. If we are going to be able to build an AGI agent that can work at human level, we surely should expect it to be able to handle this task.

We will see that both these vocations are quite complex and require much subtlety in order to be successful. I believe that the same is true of most human occupations, so I could equally well have chosen other AGI based agents, and the arguments below would be very similar.

My goal here is to replace the so-called “Turing Test” with something which tests the intelligence of a system much more broadly. Stupid chat bots have managed to do well at the Turing Test. Being an effective ECW or SLP will be a much more solid test, and will guarantee a more general competence than what is required for the Turing Test.

Some who read this might argue that all the detailed problems and areas crying out for research that I give below will be irrelevant for a “Super Intelligence”, as it will do things its own way and that it will be beyond our understanding.  But such an argument would be falling into one of the traps that I identified in my blog post about seven common forms of mistake that are made in predicting the future of Artificial Intelligence. In this case it is the mistake of attributing magic to a sufficiently advanced technology. Super Intelligence is so super that we can not begin to understand it so there is no point making any rational arguments about it. Well, I am not prepared to sit back and idly wait for magic to appear. I prefer to be actively working to make magic happen, and that is what this post is about.

Oh, and one last meta-point on this rhetorical device that I have chosen to make my arguments. Yes, both ECW and SLP will have plenty of opportunity to kill people should they want. I made sure of that so that the insatiable cravings of the Super Intelligence alarmists will have a place for expression concerning the wickedness that is surely on its way.


Let us suppose that the Elder Care Worker (ECW) robot is permanently assigned to work in a house with an older man, named Rodney. We will talk about the interactions between ECW and Rodney, and how Rodney may change over time as he gets older.

There are some basic things that almost go without saying. ECW should not harm Rodney, or guests at his house. It should protect Rodney from all things bad, should do what Rodney asks it to do, unless in some circumstances it should not (see below), etc.

One might think that Asimov’s three laws will cover this base necessity. In a 1942 short story titled “Runaround” the science fiction writer Isaac Asimov stated them as:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Then in subsequent stories the intricate details of how these rules might be in conflict, or have inherent ambiguity in how they should be interpreted, became a vehicle for new story after new story for Asimov.

It turns out that no robot today has a remote chance of successfully obeying these rules, because the necessary perception is difficult, and because using common sense and predicting how people are going to behave are beyond current AI capabilities. We will talk at more length about these challenges in Part III of this essay.
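To make the structure of the three laws concrete, here is a minimal sketch in Python of how they form a strict priority ordering. Everything in it is an assumption made for illustration: the `Action` class, its three boolean fields, and the `choose` function are hypothetical names, not anything from Asimov or from a real robot. The sketch simply assumes that the judgments have already been made; producing those booleans reliably is exactly the perception, common sense, and prediction problem described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A candidate action with hypothetical, already-computed judgments.

    In a real robot each boolean would have to come from perception,
    common sense, and prediction of human behavior. That is the hard part,
    and this sketch simply assumes it away.
    """
    name: str
    harms_human: bool      # First Law: injures a human, or lets one come to harm
    disobeys_order: bool   # Second Law: ignores an order given by a human
    endangers_robot: bool  # Third Law: puts the robot's own existence at risk


def law_violations(action: Action) -> tuple[bool, bool, bool]:
    """Return the violations ordered from the First Law to the Third Law."""
    return (action.harms_human, action.disobeys_order, action.endangers_robot)


def choose(actions: list[Action]) -> Action:
    """Pick the action that violates the highest-priority laws the least.

    Tuples compare lexicographically and False sorts before True, so a
    First Law violation always outweighs any Second or Third Law violation.
    """
    return min(actions, key=law_violations)


# Usage: obeying an order at some risk to the robot beats refusing the order.
candidates = [
    Action("refuse and stay put", harms_human=False, disobeys_order=True, endangers_robot=False),
    Action("fetch water across a wet floor", harms_human=False, disobeys_order=False, endangers_robot=True),
]
print(choose(candidates).name)
```

The ordering itself is trivial to express. The conflicts and ambiguities that drove Asimov's stories live entirely in how the three booleans get decided.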

But somehow ECW should have a code of conduct concerning the sorts of things it should do and should not do. That might be explicitly represented and available for introspection by ECW, or it may be built in due to inherent limitations in the way ECW is built. For instance any dishwashing machine that you can buy today just inherently can not wander into the living room during a wash cycle and pour out its water on the floor!

But ECW does need to be a lot more proactive than a dishwasher. To do so it must know who Rodney is, have access to relevant medical data about Rodney and be able to use that information in all sorts of decisions it must make on a daily basis. At the very least it must be able to recognize Rodney when he comes home, and track his appearance as he ages. ECW needs to know who is Rodney, who are family members, and needs to track who is in the house and when. It also needs to know the difference between a family member and a non family member, and individually know each of those family members. It needs to know who is a friend, who is just an acquaintance, who is a one time visitor, who is a regular visitor but a service provider of some sort, when the person at the door is a delivery person, and when the person at the door is someone who should not be allowed in.

As I pointed out in my post titled What Is It Like to Be a Robot? our domestic robots in the coming decades will be able to have a much richer set of sensory inputs than do we humans. They will be able to detect the presence of humans in all sorts of ways that we can not, but they will still need to be very good at quickly recognizing who is who, and it will most likely need to be done visually. But it won’t do to demand a full frontal face on view. Instead they will, like a person, need to piece together little pieces of information quickly, and make inferences. For example, if some person leaves a room to go to another room full of people, and a minute later a person comes from that room, with similar clothes on, perhaps that will be enough of a clue. Without knowing who is who the ECW could get quite annoying. And it should not be annoying. Especially not to Rodney, who will have to live with the robot for ten or twenty years.

Here is an example of an annoying machine. I have had an account with a particular bank for over thirty years and I have used its ATM card to get cash for that whole time. Every single time I insert my card it asks me whether I want to communicate in Spanish today. I have never said yes. I will never say yes. But it asks me anew every single interaction.

ECW should not be that sort of annoying with any of the people that it interacts with. It will need to understand what sorts of things are constant about a person, what sorts of things change often about a person, and what sort of changes in circumstances might precipitate a previously unlikely way of interacting. On the other hand, if a known service person shows up after months of absence, and ECW did not summon that person, it would probably be reasonable of ECW to ask why they are there. A human care giver knows all these sorts of social interaction things about people. ECW will need to also, or otherwise it will fall into the same class as an annoying ATM.

ECW will need to model family dynamics in order to not be annoying and to not make any sort of social faux pas. As Rodney deteriorates it may need to step in to the social dynamic to smooth things out, as a good human caretaker would. ECW might need to whisper to Rodney’s daughter when she arrives some day that Rodney seems a little cranky today, and then explain why–perhaps a bad reaction to some regular medicine, or perhaps he has been raving about someone claiming that one of his 25 year old blog predictions about future technology was overly pessimistic and he doesn’t buy it, but sure is upset about it. Other details may not matter at all. ECW will need to give the appropriate amount of information to the appropriate person.

In order to do this ECW will need to have a model of who knows what, who should know what, how they might already have the right information, or how their information may be wrong or out of date.
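As a rough illustration of what such a model has to track, here is a minimal Python sketch. It is a toy under stated assumptions, not a proposal for how ECW would really work: the class `KnowledgeModel`, its fields, and the example topics are all hypothetical, and it reduces "knowing" to whether a person was told the current value of a named fact. Real belief modeling would also have to cover inference, hearsay, and forgetting.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeModel:
    """Toy model of who knows what, who should know what, and what is stale."""
    # facts[topic] = current value, as ECW understands it
    facts: dict[str, str] = field(default_factory=dict)
    # told[person][topic] = the value that person was last given
    told: dict[str, dict[str, str]] = field(default_factory=dict)
    # should_know[person] = topics it is appropriate to share with that person
    should_know: dict[str, set[str]] = field(default_factory=dict)

    def update_fact(self, topic: str, value: str) -> None:
        """Record a new current value for a topic."""
        self.facts[topic] = value

    def tell(self, person: str, topic: str) -> None:
        """Record that a person has been given the current value of a topic."""
        self.told.setdefault(person, {})[topic] = self.facts[topic]

    def briefing_for(self, person: str) -> list[str]:
        """Topics this person should know but is missing or has out of date."""
        known = self.told.get(person, {})
        return [
            topic
            for topic in sorted(self.should_know.get(person, set()))
            if topic in self.facts and known.get(topic) != self.facts[topic]
        ]


# Usage: the daughter is up to date on medication but not on today's mood.
model = KnowledgeModel()
model.should_know["daughter"] = {"medication", "mood"}
model.should_know["delivery person"] = set()
model.update_fact("medication", "new dose started Monday")
model.tell("daughter", "medication")
model.update_fact("mood", "cranky today")
print(model.briefing_for("daughter"))
print(model.briefing_for("delivery person"))
```

Even this toy separates three things the text distinguishes: what is true now, what each person was last told, and what each person is entitled to hear.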

Rodney will likely change over ECW’s tenure, getting frailer, and needing more help from ECW. ECW will need to adjust its services to match Rodney’s changing state. That may include changing who it listens to primarily for instructions.

While the adults around will stay adults, Rodney’s grandchild may change from being a helpless baby to being a college graduate or even a medical doctor. If ECW is going to be as seamless as a person would be in accommodating those changes in its relationship with the grandchild over time then it is going to need to understand a lot about children and their levels of competence and how it should change its interaction. A college graduate is not going to appreciate being interacted with as though a baby.

But what does ECW need to actually do?

As Rodney declines over time, ECW will need to take over more and more responsibility for the normal aspects of living. It will need to clean up the house, picking up things and putting them away. It will need to distinguish things that are on the floor, understanding that disposing of a used tissue is different from dealing with a sock found on the floor.

It will need to start reaching for the things that are stored high up in the kitchen and hand them to Rodney. It may need to start cooking meals and be able to judge what Rodney should be eating to stay healthy.

When a young child shows something to an adult they know to use motion cues to draw the adult’s attention to the right thing. They know to put it in a place where the adult’s line of sight will naturally intersect. They know to glance at the adult’s eyes to see where their attention lies, and they know to use words to draw the adult’s attention when the person has not yet focused on the object being shown.

In order to do this well, even with all its super senses, ECW will still need to be able to “imagine” how the human sees the world so that it can ascertain what cues will most help the human understand what it is trying to convey.

When ECW first starts to support Rodney in his home Rodney may well be able to speak as clearly to ECW as he does to an Amazon Echo or a Google Home, but over time his utterances will become less well organized. ECW will need to use all sorts of context and something akin to reasoning in order to make sense of what Rodney is trying to convey. It likely will be very Rodney-dependent and likely not something that can be learned from a general purpose large data set gathered across many elderly people.

As Rodney starts to have trouble with noun retrieval ECW will need to follow the convoluted ways that Rodney tries to get around missing words when he is trying to convey information to ECW. Even a phrase as simple sounding as “the red thing over there” may be complex for ECW to understand. Current Deep Learning vision systems are terrible at color constancy (we will talk about that in the what is hard section next). Color constancy is something that is critical to human based ontologies of the world, the formal naming systems any group that shares a spoken language assumes that all other speakers will understand. It turns out that “red” is actually quite complex in varied lighting conditions–humans handle it just fine, but it is one of many tens of sub-capabilities that we all have without even realizing it.

ECW will have to take turns with Rodney on some tasks, encouraging him to do things for himself as he gets older–it will be important for Rodney to remain active, and ECW will have to judge when pushing Rodney to do things is therapeutic  and when it is unsupportive.

Eventually ECW will need to help Rodney do all the things that happen in bathrooms. At some point it will need to start going into the bathroom with him, observing whether he is unsteady or not and providing verbal coaching. Over time ECW will need to stick closer to Rodney in the bathroom, providing physical support as needed. Eventually it may need to help him get on to and off of the toilet, perhaps eventually providing wiping assistance. Even before this, ECW may need to help Rodney get into and out of bed–once a person loses the ability to do that on their own they often need to leave their home and go into managed care. ECW will be able to stave off that day, perhaps for years, just by helping with this twice daily task.

Coming into contact with, and supplying support and even lifting a frail and unsteady human body, easily damaged, and under control of a perhaps not quite rational degraded human intelligence is a complex physical interaction. It can often be eased by verbal communication between the two participants. Over the years straightforward language communication will get more and more difficult in both directions, as split second decisions will need to be made just at the physical level alone on what motions are appropriate, effective, and non-threatening.

As ECW comes into contact with Rodney there will be more opportunities for diagnostics. A human caregiver helping Rodney out of bed would certainly notice if his pajamas were wet. Or if some worse accident had happened. Now we get to an ethical issue. Suppose Rodney notices that ECW noticed, and says, “please don’t tell my children/doctor”. Under what conditions should ECW honor Rodney’s request, and when will it be in Rodney’s better interest to disobey Rodney and violate his privacy?

Early versions of ECW, before such robots are truly intelligent, will most likely rely on a baked in set of rules which require only straightforward perceptual inputs in order to make these decisions–they will appear rigid and sometimes inhuman. When we get ECW to the level of intelligence of a person we will all expect it to make such decisions in much more nuanced ways, and be able to explain how it weighed the competing factors arguing for the two possible outcomes–tell Rodney’s children or not.
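To show what "baked in" might look like, and why it would feel rigid, here is a small Python sketch. It is entirely hypothetical: the function name, its inputs, and especially the threshold of three accidents are invented for illustration and are not medical or ethical guidance. The point is only that such a rule set takes a few easily perceived inputs and returns a verdict without weighing anything.

```python
# Hypothetical threshold, invented purely for illustration.
ACCIDENT_LIMIT = 3


def should_tell_family(accidents_this_month: int,
                       fall_detected: bool,
                       asked_for_privacy: bool) -> bool:
    """Rigid, rule-based decision on whether to report to Rodney's children.

    Only straightforward perceptual inputs are used. There is no weighing
    of dignity, trust, or context, and no explanation beyond which rule fired.
    """
    if fall_detected:
        return True   # safety rule always overrides a privacy request
    if accidents_this_month == 0:
        return False  # nothing to report
    if accidents_this_month >= ACCIDENT_LIMIT:
        return True   # frequent accidents override a privacy request
    return not asked_for_privacy  # otherwise honor what Rodney asked


# Usage: one accident plus a privacy request is kept private;
# a third accident in the month is reported regardless.
print(should_tell_family(1, fall_detected=False, asked_for_privacy=True))
print(should_tell_family(3, fall_detected=False, asked_for_privacy=True))
```

The jump from two accidents to three flips the outcome no matter what else is going on in Rodney's life, which is the sort of behavior that would come across as rigid and sometimes inhuman.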

All along ECW will need to manage its own resources, including its battery charge, its maintenance schedule, its software upgrades, its network access, its cloud accounts, and perhaps figuring out how to pay its support bills if Rodney’s finances are in disarray. There will be a plethora of ECW’s other cyber physical needs that will need to be met while working around Rodney’s own schedule. ECW will have to worry about its own ongoing existence and health and weigh that against its primary mission of providing support to Rodney.


The Services Logistics Planner (SLP) does not need to be embodied, but it will need to at least appear to be cerebral, even as it lives entirely in the cloud. It will need to be well grounded in what it is to be human, and what the human world is really like if it is to be able to do its task.

The clients, the people who want the facility planned, will communicate with SLP through speech as we all do with the Amazon Echo or Google Home, and through sending it documents (Word, PowerPoint, Excel, PDFs, scanned images, movie files, etc.). SLP will reply with build out specifications, organizational instructions, lists of equipment that will be needed and where it will be placed, staffing needs, consumable analysis, and analysis of expected human behaviors within the planned facility. And then the client and other interested parties will engage in an interactive dialog with SLP, asking for the rationale for various features, suggesting changes and modifying their requirements. If SLP is as good as an intelligent person then this dialog will appear natural and have give and take [3].

Consider the task of designing a new dialysis ward for an existing hospital.

Most likely SLP will be given a fixed space within the floor plan, and that will determine some aspects of people flow in and out, utility conduits, etc.  SLP will need to design any layout changes of walls within the space, and submit petitions to the right hospital organizations to change entrances and exits and how they will affect people flow in other parts of the hospital. SLP will need to decide what equipment will be where in the newly laid out space, what staffing requirements there will be, what hours the ward should be accepting out patients, what flows there should be for patients who end up having problems during a visit and need to get transferred to other wards in the hospital, and the layout of the beds and chairs for dialysis (and even which should be used). SLP will need to decide, perhaps through consulting other policy documents, how many non-patients will be allowed to accompany a patient, where they will be allowed to sit while the patient is on dialysis, and what the layout of the waiting room will be.

SLP will need to consider boredom for people in the waiting areas. But it won’t be told this in the specification. It will have to know enough about people to consider the different sorts of boredom for patients and supporting friends and relatives, access to snacks (appropriate for support visitors and for pre and post dialysis patients), access to information on when someone’s turn will come, and when someone will be getting out of active dialysis, etc., etc. At some level this will require an understanding of humans, their fears, their expected range of emotions, and how people will react to stressful situations.

It will have to worry about the width of corridors, the way doors open, and the steps or slopes within the facility. It will need to consider these things from many different angles; how to get new equipment in and out, how the cleaning staff will work, what the access routes will be in emergencies for doctors and nurses, how patients will move around, how visitors will be isolated to appropriate areas without them feeling like they are locked in, etc., etc.

SLP will need to worry about break rooms, the design of check in facilities, the privacy that patients can and should expect, and how to make it so that the facility itself is calming, rather than a source of stress, for the staff, the patients, and the visitors.

Outside of the ward itself, SLP will need to look in to what the profile of the expected patients is for this ward, and how they will travel to and from the hospital, for this particular city, for their out patient visit. It will need to predict the likelihood of really bad travel days where multiple patients get unexpectedly delayed. It will need to determine both a policy that tries to push everyone to be on time, but also have a back up system in place as it will not be acceptable to tell such patients that it is just too bad that they missed this appointment through no fault of their own–this is a matter of life and death, and SLP will need to analyze what are the acceptable back up delays and risks to patients.

SLP will not be told any of these things when given the task to design the new dialysis ward. It will need to be smart enough about people to know that these will all be important issues. And it will need to be able to push back on the really important aspects during the “value engineering” (i.e., reduction in program) that the humans reviewing the task will be promoting. And remember that for the purposes of this demonstration of human level general intelligence we are going to assume that this is the very first dialysis ward ever designed.

There is a lot of knowledge that SLP will need to access. There are a lot of things that will need to feed into every decision, and a lot of tradeoffs that will need to be made since, as in any human endeavor, many compromises will surely be needed.

Most importantly SLP will need to be able to explain itself. An insurance company that is bidding on providing insurance services for the facility that SLP has just designed will want to ask specific questions about what considerations went into certain aspects of the design. Representatives from the company will push for models of all sorts of aspects of the proposed design in terms of throughput considerations, risks to people quite apart from their illness and their dialysis requirements, how it was determined that all aspects of the design meet fire code, what sort of safety certifications exist for the chosen materials, etc., etc., etc.

But “Wait!” the Deep Learning fan boy says. “It does not need to do all those humanny sorts of things. We’ll just show it thousands of examples of facilities throughout the world and it will learn! Then it will design a perfect system. It is not up to us mere humans to question the magic skill of the Super Intelligent AI.”

But that is precisely the point of this example. For humans to be happy with a system designed by SLP it must get details incredibly correct. And by making this the first dialysis ward ever designed it means that there just will not be much in the way of data on which to train it. If something really is Super it will have to handle tasks de novo, since people have been doing that throughout history.

The Two New Tests

I said above that I was proposing these two test cases, ECW and SLP, as a replacement for the Turing Test. Some may be disappointed that I did not give a simple metric on them.

In other tests there is a metric. In the Turing Test it is what percentage of human judges it fools into making a wrong binary decision. In the ever popular robot soccer competitions it is which team wins. In the DARPA Grand Challenge it was how long it took an autonomous vehicle to finish the course.
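For contrast with what follows, the Turing Test metric just described fits in a few lines of Python. This is a minimal sketch under a simplifying assumption: each judge gives a single binary verdict about the machine, and the function name `percent_fooled` is made up for this illustration.

```python
def percent_fooled(judged_human: list[bool]) -> float:
    """Percentage of judges who wrongly decided that the machine was human.

    judged_human holds one verdict per judge about the machine:
    True means that judge made the wrong binary decision.
    """
    if not judged_human:
        raise ValueError("need at least one judge")
    return 100.0 * sum(judged_human) / len(judged_human)


# Usage: two of four judges are fooled.
print(percent_fooled([True, False, False, True]))  # prints 50.0
```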

ECW and SLP have much more nuanced tasks. Just as there is no single multiple choice test that one must pass to receive a PhD, we will only know if ECW or SLP are doing a good job by continuously challenging them and evaluating them over many years.

Welcome to the real world. That is where we want our Artificial General Intelligences and our Super Intelligences to live and work.

Next up: Part III, some things that are hard for AI today.

[1] The Wright Brothers were very much inspired by the gliding experiments of Otto Lilienthal. He eventually died in an accident while gliding in 1896 after completing more than 2,000 flights in the previous five years in gliders of his own design. Lilienthal definitely studied and was inspired by birds. He had published a book in 1889 titled Der Vogelflug als Grundlage der Fliegekunst, which translates to Birdflight as the Basis of Aviation. According to Wikipedia, James Tobin, on page 70 of his 2004 book To Conquer The Air: The Wright Brothers and the Great Race for Flight, says something to the effect of “[o]n the basis of observation, Wilbur concluded that birds changed the angle of the ends of their wings to make their bodies roll right or left”.

[2] The first human powered heavier than air flight had to wait until 1961 in Southampton, UK, and it was not until 1977 that the human powered heavier than air Gossamer Condor flew for a mile in a figure eight, showing both duration and controllability. In 1979 the Gossamer Albatross flew 22 miles across the English Channel, powered by Bryan Allen flying at an average height of 5 feet. And in 1988 MIT’s Daedalus, powered by Kanellos Kanellopoulos, flew a still record 72 miles from Crete to Santorini in a replay of ancient Greek mythology.

[3] Humans are quite used to having conversations involving give and take where the two participants are not equals–parent and three year old, teacher and student, etc. Each can respect the other and be open to insights from the “lower ranked” individual while being respectful and each side letting the other feel like they belong in the conversation. Even when we get to Super Intelligence there is no reason to think we will immediately go to no respect for the humans. And if our earlier not quite Super systems start to get that way we will change them.

One comment on “[FoR&AI] Steps Toward Super Intelligence II, Beyond the Turing Test”

  1. Humans can design things that have never been done before, but they often don’t do it very well. Then (hopefully) they learn from this first attempt and improve it over time. For your SLP to do the design well on its first try, taking into account all the factors that you stated, I would consider it a Super Intelligence (and a very useful one!). Humans are far more adept at following existing guidelines (“best practice”) that have been refined over many instances, than at creating something this complicated from first principles (see Elon Musk’s car manufacturing as an example).
