Predictions Scorecard, 2020 January 01

rodneybrooks.com/predictions-scorecard-2020-january-01/

On January 1st, 2018, I made predictions (here) about self driving cars, Artificial Intelligence and machine learning, and about progress in the space industry. Those predictions had dates attached to them for 32 years up through January 1st, 2050.

I made my predictions because at the time I saw an immense amount of hype about these three topics, and the general press and public drawing conclusions about all sorts of things they feared (e.g., truck driving jobs about to disappear, all manual labor of humans about to disappear) or desired (e.g., safe roads about to come into existence, a safe haven for humans on Mars about to start developing) being imminent. My predictions, with dates attached to them, were meant to slow down those expectations, and inject some reality into what I saw as irrational exuberance.

As part of self certifying the seriousness of my predictions I promised to review them, as made on January 1st, 2018, every following January 1st for 32 years, the span of the predictions, to see how accurate they were.

On January 1st, 2019, I posted my first annual self appraisal of how well I did. This post, today, January 1st, 2020, is my second annual self appraisal of how well I did–I have 30 more annual appraisals ahead of me. I think in the two years since my predictions, there has been a general acceptance that certain things are not as imminent or as inevitable as the majority believed just then. So some of my predictions now look more like “of course”, rather than “really, that long in the future?” as they did then.

This is a boring update. Despite lots of hoopla in the press about self driving cars, Artificial Intelligence and machine learning, and the space industry, this last year, 2019, was not actually a year of big milestones. Not much that will matter in the long run actually happened in 2019.

Furthermore, this year’s summary indicates that so far none of my predictions have turned out to be too pessimistic. Overall I am getting worried that I was perhaps too optimistic, and had bought into the hype too much. There is only one dated prediction of mine that I am currently worried may have been too pessimistic–I won’t name it here as perhaps I will turn out to be right after all.

Repeat of Last Year’s Explanation of Annotations

As I said last year, I am not going to edit my original post, linked above, at all, even though I see there are a few typos still lurking in it. Instead I have copied the three tables of predictions below from last year’s update post, and have simply added a total of six comments to the fourth column. As with last year I have highlighted dates in column two where the time they refer to has arrived.

I tag each comment in the fourth column with a cyan colored date tag in the form yyyymmdd such as 20190603 for June 3rd, 2019.

The entries that I put in the second column of each table, titled “Date” in each case, back on January 1st of 2018, have the following forms:

NIML meaning “Not In My Lifetime”, i.e., not until beyond December 31st, 2049, the last day of the first half of the 21st century.

NET some date, meaning “No Earlier Than” that date.

BY some date, meaning “By” that date.

Sometimes I gave both a NET and a BY for a single prediction, establishing a window in which I believe it will happen.

For now I am coloring those statements when it can be determined already whether I was correct or not.

I have started using LawnGreen (#7cfc00) for those predictions which were entirely accurate. For instance a BY 2018 can be colored green if the predicted thing did happen in 2018, as can a NET 2019 if it did not happen in 2018 or earlier. There are five predictions now colored green, the same ones as last year, with no new ones in January 2020.

I will color dates Tomato (#ff6347) if I was too pessimistic about them. No Tomato dates yet. But if something happens that I said NIML, for instance, then it would go Tomato, or if in 2020 something already had happened that I said NET 2021, then that too would have gone Tomato.

If I was too optimistic about something, e.g., if I had said BY 2018, and it hadn’t yet happened, then I would color it DeepSkyBlue (#00bfff). None of these yet either. And eventually if there are NETs that went green, but years later have still not come to pass I may start coloring them LightSkyBlue (#87cefa).

In summary then: Green splashes mean I got things exactly right. Red means provably wrong and that I was too pessimistic. And blueness will mean that I was overly optimistic.
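As a rough illustration, the coloring rules above can be sketched in Python. This is only a sketch under stated assumptions: the `Prediction` class and the `color_for` function are hypothetical names invented here, not anything used to produce the tables. It assumes that a review on January 1st of a year only counts events through December 31st of the previous year, it treats NIML as "No Earlier Than 2050", and it leaves out the discretionary LightSkyBlue case.

```python
from dataclasses import dataclass
from typing import Optional

LAWN_GREEN = "#7cfc00"     # entirely accurate
TOMATO = "#ff6347"         # too pessimistic
DEEP_SKY_BLUE = "#00bfff"  # too optimistic

# Assumption: NIML ("not until beyond December 31st, 2049") is modeled
# as a "No Earlier Than 2050" bound.
NIML_YEAR = 2050


@dataclass
class Prediction:
    """Hypothetical record for one row of a table (illustrative only)."""
    net: Optional[int] = None  # "No Earlier Than" year, if any
    by: Optional[int] = None   # "By" year, if any
    niml: bool = False         # "Not In My Lifetime"


def color_for(pred: Prediction, review_year: int,
              happened_year: Optional[int] = None) -> Optional[str]:
    """Color for a prediction as reviewed on January 1st of review_year.

    Returns None when it cannot yet be determined whether the
    prediction was correct.
    """
    last_full_year = review_year - 1
    net = NIML_YEAR if pred.niml else pred.net
    happened = happened_year is not None and happened_year <= last_full_year

    if happened:
        if net is not None and happened_year < net:
            return TOMATO         # happened earlier than NET (or despite NIML)
        if pred.by is not None and happened_year > pred.by:
            return DEEP_SKY_BLUE  # happened, but after the BY date
        return LAWN_GREEN         # happened inside the predicted window

    if pred.by is not None and last_full_year >= pred.by:
        return DEEP_SKY_BLUE      # BY date has passed without it happening
    if net is not None and last_full_year >= net - 1:
        return LAWN_GREEN         # did not happen before the NET year
    return None                   # too early to tell


if __name__ == "__main__":
    # The two green examples given in the text.
    print(color_for(Prediction(by=2018), 2019, happened_year=2018))
    print(color_for(Prediction(net=2019), 2019))
```

In this sketch a prediction that carries both a NET and a BY goes green as soon as the NET year arrives without the event, and can still turn blue later if the BY date passes; whether that matches every highlighted date in the tables is an assumption, not something the rules above spell out.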

So now, here are the updated tables.

Self Driving Cars

No predictions have yet been relevant for self driving cars, but I have augmented one comment from last year in this first table. Also, see some comments right after this table.

| Prediction [Self Driving Cars] | Date | 2018 Comments | Updates |
| A flying car can be purchased by any US resident if they have enough money. | NET 2036 | There is a real possibility that this will not happen at all by 2050. | |
| Flying cars reach 0.01% of US total cars. | NET 2042 | That would be about 26,000 flying cars given today's total. | |
| Flying cars reach 0.1% of US total cars. | NIML | | |
| First dedicated lane where only cars in truly driverless mode are allowed on a public freeway. | NET 2021 | This is a bit like current day HOV lanes. My bet is the left most lane on 101 between SF and Silicon Valley (currently largely the domain of speeding Teslas in any case). People will have to have their hands on the wheel until the car is in the dedicated lane. | |
| Such a dedicated lane where the cars communicate and drive with reduced spacing at higher speed than people are allowed to drive. | NET 2024 | | |
| First driverless "taxi" service in a major US city, with dedicated pick up and drop off points, and restrictions on weather and time of day. | NET 2022 | The pick up and drop off points will not be parking spots, but like bus stops they will be marked and restricted for that purpose only. | 20190101 Although a few such services have been announced every one of them operates with human safety drivers on board. And some operate on a fixed route and so do not count as a "taxi" service--they are shuttle buses. And those that are "taxi" services only let a very small number of carefully pre-approved people use them. We'll have more to argue about when any of these services do truly go driverless. That means no human driver in the vehicle, or even operating it remotely. 20200101 During 2019 Waymo started operating a 'taxi service' in Chandler, Arizona, with no human driver in the vehicles. While this is a big step forward see comments below for why this is not yet a driverless taxi service. |
| Such "taxi" services where the cars are also used with drivers at other times and with extended geography, in 10 major US cities. | NET 2025 | A key predictor here is when the sensors get cheap enough that using the car with a driver and not using those sensors still makes economic sense. | |
| Such "taxi" service as above in 50 of the 100 biggest US cities. | NET 2028 | It will be a very slow start and roll out. The designated pick up and drop off points may be used by multiple vendors, with communication between them in order to schedule cars in and out. | |
| Dedicated driverless package delivery vehicles in very restricted geographies of a major US city. | NET 2023 | The geographies will have to be where the roads are wide enough for other drivers to get around stopped vehicles. | |
| A (profitable) parking garage where certain brands of cars can be left and picked up at the entrance and they will go park themselves in a human free environment. | NET 2023 | The economic incentive is much higher parking density, and it will require communication between the cars and the garage infrastructure. | |
| A driverless "taxi" service in a major US city with arbitrary pick up and drop off locations, even in a restricted geographical area. | NET 2032 | This is what Uber, Lyft, and conventional taxi services can do today. | |
| Driverless taxi services operating on all streets in Cambridgeport, MA, and Greenwich Village, NY. | NET 2035 | Unless parking and human drivers are banned from those areas before then. | |
| A major city bans parking and cars with drivers from a non-trivial portion of a city so that driverless cars have free rein in that area. | NET 2027, BY 2031 | This will be the starting point for a turning of the tide towards driverless cars. | |
| The majority of US cities have the majority of their downtown under such rules. | NET 2045 | | |
| Electric cars hit 30% of US car sales. | NET 2027 | | |
| Electric car sales in the US make up essentially 100% of the sales. | NET 2038 | | |
| Individually owned cars can go underground onto a pallet and be whisked underground to another location in a city at more than 100mph. | NIML | There might be some small demonstration projects, but they will be just that, not real, viable mass market services. | |
| First time that a car equipped with some version of a solution for the trolley problem is involved in an accident where it is practically invoked. | NIML | Recall that a variation of this was a key plot aspect in the movie "I, Robot", where a robot had rescued the Will Smith character after a car accident at the expense of letting a young girl die. | |

Chandler is a suburb of Phoenix and is itself the 84th largest city in the US. With apologies to residents of Chandler, I do not think that it comes to mind as a major US city for most Americans. Furthermore, the service has so far not been open to the public, but instead started with just a few hundred people (out of a population of about one quarter of a million residents) who had previously been approved to use the service when there was a human safety driver on board. These riders are banned from talking about when things go wrong so we really don’t know how well the system works. Over 2019 the number of riders has grown to 1,500 monthly users, and a total of about 100,000 rides. Recently there has been an announcement that a phone app will make the service available to more users.

BUT, while there is no human driver in the taxi there is a remote human safety driver for all rides, as detailed in this story. While the humans can monitor more than one vehicle at a time, obviously there is a scaling issue, and the taxis are not truly autonomous. To make them so would be a big step. Also the taxis do not operate when it is raining. That would be the peak usage time for taxis in most cities. But they just don’t operate in the rain.

So… no self driving taxi service yet, even in a relatively small city with a population density many times less than that of major US cities.

The last twelve months have seen a real shakeout in expectations for deployment of self driving cars. Companies are realizing that it is much harder than they came to believe for a while, and that there are many issues beyond simply “driving” that need to be addressed. I previously talked about some of those issues on this blog in January and June of 2017.

To illustrate how predictions have been slipping, here is a slide that I made for talks based on a snapshot of predictions about driverless cars from March 27, 2017. The web address still seems to give the same predictions with a couple more at the end that I couldn’t fit on my slide. In parentheses are the years the predictions were made, and in blue are the dates for when the innovation was predicted to happen.

Recently I had added some arrows to this slide. The skinny red arrows point to dates that have passed without the prediction coming to pass. The fatter orange arrows point to cases where company executives have since come out with updated predictions that are later than the ones given here. E.g., in the fourth line from the bottom, the Daimler chairman had said in 2014 that fully autonomous vehicles could be ready by 2025. In November of 2019 the chairman announced a reality check on self driving cars, as one can see in numerous online stories. Here is the first paragraph of one report on his remarks:

Mercedes-Benz parent Daimler has taken a “reality check” on self-driving cars. Making autonomous vehicles safe has proven harder than originally thought, and Daimler is now questioning their future earnings potential, CEO Ola Kaellenius told Reuters and other media.

Other reports of the same story can be found here and here.

None of the original predictions have come to pass, and those still standing are getting rather sparse.

<rant>

At the same time, however, there have been more outrageously optimistic predictions made about fully self driving cars being just around the corner. I won’t name names, but on April 23rd of 2019, i.e., less than nine months ago, Elon Musk said that in 2020 Tesla would have “one million robo-taxis” on the road, and that they would be “significantly cheaper for riders than what Uber and Lyft cost today”. While I have no real opinion on the veracity of these predictions, they are what is technically called bullshit. Kai-Fu Lee and I had a little exchange on Twitter where we agreed that together we would eat all such Tesla robo-taxis on the road at the end of this year, 2020.

</rant>

Artificial Intelligence and Machine Learning

I had not predicted any big milestones for AI and machine learning for the current period, and indeed there were none achieved.

We have seen certain proponents be very proud of how much more compute they have, growing at many times what Moore’s Law at its best would provide. I think it is fair to say that the results of all that computing since 2012 are not very impressive when compared to what a single human brain, powered at just 20 Watts, has been able to achieve in the same time frame: one just has to look at someone whose 20th birthday is today, January 1st, 2020, and compare what they know now and what they can achieve now to what they could do in 2012.

And there has even been a little backlash about the carbon footprint that modern ML data sets cause in training. There are even tools and best practices for cutting down the carbon footprint of your ML research. People can argue about the details, but no one can make a case that the energy usage is not many orders of magnitude more than used by the meat machine inside people’s heads, and that human performance is way more impressive than any machine performance to date. People get fooled all the time by the slick marketing around each new achievement by the machine learning companies, but when you poke them you see that the achievements are rather pathetic compared to human performance.

Without any retraining, make a Go playing program compete against a human on a 25 by 25 board, or even an 18 by 18 board. Or change all the colors of the pixels in Quake III Arena, or change the screen resolution, and humans will adapt seamlessly while the ML trained systems will have to start from zero again.

While ML conference attendance has gone up by a factor of 20 or so, the results are not notably more powerful in terms of the impact they have on the real world.

Right after the Artificial Intelligence and machine learning table I have some links to back up today’s assertion in it that there are more blog posts pushing back on DL as being all we will need to get to human level (whatever that might mean) Artificial Intelligence.

| Prediction [AI and ML] | Date | 2018 Comments | Updates |
| Academic rumblings about the limits of Deep Learning. | BY 2017 | Oh, this is already happening... the pace will pick up. | 20190101 There were plenty of papers published on limits of Deep Learning. I've provided links to some right below this table. 20200101 Go back to last year's update to see them. |
| The technical press starts reporting about limits of Deep Learning, and limits of reinforcement learning of game play. | BY 2018 | | 20190101 Likewise some technical press stories are linked below. 20200101 Go back to last year's update to see them. |
| The popular press starts having stories that the era of Deep Learning is over. | BY 2020 | | 20200101 We are seeing more and more opinion pieces by non-reporters saying this, but still not quite at the tipping point where reporters come out and say it. Axios and WIRED are getting close. |
| VCs figure out that for an investment to pay off there needs to be something more than "X + Deep Learning". | NET 2021 | I am being a little cynical here, and of course there will be no way to know when things change exactly. | |
| Emergence of the generally agreed upon "next big thing" in AI beyond deep learning. | NET 2023, BY 2027 | Whatever this turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out. | |
| The press, and researchers, generally mature beyond the so-called "Turing Test" and Asimov's three laws as valid measures of progress in AI and ML. | NET 2022 | I wish, I really wish. | |
| Dexterous robot hands generally available. | NET 2030, BY 2040 (I hope!) | Despite some impressive lab demonstrations we have not actually seen any improvement in widely deployed robotic hands or end effectors in the last 40 years. | |
| A robot that can navigate around just about any US home, with its steps, its clutter, its narrow pathways between furniture, etc. | Lab demo: NET 2026; Expensive product: NET 2030; Affordable product: NET 2035 | What is easy for humans is still very, very hard for robots. | |
| A robot that can provide physical assistance to the elderly over multiple tasks (e.g., getting into and out of bed, washing, using the toilet, etc.) rather than just a point solution. | NET 2028 | There may be point solution robots before that. But soon the houses of the elderly will be cluttered with too many robots. | |
| A robot that can carry out the last 10 yards of delivery, getting from a vehicle into a house and putting the package inside the front door. | Lab demo: NET 2025; Deployed systems: NET 2028 | | |
| A conversational agent that both carries long term context, and does not easily fall into recognizable and repeated patterns. | Lab demo: NET 2023; Deployed systems: 2025 | Deployment platforms already exist (e.g., Google Home and Amazon Echo) so it will be a fast track from lab demo to wide spread deployment. | |
| An AI system with an ongoing existence (no day is the repeat of another day as it currently is for all AI systems) at the level of a mouse. | NET 2030 | I will need a whole new blog post to explain this... | |
| A robot that seems as intelligent, as attentive, and as faithful, as a dog. | NET 2048 | This is so much harder than most people imagine it to be--many think we are already there; I say we are not at all there. | |
| A robot that has any real idea about its own existence, or the existence of humans in the way that a six year old understands humans. | NIML | | |

There are outlets now for non-journalists, perhaps practitioners in a scientific field, to write position papers that get widely referenced in social media. These position papers are often forerunners of what the popular press will soon start reporting.

During 2019 we saw many, many such well informed position papers/blogposts. We have seen explanations on how machine learning has limitations on when it makes sense to be used and that it may not be a universal silver bullet. There have been posts that deep learning may be hitting limits as it has no common sense. We have seen questions about the practical value of the results of deep learning on game playing as game playing is precisely where we have massive amounts of completely relevant data–problems in the real world more commonly have very little data and reasoning from other domains is imperative to figuring out how to make progress on the problem. And we have seen warnings that all the over-hype of machine and deep learning may lead to a new AI winter when those tens of thousands of jolly conference attendees will no longer have grants and contracts to pay for travel to and attendance at their fiestas.

I am very concerned about what will happen when the current machine/deep learning bubble bursts. We have seen the bursting of hype bubbles decimate AI research before. The self driving car bubble, and the potential negative impact on AI research when it bursts, also worries me.

Space

There were no target dates that have been hit or missed in the last year in the space launch domain, but I have made a couple of update comments in the following table, and then follow it with details in the text below.

| Prediction [Space] | Date | 2018 Comments | Updates |
| Next launch of people (test pilots/engineers) on a sub-orbital flight by a private company. | BY 2018 | | 20190101 Virgin Galactic did this on December 13, 2018. 20200101 On February 22, 2019, Virgin Galactic had the second flight to space of their current vehicle, this time with three humans on board. As far as I can tell that is the only sub-orbital flight of humans in 2019. Blue Origin's New Shepard flew three times in 2019, but with no people aboard as in all its flights so far. |
| A few handfuls of customers, paying for those flights. | NET 2020 | | |
| A regular sub weekly cadence of such flights. | NET 2022, BY 2026 | | |
| Regular paying customer orbital flights. | NET 2027 | Russia offered paid flights to the ISS, but there were only 8 such flights (7 different tourists). They are now suspended indefinitely. | |
| Next launch of people into orbit on a US booster. | NET 2019, BY 2021, BY 2022 (2 different companies) | Current schedule says 2018. | 20190101 It didn't happen in 2018. Now both SpaceX and Boeing say they will do it in 2019. 20200101 Both Boeing and SpaceX had major failures with their systems during 2019, though no humans were aboard in either case. So this goal was not achieved in 2019. Both companies are optimistic of getting it done in 2020, as they were for 2019. I'm sure it will happen eventually for both companies. |
| Two paying customers go on a loop around the Moon, launch on Falcon Heavy. | NET 2020 | The most recent prediction has been 4th quarter 2018. That is not going to happen. | 20190101 I'm calling this one now as SpaceX has revised their plans from a Falcon Heavy to their still developing BFR (or whatever it gets called), and predict 2023. I.e., it has slipped 5 years in the last year. |
| Land cargo on Mars for humans to use at a later date. | NET 2026 | SpaceX has said by 2022. I think 2026 is optimistic but it might be pushed to happen as a statement that it can be done, rather than for any pressing practical reason. | |
| Humans on Mars make use of cargo previously landed there. | NET 2032 | Sorry, it is just going to take longer than everyone expects. | |
| First "permanent" human colony on Mars. | NET 2036 | It will be magical for the human race if this happens by then. It will truly inspire us all. | |
| Point to point transport on Earth in an hour or so (using a BF rocket). | NIML | This will not happen without some major new breakthrough of which we currently have no inkling. | |
| Regular service of Hyperloop between two cities. | NIML | I can't help but be reminded of when Chuck Yeager described the Mercury program as "Spam in a can". | |

During a ground test of the SpaceX Crew Dragon capsule, on April 20th, 2019, it exploded catastrophically. This delayed the SpaceX program so that no manned test could be done in 2019. SpaceX traced the problem to a valve failure when starting up the capsule abort engines, needed during launch if the booster rocket is undergoing failure. They currently have a test scheduled for early 2020 where these engines will be ignited during a launch so that the capsule can safely fly away from the launch vehicle.

In December of 2019 Boeing had a major test of its CST-100 Starliner capsule, and ended up with both a failure and a success for the mission. It was supposed to be the final unmanned test of the vehicle, and was planned to dock with the International Space Station (ISS) and then do a soft landing on the ground. It launched on December 20th and achieved orbit, but due to software failures it was the wrong orbit and there was not enough fuel left to get it to the ISS. This was a major failure. On the other hand it achieved a major success in doing a soft landing in New Mexico on December 22nd.

Other Hype Magnets

I have not felt qualified to talk about the hype impact for either quantum computing or blockchain. Just at the end of 2019 there was a very interesting blog post by Scott Aaronson, a true expert and theoretical contributor to the field of quantum computing, on how to read announcements about quantum computing results. I recommend it.

Guest Post by Phillip Alvelda: Pondering the Empathy Gap

rodneybrooks.com/guest-post-by-phillip-alveda-pondering-the-empathy-gap/

[Phillip Alvelda is an old friend from MIT, and CEO of Brainworks.]

Pondering how to close what seems to be a rapidly widening empathy gap here in the U.S. and globally.

I used to just be resigned to the fact that many of my white friends who had never felt or experienced discrimination directed at themselves seem incapable of seeing or recognizing implicit, or even explicit, bias directed at others. I didn’t use to think of these people as mean or racist…just oblivious through lack of direct experience.

But now, with a nation inflamed by our own government inciting and validating hatred and bigotry, with brown asylum seekers and children dying in mass US internment camps, and LGBTQ and women’s rights under mounting assault, the discrimination has literally turned lethal. And the empathy gap is enabling these crimes against humanity to continue and grow in the US now, just like the silent majority in Weimar Germany allowed the Jewish genocide to advance.

I’ve come to see supporters of this corrupt and criminal administration as increasingly complicit in the ongoing crimes. It is no longer just a matter of not seeing discrimination that doesn’t impact your family directly.

Trump supporters and anyone who supports any of his Republican enablers must now find some way to look past the growing reports of discrimination, minority voter suppression and gerrymandering, hate crimes, repression, the roll back of women’s and LGBTQ rights, a measurably biased justice system, mass internment camps, and now even the murder of the weak and vulnerable kidnapped children who committed no crime other than to follow our own ancestors to seek freedom and opportunity in the US… This growing mass of willfully blind conservatives have abandoned fair morality, and are direct enablers of evil.

We are now in an era I never thought to see in the US, when government manufactured propaganda is purposely driving the dehumanization of women, LGBTQ people, and people of color. The US empathy gap is widening rapidly. How can we fight these dark divisive forces and narrow the gap, when our polarized society can’t even agree on measurable objective realities like the climate crisis?

Otherwise, I fear the U.S. is on a path to dissolve into at least two countries, divided along a border between those states who value empathy and seek an inclusive and pluralistic future society, and those who seek to retreat to tribal protectionism of historical rights for a shrinking privileged majority.

That this struggle rises now really baffles me. Consider the world’s obviously increasing wealth and abundance, with declining poverty and starvation and increasing access to virtually unlimited renewable energy. The need for tribal dominance to hoard resources is disappearing. The need for borders to protect resources that are no longer scarce is vanishing.

Just imagine if all of our military and arms spending, all of the money we spend enforcing borders and limiting access to food and medicine and energy and education were instead directed towards sharing this abundance!

Pluralism and empathy are clearly the answer. How can we get more people to realize this despite the onslaught of vitriol and tribal incitement from the likes of Fox News?

AGI Has Been Delayed

rodneybrooks.com/agi-has-been-delayed/

A very recent article follows in the footsteps of many others talking about how the promise of autonomous cars on roads is a little further off than many pundits have been predicting for the last few years. Readers of this blog will know that I have been saying this for over two years now. Such skepticism is now becoming the common wisdom.

In this new article at The Ringer, from May 16th, the author, Victor Luckerson, reports:

For Elon Musk, the driverless car is always right around the corner. At an investor day event last month focused on Tesla’s autonomous driving technology, the CEO predicted that his company would have a million cars on the road next year with self-driving hardware “at a reliability level that we would consider that no one needs to pay attention.” That means Level 5 autonomy, per the Society of Automotive Engineers, or a vehicle that can travel on any road at any time without human intervention. It’s a level of technological advancement I once compared to the Batmobile.

Musk has made these kinds of claims before. In 2015 he predicted that Teslas would have “complete autonomy” by 2017 and a regulatory green light a year later. In 2016 he said that a Tesla would be able to drive itself from Los Angeles to New York by 2017, a feat that still hasn’t happened. In 2017 he said people would be able to safely sleep in their fully autonomous Teslas in about two years. The future is now, but napping in the driver’s seat of a moving vehicle remains extremely dangerous.

When I saw someone tweeting that Musk’s comments meant that a million autonomous taxis would be on the road by 2020, I tweeted out the following:

Let’s count how many truly autonomous (no human safety driver) Tesla taxis (public chooses destination & pays) on regular streets (unrestricted human driven cars on the same streets) on December 31, 2020. It will not be a million. My prediction: zero. Count & retweet this then.

I think these three criteria need to be met before someone can say that we have autonomous taxis on the road.

The first challenge, no human safety driver, has not been met by a single experimental deployment of autonomous vehicles on public roads anywhere in the world. They all have safety humans in the vehicle. A few weeks ago I saw an autonomous shuttle trial along the paved beachside public walkways at the beach on which I grew up, in Glenelg, South Australia, where there were “two onboard stewards to ensure everything runs smoothly” along with eight passengers. Today’s demonstrations are just not autonomous. In fact in the article above Luckerson points out that Uber’s target is to have their safety drivers intervene only once every 13 miles, but they are way off that capability at this time. Again, hardly autonomous, even if they were to meet that goal. Imagine having a breakdown of your car that you are driving once every 13 miles–we expect better.

And if normal human beings can’t simply use these services (in Waymo’s Phoenix trial only 400 pre-approved people are allowed to try them out) and go anywhere that they can go in a current day taxi, then really the things deployed will not be autonomous taxis. They will be something else. Calling them taxis would be redefining what a taxi is. And if you can just redefine words on a whim there is really not much value to your words.

I am clearly skeptical about seeing autonomous cars on our roads in the next few years. In the long term I am enthusiastic. But I think it is going to take longer than most people think.

In response to my tweet above, Kai-Fu Lee, a very strong enthusiast about the potential for AI, and a large investor in Chinese AI companies, replied with:

If there are a million Tesla robo-taxis functioning on the road in 2020, I will eat them. Perhaps @rodneyabrooks will eat half with me?

I readily replied that I would be happy to share the feast!

Luckerson talks about how executives, in general, are backing off from their previous predictions about how close we might be to having truly autonomous vehicles on our roads.  Most interestingly he quotes Chris Urmson:

Chris Urmson, the former leader of Google’s self-driving car project, once hoped that his son wouldn’t need a driver’s license because driverless cars would be so plentiful by 2020. Now the CEO of the self-driving startup Aurora, Urmson says that driverless cars will be slowly integrated onto our roads “over the next 30 to 50 years.”

Now let’s take note of this. Chris Urmson was the leader of Google’s self-driving car project, which became Waymo around the time he left, and is the CEO of a very well funded self-driving start up. He says “30 to 50 years”. Chris Urmson has been a leader in the autonomous car world since before it entered mainstream consciousness. He has lived and breathed autonomous vehicles for over ten years. No grumpy old professor is he. He is a doer and a striver. If he says it is hard then we know that it is hard.

I happen to agree, but I want to use this reality check for another thread.

If we were to have AGI, Artificial General Intelligence, with human level capabilities, then certainly it ought to be able to drive a car, just like a person, if not better. Now a self driving car does not need to have general human level intelligence, but a self driving car is certainly a lower bound on human level intelligence.  Urmson, a strong proponent of self driving cars says 30 to 50 years.

So what does that say about predictions that AGI is just around the corner? And what does it say about it being an existential threat to humanity any time soon? We have plenty of existential threats to humanity lining up to bash us in the short term, including climate change, plastics in the oceans, and a demographic inversion. If AGI is a long way off then we can not say anything sensible today about what promises or threats it might provide as we need to completely re-engineer our world long before it shows up, and when it does show up it will be in a world that we can not yet predict.

Do people really say that AGI is just around the corner? Yes, they do…

Here is a press report on a conference on “Human Level AI” that was held in 2018. It reports that 37% of respondents to a survey at that conference said they expected human level AI to be around in 5 to 10 years. Now, I must say that looking through the conference site I see more large hats than cattle, but these are mostly people with paying corporate or academic jobs, and 37% of them think this.

Ray Kurzweil still maintains, in Martin Ford’s recent book, that we will see a human level intelligence by 2029–in the past he has claimed that we will have a singularity by then as the intelligent machines will be so superior to human level intelligence that they will exponentially improve themselves (see my comments on belief in magic as one of the seven deadly sins in predicting the future of AI). Mercifully the average prediction of the 18 respondents for this particular survey was that AGI would show up around 2099. I may have skewed that average a little as I was an outlier amongst the 18 people at the year 2200. In retrospect I wish I had said 2300 and that is the year I have been using in my recent talks.

And a survey taken by the Future of Life Institute (warning: that institute has a very dour view of the future of human life, worse than my concerns of a few paragraphs ago) says we are going to get AGI around 2050.

But that is the low end of when Urmson thinks we will have autonomous cars deployed. Suppose he is right about his range. And suppose I am right that  autonomous driving is a lower bound on AGI, and I believe it is a very low bound. With these very defensible assumptions then the seemingly sober experts in Martin Ford’s new book are on average wildly optimistic about when AGI is going to show up.

AGI has been delayed.

 

A Better Lesson

rodneybrooks.com/a-better-lesson/

Just last week Rich Sutton published a very short blog post titled  The Bitter Lesson. I’m going to try to keep this review shorter than his post. Sutton is well known for his long and sustained contributions to reinforcement learning.

In his post he argues, using many good examples, that over the 70 year history of AI, more computation and less built in knowledge has always won out as the best way to build Artificial Intelligence systems. This resonates with a current mode of thinking among many of the newer entrants to AI that it is better to design learning networks and put in massive amounts of computer power, than to try to design a structure for computation that is specialized in any way for the task. I must say, however, that at a two day workshop on Deep Learning last week at the National Academy of Sciences, the latter idea was much more in vogue, something of a backlash against exactly what Sutton is arguing.

I think Sutton is wrong for a number of reasons.

  1. One of the most celebrated successes of Deep Learning is image labeling, using CNNs, Convolutional Neural Networks, but the very essence of CNNs is that the front end of the network is designed by humans to manage translational invariance, the idea that objects can appear anywhere in the frame. To have a Deep Learning network also have to learn that seems pedantic to the extreme, and will drive up the computational costs of the learning by many orders of magnitude.
  2. There are other things in image labeling that suffer mightily because the current crop of CNNs do not have certain things built in that we know are important for human performance. E.g., color constancy. This is why the celebrated example of a traffic stop sign with some pieces of tape on it is seen as a 45 mph speed limit sign by a certain CNN trained for autonomous driving. No human makes that error because they know that stop signs are red, and speed limit signs are white. The CNN doesn’t know that, because the relationship between pixel color in the camera and the actual color of the object is a very complex relationship that does not get elucidated with the measly tens of millions of training images that the algorithms are trained on. Saying that in the future we will have viable training sets is shifting the human workload to creating massive training sets and encoding what we want the system to learn in the labels. This is just as much building knowledge in as it would be to directly build a color constancy stage. It is sleight of hand in moving the human intellectual work to somewhere else.
  3. In fact, for most machine learning problems today a human is needed to design a specific network architecture for the learning to proceed well. So again, rather than have the human build in specific knowledge we now expect the human to build the particular and appropriate network, and the particular training regime that will be used. Once again it is sleight of hand to say that AI succeeds without humans getting into the loop. Rather we are asking the humans to pour their intelligence into the algorithms in a different place and form.
  4. Massive data sets are not at all what humans need to learn things so something is missing. Today’s data sets can have billions of examples, where a human may only require a handful to learn the same thing. But worse, the amount of computation needed to train many of the networks we see today can only be furnished by very large companies with very large budgets, and so this push to make everything learnable is pushing the cost of AI outside that of individuals or even large university departments. That is not a sustainable model for getting further in intelligent systems. For some machine learning problems we are starting to see a significant carbon foot print due to the power consumed during the learning phase.
  5. Moore’s Law is slowing down, so that some computer architects are reporting the doubling time in amount of computation on a single chip is moving from one year to twenty years. Furthermore the breakdown of Dennard scaling back in 2006 means that the power consumption of machines goes up as they perform better, and so we can not afford to put even the results of machine learning (let alone the actual learning) on many of our small robots–self driving cars require about 2,500 Watts of power for computation–a human brain only requires 20 Watts. So Sutton’s argument just makes this worse, and makes the use of AI and ML impractical.
  6. Computer architects are now trying to compensate for these problems by building special purpose chips for runtime use of trained networks. But they need to lock in the hardware to a particular network structure and capitalize on human analysis of what tricks can be played without changing the results of the computation, but with greatly reduced power budgets. This has two drawbacks. First it locks down hardware specific to particular solutions, so every time we have a new ML problem we will need to design new hardware. And second, it once again is simply shifting where human intelligence needs to be applied to make ML practical, not eliminating the need for humans to be involved in the design at all.
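Point 1 above can be made concrete with a toy example. What follows is a minimal sketch in plain Python, not taken from any particular CNN library: a single hand-chosen kernel is slid across a one dimensional signal, which is the weight sharing that a convolutional front end builds in. The names `correlate` and `place`, the pattern, and the signal length are all assumptions made for illustration. Strictly speaking the shared kernel gives a response that moves with the object, and taking the maximum over positions then gives a value that does not depend on where the object sits.

```python
def correlate(signal, kernel):
    """Valid-mode 1D cross-correlation: one shared kernel slid over the signal."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def place(pattern, length, offset):
    """Return a zero signal of the given length with the pattern at offset."""
    signal = [0.0] * length
    signal[offset:offset + len(pattern)] = pattern
    return signal


# Hypothetical "object" and a kernel designed by hand to match it.
pattern = [1.0, 2.0, 1.0]
kernel = [1.0, 2.0, 1.0]

# The same object at two different positions in the frame.
left = correlate(place(pattern, 12, 2), kernel)
right = correlate(place(pattern, 12, 7), kernel)

# The peak response follows the object (the response shifts with the input).
peak_left = left.index(max(left))
peak_right = right.index(max(right))

# Pooling over all positions removes the dependence on position.
pooled_left = max(left)
pooled_right = max(right)

print(peak_left, peak_right, pooled_left, pooled_right)
```

Nothing here was learned. The position independence comes entirely from the human decision to reuse one kernel everywhere, which is the sense in which the structure is designed rather than discovered from data.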

So my take on Rich Sutton’s piece is that the lesson we should learn from the last seventy years of AI research is not at all that we should just use more computation and that always wins. Rather I think a better lesson to be learned is that we have to take into account the total cost of any solution, and that so far they have all required substantial amounts of human ingenuity. Saying that a particular solution style minimizes a particular sort of human ingenuity that is needed while not taking into account all the other places that it forces human ingenuity (and carbon footprint) to be expended is a terribly myopic view of the world.

This review, including this comment, is seventy six words shorter than Sutton’s post.

Predictions Scorecard, 2019 January 01

rodneybrooks.com/predictions-scorecard-2019-january-01/

On January 1st, 2018, I made predictions (here) about self driving cars, Artificial Intelligence and machine learning, and about progress in the space industry. Those predictions had dates attached to them for 32 years up through January 1st, 2050.

So, today, January 1st, 2019, is my first annual self appraisal of how well I did. I’ll try to do this every year for 32 years, if I last that long.

I am not going to edit my original post, linked above, at all, even though I see there are a few typos still lurking in it. Instead I have copied the three tables of predictions below. I have changed the header of the third column in each case to “2018 Comments”, but left the comments exactly as they were, and added a fourth column titled “Updates”. In one case I fixed a typo (about self driving taxis in Cambridgeport and Greenwich Village) in the left most column. I have started highlighting the dates in column two where the time they refer to has arrived, and I am starting to put comments in the updates fourth column.

I will tag each comment in the fourth column with a cyan colored date tag in the form yyyymmdd such as 20190603 for June 3rd, 2019.

The entries that I put in the second column of each table, titled “Date” in each case, back on January 1st of 2018, have the following forms:

NIML meaning “Not In My Lifetime”, i.e., not until beyond December 31st, 2049, the last day of the first half of the 21st century.

NET some date, meaning “No Earlier Than” that date.

BY some date, meaning “By” that date.

Sometimes I gave both a NET and a BY for a single prediction, establishing a window in which I believe it will happen.

For now I am coloring those statements when it can be determined already whether I was correct or not.

I have started using LawnGreen (#7cfc00) for those predictions which were entirely accurate. For instance a BY 2018 can be colored green if the predicted thing did happen in 2018, as can a NET 2019 if it did not happen in 2018 or earlier. There are five predictions now colored green.

I will color dates Tomato (#ff6347) if I was too pessimistic about them. No Tomato dates yet. But if something happens that I said NIML, for instance then it would go Tomato, or if in 2019 something already had happened that I said NET 2020, then that too would go Tomato.

If I was too optimistic about something, e.g., if I had said BY 2018, and it hadn’t yet happened, then I would color it DeepSkyBlue (#00bfff). None of these yet either. And eventually if there are NETs that went green, but years later have still not come to pass I may start coloring them LightSkyBlue (#87cefa).

In summary then: Green splashes mean I got things exactly right. Red means provably wrong and that I was too pessimistic. And blueness will mean that I was overly optimistic.
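The coloring rules above can be written down as a small function. This is only a sketch of one reading of those rules, not the procedure actually used to produce the tables: the function name `score_clause`, its arguments, and the idea of scoring each dated clause (a single NET, BY, or NIML entry) on its own are assumptions made for illustration. It takes the year the predicted thing happened (or `None` if it has not) and the last fully completed calendar year, and it leaves out the discretionary LightSkyBlue case and any judgment calls made ahead of the dates.

```python
LAWN_GREEN = "#7cfc00"     # entirely accurate
TOMATO = "#ff6347"         # too pessimistic
DEEP_SKY_BLUE = "#00bfff"  # too optimistic

NIML_FIRST_YEAR = 2050  # NIML: not until beyond December 31st, 2049


def score_clause(kind, year, happened, last_full_year):
    """Return a color for one dated clause, or None if it cannot be scored yet.

    kind: "NET", "BY", or "NIML" (year is ignored for "NIML").
    happened: year the predicted thing occurred, or None if it has not.
    last_full_year: most recent calendar year that has fully elapsed.
    """
    if kind == "NIML":
        # Treat NIML as "No Earlier Than" the first year after the cutoff.
        kind, year = "NET", NIML_FIRST_YEAR
    if kind == "NET":
        if happened is not None and happened < year:
            return TOMATO  # it happened earlier than predicted
        if last_full_year >= year - 1:
            return LAWN_GREEN  # every earlier year passed without it happening
        return None
    if kind == "BY":
        if happened is not None and happened <= year:
            return LAWN_GREEN  # it happened in time
        if last_full_year >= year:
            return DEEP_SKY_BLUE  # the deadline year ended without it
        return None
    raise ValueError(f"unknown clause kind: {kind!r}")


# Scored as of January 1st, 2019, so the last completed year is 2018.
print(score_clause("BY", 2018, 2018, 2018))    # sub-orbital flight happened in 2018
print(score_clause("NET", 2019, None, 2018))   # orbital launch did not happen in 2018
print(score_clause("NET", 2036, None, 2018))   # too early to say
```

A prediction with both a NET and a BY would be scored by calling the function once per clause, which matches coloring each date separately.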

So now, here are the updated tables. So far none of my predictions have been at all wrong–there is only one direction to go from here!

No predictions have yet been relevant for self driving cars, but I have added one comment in this first table.

| Prediction [Self Driving Cars] | Date | 2018 Comments | Updates |
|---|---|---|---|
| A flying car can be purchased by any US resident if they have enough money. | NET 2036 | There is a real possibility that this will not happen at all by 2050. | |
| Flying cars reach 0.01% of US total cars. | NET 2042 | That would be about 26,000 flying cars given today's total. | |
| Flying cars reach 0.1% of US total cars. | NIML | | |
| First dedicated lane where only cars in truly driverless mode are allowed on a public freeway. | NET 2021 | This is a bit like current day HOV lanes. My bet is the left most lane on 101 between SF and Silicon Valley (currently largely the domain of speeding Teslas in any case). People will have to have their hands on the wheel until the car is in the dedicated lane. | |
| Such a dedicated lane where the cars communicate and drive with reduced spacing at higher speed than people are allowed to drive | NET 2024 | | |
| First driverless "taxi" service in a major US city, with dedicated pick up and drop off points, and restrictions on weather and time of day. | NET 2022 | The pick up and drop off points will not be parking spots, but like bus stops they will be marked and restricted for that purpose only. | 20190101 Although a few such services have been announced every one of them operates with human safety drivers on board. And some operate on a fixed route and so do not count as a "taxi" service--they are shuttle buses. And those that are "taxi" services only let a very small number of carefully pre-approved people use them. We'll have more to argue about when any of these services do truly go driverless. That means no human driver in the vehicle, or even operating it remotely. |
| Such "taxi" services where the cars are also used with drivers at other times and with extended geography, in 10 major US cities | NET 2025 | A key predictor here is when the sensors get cheap enough that using the car with a driver and not using those sensors still makes economic sense. | |
| Such "taxi" service as above in 50 of the 100 biggest US cities. | NET 2028 | It will be a very slow start and roll out. The designated pick up and drop off points may be used by multiple vendors, with communication between them in order to schedule cars in and out. | |
| Dedicated driverless package delivery vehicles in very restricted geographies of a major US city. | NET 2023 | The geographies will have to be where the roads are wide enough for other drivers to get around stopped vehicles. | |
| A (profitable) parking garage where certain brands of cars can be left and picked up at the entrance and they will go park themselves in a human free environment. | NET 2023 | The economic incentive is much higher parking density, and it will require communication between the cars and the garage infrastructure. | |
| A driverless "taxi" service in a major US city with arbitrary pick and drop off locations, even in a restricted geographical area. | NET 2032 | This is what Uber, Lyft, and conventional taxi services can do today. | |
| Driverless taxi services operating on all streets in Cambridgeport, MA, and Greenwich Village, NY. | NET 2035 | Unless parking and human drivers are banned from those areas before then. | |
| A major city bans parking and cars with drivers from a non-trivial portion of a city so that driverless cars have free reign in that area. | NET 2027; BY 2031 | This will be the starting point for a turning of the tide towards driverless cars. | |
| The majority of US cities have the majority of their downtown under such rules. | NET 2045 | | |
| Electric cars hit 30% of US car sales. | NET 2027 | | |
| Electric car sales in the US make up essentially 100% of the sales. | NET 2038 | | |
| Individually owned cars can go underground onto a pallet and be whisked underground to another location in a city at more than 100mph. | NIML | There might be some small demonstration projects, but they will be just that, not real, viable mass market services. | |
| First time that a car equipped with some version of a solution for the trolley problem is involved in an accident where it is practically invoked. | NIML | Recall that a variation of this was a key plot aspect in the movie "I, Robot", where a robot had rescued the Will Smith character after a car accident at the expense of letting a young girl die. | |

Right after the Artificial Intelligence and machine learning table I have some links to back up my assertions.

| Prediction [AI and ML] | Date | 2018 Comments | Updates |
|---|---|---|---|
| Academic rumblings about the limits of Deep Learning | BY 2017 | Oh, this is already happening... the pace will pick up. | 20190101 There were plenty of papers published on limits of Deep Learning. I've provided links to some right below this table. |
| The technical press starts reporting about limits of Deep Learning, and limits of reinforcement learning of game play. | BY 2018 | | 20190101 Likewise some technical press stories are linked below. |
| The popular press starts having stories that the era of Deep Learning is over. | BY 2020 | | |
| VCs figure out that for an investment to pay off there needs to be something more than "X + Deep Learning". | NET 2021 | I am being a little cynical here, and of course there will be no way to know when things change exactly. | |
| Emergence of the generally agreed upon "next big thing" in AI beyond deep learning. | NET 2023; BY 2027 | Whatever this turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out. | |
| The press, and researchers, generally mature beyond the so-called "Turing Test" and Asimov's three laws as valid measures of progress in AI and ML. | NET 2022 | I wish, I really wish. | |
| Dexterous robot hands generally available. | NET 2030; BY 2040 (I hope!) | Despite some impressive lab demonstrations we have not actually seen any improvement in widely deployed robotic hands or end effectors in the last 40 years. | |
| A robot that can navigate around just about any US home, with its steps, its clutter, its narrow pathways between furniture, etc. | Lab demo: NET 2026; Expensive product: NET 2030; Affordable product: NET 2035 | What is easy for humans is still very, very hard for robots. | |
| A robot that can provide physical assistance to the elderly over multiple tasks (e.g., getting into and out of bed, washing, using the toilet, etc.) rather than just a point solution. | NET 2028 | There may be point solution robots before that. But soon the houses of the elderly will be cluttered with too many robots. | |
| A robot that can carry out the last 10 yards of delivery, getting from a vehicle into a house and putting the package inside the front door. | Lab demo: NET 2025; Deployed systems: NET 2028 | | |
| A conversational agent that both carries long term context, and does not easily fall into recognizable and repeated patterns. | Lab demo: NET 2023; Deployed systems: 2025 | Deployment platforms already exist (e.g., Google Home and Amazon Echo) so it will be a fast track from lab demo to wide spread deployment. | |
| An AI system with an ongoing existence (no day is the repeat of another day as it currently is for all AI systems) at the level of a mouse. | NET 2030 | I will need a whole new blog post to explain this... | |
| A robot that seems as intelligent, as attentive, and as faithful, as a dog. | NET 2048 | This is so much harder than most people imagine it to be--many think we are already there; I say we are not at all there. | |
| A robot that has any real idea about its own existence, or the existence of humans in the way that a six year old understands humans. | NIML | | |

With regards to academic rumblings about deep learning, in 2017 there was a new cottage industry in attacking deep learning by constructing fake images for which a deep learning network gave high scores for ridiculous interpretations. These are known as adversarial attacks on deep learning, and some defenders counter claim that such images will never arrive in practice.

But then in 2018 others found images that were completely natural that fooled particular deep learning networks. A group of researchers from Auburn University in Alabama show how an otherwise well trained network can just completely misclassify objects with unusual orientations, in ways which no human would get wrong at all. Here are some examples:

We humans can see why or how a network might get the first one wrong for instance. It is a large yellow object across a snowy road. But other clues, like the size of the person standing in front of it immediately get us to understand that it is a school bus on its side across the road, and we are looking at its roof.

And here is a paper from researchers at York University and the University of Toronto (both in Toronto) with this abstract:

We showcase a family of common failures of state-of-the art object detectors. These are obtained by replacing image sub-regions by another sub-image that contains a trained object. We call this “object transplanting”. Modifying an image in this manner is shown to have a non-local impact on object detection. Slight changes in object position can affect its identity according to an object detector as well as that of other objects in the image. We provide some analysis and suggest possible reasons for the reported phenomena.

In all their images a human can easily see that an object (e.g., an elephant, say, and hence the very clever title of the paper, “The Elephant in the Room”) has been pasted on to a real scene, and both understand the real scene and identify the object pasted on. The deep learning network can often do neither.

Other academics took to more popular press outlets to express their concerns that the press was overhyping deep learning, and showing what the limits are in reality. There was a piece by Michael Jordan of UC Berkeley in Medium, an op-ed in the New York Times by Gary Marcus and Ernest Davis of NYU and a story on the limits of Google Translate in the Atlantic by Douglas Hofstadter of Indiana University at Bloomington.

As for stories in the technical press there were many that sounded warning alarms about how deep learning was not necessarily going to be the greatest most important technical breakthrough in the history of mankind. I must admit, however, that more than 99% of the popular press stories did lean towards that far fetched conclusion, especially in the headlines.

Here is PC Magazine talking about the limits in language understanding, Forbes magazine on the overhyping of deep learning. A national security newsletter quotes a Nobel prizewinner on AI:

Intuition, insight, and learning are no longer exclusive possessions of human beings: any large high-speed computer can be programed to exhibit them also.

This was said by Herb Simon in 1958. The newsletter goes on to warn that over hype is nothing new in AI and that it could well lead to another AI winter. Harvard Magazine reports on the dangers of applying an inadequate AI system to decision making about humans. And many many outlets reported on an experimental Amazon recruiting tool that learned biases against women candidates from looking at how humans had evaluated CVs.

The press is not yet fully woke with regard to AI, and deep learning in particular, but there are signs and examples of wokeness showing up all over.


Developments in space were the most active for this first year, and fortunately both my optimism and pessimism were well placed and were each rewarded.

| Prediction [Space] | Date | 2018 Comments | Updates |
|---|---|---|---|
| Next launch of people (test pilots/engineers) on a sub-orbital flight by a private company. | BY 2018 | | 20190101 Virgin Galactic did this on December 13, 2018. |
| A few handfuls of customers, paying for those flights. | NET 2020 | | |
| A regular sub weekly cadence of such flights. | NET 2022; BY 2026 | | |
| Regular paying customer orbital flights. | NET 2027 | Russia offered paid flights to the ISS, but there were only 8 such flights (7 different tourists). They are now suspended indefinitely. | |
| Next launch of people into orbit on a US booster. | NET 2019; BY 2021; BY 2022 (2 different companies) | Current schedule says 2018. | 20190101 It didn't happen in 2018. Now both SpaceX and Boeing say they will do it in 2019. |
| Two paying customers go on a loop around the Moon, launch on Falcon Heavy. | NET 2020 | The most recent prediction has been 4th quarter 2018. That is not going to happen. | 20190101 I'm calling this one now as SpaceX has revised their plans from a Falcon Heavy to their still developing BFR (or whatever it gets called), and predict 2023. I.e., it has slipped 5 years in the last year. |
| Land cargo on Mars for humans to use at a later date | NET 2026 | SpaceX has said by 2022. I think 2026 is optimistic but it might be pushed to happen as a statement that it can be done, rather than for an pressing practical reason. | |
| Humans on Mars make use of cargo previously landed there. | NET 2032 | Sorry, it is just going to take longer than every one expects. | |
| First "permanent" human colony on Mars. | NET 2036 | It will be magical for the human race if this happens by then. It will truly inspire us all. | |
| Point to point transport on Earth in an hour or so (using a BF rocket). | NIML | This will not happen without some major new breakthrough of which we currently have no inkling. | |
| Regular service of Hyperloop between two cities. | NIML | I can't help but be reminded of when Chuck Yeager described the Mercury program as "Spam in a can". | |

 

[FoR&AI] Steps Toward Super Intelligence IV, Things to Work on Now

rodneybrooks.com/forai-steps-toward-super-intelligence-iv-things-to-work-on-now/

[This is the fourth part of a four part essay–here is Part I.]

We have been talking about building an Artificial General Intelligence agent, or even a Super Intelligence agent. How are we going to get there? How are we going to get to ECW and SLP? What do researchers need to work on now?

In a little bit I’m going to introduce four pseudo goals, based on the capabilities and competences of children. That will be my fourth big list of things in these four parts of this essay.  Just to summarize so the numbers and lists don’t get too confusing here is what I have described and proposed over these four sub essays:

Part I: 4 Previous approaches to AI
Part II: 2 New Turing Test replacements
Part III: 7 (of many) things that are currently hard for AI
Part IV: 4 Ways to make immediate progress

But what should AI researchers actually work on now?

I think we need to work on architectures of intelligent beings, whether they live in the real world or in cyber space. And I think that we need to work on structured modules that will give the base compositional capabilities, ground everything in perception and action in the world, have useful spatial representations and manipulations, provide enough ability to react to the world on short time scales, and to adequately handle ambiguity across all these domains.

First let’s talk about architectures for intelligent beings.

Currently all AI systems operate within some sort of structure, but it is not the structure of something with ongoing existence. They operate as transactional programs that people run when they want something.

Consider AlphaGo, the program that beat 18 time world Go champion, Lee Sedol, in March of 2016. The program had no idea that it was playing a game, that people exist, or that there is two dimensional territory in the real world–it didn’t know that a real world exists. So AlphaGo was very different from Lee Sedol who is a living, breathing human who takes care of his existence in the world.

I remember seeing someone comment at the time that Lee Sedol was supported by a cup of coffee. And AlphaGo was supported by 200 human engineers. They got it processors in the cloud on which to run, managed software versions, fed AlphaGo the moves (Lee Sedol merely looked at the board with his own two eyes), played AlphaGo’s desired moves on the board, rebooted everything when necessary, and generally enabled AlphaGo to play at all. That is not a Super Intelligence, it is a super basket case.

So the very first thing we need is programs, whether they are embodied or not, that can take care of their own needs, understand the world in which they live (be it the cloud or the physical world) and ensure their ongoing existence. A Roomba does a little of this, finding its recharger when it is low on power, indicating to humans that it needs its dust bin emptied, and asking for help when it gets stuck. That is hardly the level of self sufficiency we need for ECW, but it is an indication of the sort of thing I mean.

Now about the structured modules that were the subject of my second point.

The seven examples I gave, in Part III, of things which are currently hard for Artificial Intelligence, are all good starting points. But they were just seven that I chose for illustrative purposes. There are a number of people who have been thinking about the issue, and they have come up with their own considered lists.

Some might argue, based on the great success of letting Deep Learning learn not only spoken words themselves but the feature detectors for early processing of phonemes that we are better off letting learning figure everything out. My point about color constancy is that it is not something that naturally arises from simply looking at online images. It comes about in the real world from natural evolution building mechanisms to compensate for the fact that objects don’t actually change their inherent color when the light impinging on them changes. That capability is an innate characteristic of evolved organisms whenever it matters to them. We are most likely to get there quicker if we build some of the important modules ahead of time.
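To give a feel for what a built in color constancy module could look like, here is a minimal sketch of a very simple normalization in the spirit of the classic gray world heuristic. It is a stand in chosen for illustration, not the mechanism evolution produced and not a method proposed in this essay, and it assumes the lighting change acts as a separate gain on each of the red, green and blue channels. The names `normalize_channels` and `relight`, the four pixel scene, and the gains are all hypothetical.

```python
def normalize_channels(pixels):
    """Divide each color channel by its mean over the image.

    Under the assumption that a change of illuminant multiplies each
    channel by a constant gain, this cancels the gain.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [tuple(p[c] / means[c] for c in range(3)) for p in pixels]


def relight(pixels, gains):
    """Simulate a new illuminant as a per-channel gain (an assumed model)."""
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]


# Hypothetical scene: a red patch, a white patch, a green patch, a blue patch.
scene = [(0.8, 0.1, 0.1), (0.9, 0.9, 0.9), (0.2, 0.6, 0.3), (0.1, 0.2, 0.7)]

daylight = relight(scene, (1.0, 1.0, 1.0))
warm_light = relight(scene, (1.4, 1.0, 0.6))

# Raw camera values differ under the two lights...
raw_differs = daylight != warm_light

# ...but after the built in stage they agree up to floating point error.
a = normalize_channels(daylight)
b = normalize_channels(warm_light)
max_gap = max(abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb))

print(raw_differs, max_gap < 1e-9)
```

The real relationship between pixel values and surface color is far messier than a per-channel gain, which is part of why a learned system does not stumble on it from online images alone. The sketch only shows that a few lines of designed structure can remove a source of variation that would otherwise have to be covered by training data.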

And for the hard core learning fetishists here is a question to ask them. Would they prefer that their payroll department, their mortgage provider, or the Internal Revenue Service (the US income tax authority) use an Excel spreadsheet to calculate financial matters for them, or would they trust these parts of their lives to a trained Deep Learning network that had seen millions of examples of spreadsheets and encoded all that learning in weights in a network? You know what they are going to answer. When it comes to such a crunch even they will admit that learning from examples is not necessarily the best approach.

Gary Marcus, who I quoted along with Ernest Davis about common sense in Part III, has talked about his list of modules that are most important to build in. They are:

  • Representations of objects
  • Structured, algebraic representations
  • Operations over variables
  • A type-token distinction
  • A capacity to represent sets, locations, paths, trajectories, obstacles and enduring individuals
  • A way of representing the affordances of objects
  • Spatiotemporal contiguity
  • Causality
  • Translational invariance
  • Capacity for cost-benefit analysis

Others will have different explicit lists, but as long as people are working on innate modules that can be combined within a structure of some entity with an ongoing existence and its own ongoing projects, that can be combined within a system that perceives and acts on its world, and that can be combined within a system that is doing something real rather than a toy online demonstration, then progress will be being made.

And note, we have totally managed to avoid the question of consciousness. Whether either ECW or SLP need to be conscious in any way at all is, I think, an open question. And it will remain so as long as we have no understanding at all of consciousness. And we have none!

HOW WILL WE KNOW IF WE ARE GETTING THERE?

Alan Turing introduced The Imitation Game in his 1950 paper Computing Machinery and Intelligence. His intent was, as he said in the very first sentence of the paper, to consider the question “Can Machines Think?”. He used the game as a rhetorical device to discuss objections to whether or not a machine could be capable of “thinking”. And while he did make a prediction of when a machine would be able to play the game (a 70% chance of fooling a human that the machine was a human in the year 2000), I don’t think that he meant the game as a benchmark for machine intelligence.

But the press, over the years, rather than real Artificial Intelligence researchers, picked up on this game and it became known as the Turing Test. For some, whether or not a machine could beat a human at this parlor game, became the acid test of progress in Artificial Intelligence. It was never a particularly good test, and so the big “tournaments” organized around it were largely ignored by serious researchers, and eventually pretty dumb chat bots that were not at all intelligent started to get crowned as the winners.

Meanwhile real researchers were competing in DARPA competitions such as the Grand Challenge, Urban Grand Challenge (which led directly to all the current work on self driving cars), and the Robot Challenge.

We could imagine tests or competitions being set up for how well an embodied and a disembodied Artificial Intelligence system perform at the ECW and SLP tasks. But I fear that like the Turing Test itself these new tests would get bastardized and gamed. I am content to see the market choose the best versions of ECW and SLP–unlike a pure chatterer that can game the Turing Test, I think such systems can have real economic value. So no tests or competitions for ECWs and SLPs.

I have never been a great fan of competitions for research domains as I have always felt that it leads to group think, and a lot of effort going into gaming the rules. And, I think that specific stated goals can lead to competitions being formed, even when none may have been intended, as in the case of the Turing Test.

Instead I am going to give four specific goals here. Each of them is couched in terms of the competence or capabilities of human children of certain ages.

  • The object recognition capabilities of a two year old.
  • The language understanding capabilities of a four year old.
  • The manual dexterity of a six year old.
  • The social understanding of an eight year old.

Like most people’s understanding of what is pornography or art there is no formal definition that I want to use to back up these goals. I mean them in the way that generally informed people would gauge the performance of an AI system after extended interaction with it, assuming that they would also have had extended interactions with children of the appropriate age.

These goals are not meant to be defined by “performance tests” that children or an AI system might take. They are meant as unambiguous levels of competence. The confusion between performance and competence was my third deadly sin in my recent post about the mistakes people make in understanding how far along we are with Artificial Intelligence.

If we are going to make real progress towards super, or just every day general, Artificial Intelligence then I think it is imperative that we concentrate on general competence in areas rather than flashy hype bait worthy performances.

Down with performance as a measure, I say, and up with the much fuzzier notion of competence as a measure of whether we are making progress.

So what sort of competence are we talking about for each of these four cases?

2 year old Object Recognition competence. A two year old already has color constancy, and can describe things by at least a few color words. But much more than this they can handle object classes, mapping what they see visually to function.

A two year old child can know that something is deliberately meant to function as a chair even if it is unlike any chair they have seen before. It can have a different number of legs, it can be made of different material, its legs can be shaped very oddly, it can even be a toy chair meant for dolls. A two year old child is not fazed by this at all. Despite having no visual features in common with any other chair the child has ever seen before the child can declare a new chair to be a chair. This is completely different from how a neural network is able to classify things visually.

But more than that, even, a child can see something that is not designed to function as a chair, and can assess whether the object, or location, can be used as a chair. They can see a rock and decide that it can be sat upon, or look for a better place where there is something that will functionally act as a seat back.

So two year old children have sophisticated understandings of classes of objects. Once, while I was giving a public talk, a mother felt compelled to leave with her small child who was making a bit of a noisy fuss. I called her back and asked her how old the child was. “Two” came the reply. Perfect for the part of the talk I was just getting to. Live, with the audience watching I made peace with the little girl and asked if she could come up on stage with me. Then I pulled out my key ring, telling the audience that this child would be able to recognize the class of a particular object that she had never seen before. Then I held up one key and asked the two year old girl what it was. She looked at me with puzzlement. Then said, with a little bit of scorn in her voice, “a key”, as though I was an idiot for not knowing what it was. The audience loved it, and the young girl was cheered by their enthusiastic reaction to her!

But wait, there is more! A two year old can do one-shot visual learning from multiple different sources. Suppose a two year old has never been exposed to a giraffe in any way at all. Then seeing just one of a hand drawn picture of a giraffe, a photo of a giraffe, a stuffed toy giraffe, a movie of a giraffe, or seeing one in person for just a few seconds, will forever lock the concept of a giraffe into that two year old’s mind. That child will forever be able to recognize a giraffe as a giraffe, whatever form it is represented in. Most people have never seen a live giraffe, and none have ever seen a live dinosaur, but they are easy for anyone to recognize.

Try that, Deep Learning.  One example, in one form!

4 year old Language Understanding competence. Most four year old children can not read or write, but they can certainly talk and listen. They well understand the give and take of vocal turn-taking, know when they are interrupting, and know when someone is interrupting them. They understand and use prosody to great effect, along with animation of their faces, heads and whole bodies. Likewise they read these same cues from other speakers, and make good use of both projecting and detecting gaze direction in conversations amongst multiple people, perhaps as side conversations occur.

Four year old children understand when they are in conversation with someone, and (usually) when that conversation has ended, or the participants have changed. If there are three or four people in a conversation they do not need to name who they are delivering remarks to, nor to hear their name at the beginning of an utterance in order to understand when a particular remark is directed at them–they use all the non-spoken parts of communication to make the right inferences.

All of this is very different from today’s speech with agents such as the Amazon Echo, or Google Home. It is also different in that a four year old child can carry the context generated by many minutes of conversation. They can understand incomplete sentences, and can generate short meaningful interjections of just a word or two that make sense in context and push forward everyone’s mutual understanding.

A four year old child, like the remarkable progress in computer speech understanding over the last five years due to Deep Learning, can pick out speech in noisy environments, tuning out background noise and concentrating on speech directed at them, or just what they want to hear from another ongoing conversation not directed at them. They can handle strong accents that they have never heard before and still extract accurate meaning in discussions with another person.

They can deduce gender and age from the speech patterns of another, and they are finely attuned to someone they know speaking differently than usual. They can understand shouted, whispered, and sung speech. They themselves can sing, whisper and shout, and often do so appropriately.

And they are skilled in the complexity of sentences that they can handle. They understand many subtleties of tense, they can talk in and understand hypotheticals. They can engage in and understand nonsense talk, and weave a pattern of understanding through it. They know when they are lying, and can work to hide that fact in their speech patterns.

They are so much more language capable than any of our AI systems, symbolic or neural.

6 year old Manual Dexterity competence. A six year old child, unless some super prodigy, is not able to play Chopin on the piano. But they are able to do remarkable feats of manipulation, with their still tiny hands, that no robot can do. When they see an object for the first time they fairly reliably estimate whether they can pick it up one handed, two handed, or with two arms and whole body (using their stomach or chest as an additional anchor region), or not at all. For a one handed grasp they preshape their hand as they reach towards it having decided ahead of time what sort of grasp they are going to use. I’m pretty sure that a six year old can do all these human grasps:

[I do not know the provenance of this image–I found it at a drawing web site here.]

A six year old can turn on faucets, tie shoe laces, write legibly, open windows, raise and lower blinds if they are not too heavy, and they can use chopsticks in order to eat, even with non-rigid food. They are quite dexterous. With a little instruction they can cut vegetables, wipe down table tops, open and close food containers, open and close closets, and lift stacks of flat things into and out of those closets.

Six year old children can manipulate their non-rigid clothes, fold them, though not as well as a skilled adult (I am not a skilled adult in this regard…), manipulate them enough to put them on and off themselves, and their dolls.

Furthermore, they can safely pick up a cat and even a moderately sized dog, and often are quite adept and trustworthy picking up their very young siblings. They can caress their grandparents.

They can wipe their bums without making a mess (most of the time).

ECW will most likely need to be able to do all these things, with scaled up masses (e.g., lifting or dressing a full sized adult which is beyond the strength capabilities of a six year old child).

We do not have any robots today that can do any of these things in the general case where a robot can be placed in a new environment with new instances of objects that have not been seen before, and do any of these tasks.

Going after these levels of manipulation skill will result in robots backed by new forms of AI that can do the manual tasks that we expect of humans, and that will be necessary for giving care to other humans.

8 year old Social Understanding competence. By age eight children are able to articulate their own beliefs, desires, and intentions, at least about concrete things in the world. They are also able to understand that other people may have different beliefs, desires, and intentions, and when asked the right questions can articulate that too.

Furthermore, they can reason about what they believe versus what another person might believe and articulate that divergence. A particular test for this is known as the “false-belief task”. There are many variations on this, but essentially what happens is that an experimenter lets a child watch a person observe that Box A contains, say, a toy elephant, and that Box B is empty. That person leaves the room, and the experimenter then, in full sight of the child, moves the toy elephant to Box B. They then ask the child which box contains the toy elephant, and of course the child says Box B. But the crucial question is to ask the child where the person who left the room will look for the toy elephant when they are asked to find it after they have come back into the room. Once the child is old enough (and there are many experiments and variations here) they are able to tell the experimenter that the person will look in Box A, knowing that is based on a belief the person has which is now factually false.
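The structure of the false-belief task can be sketched in a few lines of Python. This is only a toy bookkeeping model under my own assumptions: the class names `Observer` and `Room` and their methods are hypothetical, and nothing here says how a child, or an AI system, actually represents beliefs. It simply shows the one distinction the task probes, that a belief is updated only by what its holder witnessed, and so can diverge from the true state of the world.

```python
class Observer:
    """Someone whose belief only changes when they witness an event."""

    def __init__(self, name):
        self.name = name
        self.in_room = True
        self.believed_box = None  # where this observer thinks the toy is


class Room:
    """Holds the true location of the toy and the people who may see it."""

    def __init__(self, observers):
        self.observers = observers
        self.toy_box = None  # ground truth

    def put_toy_in(self, box):
        self.toy_box = box
        # Only observers currently in the room get to update their belief.
        for observer in self.observers:
            if observer.in_room:
                observer.believed_box = box


child = Observer("child")
visitor = Observer("visitor")
room = Room([child, visitor])

room.put_toy_in("Box A")   # both see the elephant in Box A
visitor.in_room = False    # the visitor leaves
room.put_toy_in("Box B")   # the experimenter moves the toy; only the child sees

print("Toy is really in:", room.toy_box)
print("Child believes:", child.believed_box)
print("Visitor will look in:", visitor.believed_box)
```

Passing the task amounts to answering with the visitor's stored belief rather than with the ground truth. The bookkeeping is trivial once someone has written it down; the hard part for a child, or a machine, is inferring who saw what from watching the scene.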

There is a vast literature on this and many other aspects of understanding other people, and also a vast literature on testing such knowledge for very young children but also for chimpanzees, dogs, birds, and other animals on what they might understand–without the availability of language these experiments can be very hard to design.

And there are many many aspects of social understanding, including inferring a person’s desire or intent from their actions, and understanding why they may have those desires and intents. Some psychological disorders are manifestations of not being able to make such inferences. But in our normal social environment we assume a functional capability in many of these areas about others with whom we are interacting. We don’t feel the need to explain certain things to others as surely they will know from what they are observing. And we also observe the flow of knowledge ourselves and are able to make helpful suggestions as we see people acting in the world. We do this all the time, pointing to things, saying “over there”, or otherwise being helpful, even to complete strangers.

Social understanding is the juice that makes us humans into a coherent whole. And, we have versions of social understanding for our pets, but not for our plants. Eight year old children have enough of it for much of every day life.

Improvement in Competence will lead the way

These competencies of two, four, six, and eight year old children will all come into play for ECW and SLP. Without these competencies, our intelligent systems will never seem natural or as intelligent as us. With these competencies, whether they are implemented in ways copied from humans or not (birds vs airplanes) our intelligent systems will have a shot at appearing as intelligent as us. They are crucial for an Artificial Generally Intelligent system, or for anything that we will be willing to ascribe Super Intelligence to.

So, let’s make progress, real progress, not simple hype bait, on all four of these systems level goals. And then, for really the first time in sixty years we will actually be part ways towards machines with human level intelligence and competence.

In reality it will just be a small part of the way, and even less of the way towards Super Intelligence.

It turns out that constructing deities is really really hard. Even when they are in our own image.



1 “Innateness, AlphaZero, and Artificial Intelligence”, Gary Marcus, submitted to arXiv, January 2018.

[FoR&AI] Steps Toward Super Intelligence III, Hard Things Today

rodneybrooks.com/forai-steps-toward-super-intelligence-iii-hard-things-today/

[This is the third part of a four part essay–here is Part I.]

If we are going to develop an Artificial Intelligence system as good as a human, an ECW or SLP say, from Part II of this essay, and if we want to get beyond that, we need to understand what current AI can hardly do at all. That will tell us where we need to put research effort, and where that will lead to progress towards our Super Intelligence.

The seven capabilities that I have selected below start out as concrete, but get fuzzier and fuzzier and more speculative as we proceed. It is relatively easy to see the things that are close to where we are today and can be recognized as things we need to work on. When those problems get more and more solved we will be living in a different intellectual world than we do today, dependent on the outcomes of that early work. So we can only speak with conviction about the short term problems where we might make progress.

And by short term, I mean the things we have already been working on for forty plus years, sometimes sixty years already.

And there are lots of other things in AI that are equally hard to do today. I just chose seven to give some range to my assertion that there is lots to do.

1. Real perception

Deep Learning brought fantastic advances to image labeling. Many people seem to think that computer vision is now a solved problem. But that is nowhere near the truth.

Below is a picture of Senator Tom Carper, ranking member of the U.S. Senate Committee on Environment and Public Works, at a committee hearing held on the morning of Wednesday June 13th, 2018, concerning the effect of emerging autonomous driving technologies on America’s roads and bridges.

He is showing what is now a well known particular failure of a particular Deep Learning trained vision system for an autonomous car. The stop sign on the left has a few carefully placed marks on it, made from white and black tape. The system no longer identifies it as a stop sign, but instead thinks that it is a forty five mile per hour speed limit sign. If you squint enough you can sort of see the essence of a “4” at the bottom of the “S” and the “T”, and sort of see the essence of a “5” at the bottom of the “O” and the “P”.

But really how could a vision system that is good enough to drive a car around some of the time ever get this so wrong? Stop signs are red! Speed limit signs are not red. Surely it can see the difference between signs that are red and signs that are not red?

Well, no. We think redness of a stop sign is an obvious salient feature because our vision systems have evolved to be able to detect color constancy. Under different lighting conditions the same object in the world reflects different colored light–if we just zoom in on a pixel of something that “is red”, it may not have a red value in the image from the camera. Instead our vision system uses all sorts of cues, including detecting shadows, knowing things about what color a particular object “should” be, and local geometric relationships between measured colors in order for our brain to come up with a “detected color”. This may be very different from the color that we get from simply looking at the red/green/blue values of pixels in a camera image.

The data sets that are used to train Deep Learning systems do not have detailed color labels for little patches of the image. And the computations for color constancy are quite complex, so they are not something that the Deep Learning systems simply stumble upon.
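To make the gap between pixel values and surface color concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the reflectances and the bluish light are made-up numbers, and the correction used is the simple "gray world" heuristic (assume the scene averages out to gray), which is far cruder than whatever our visual system does. It is not the method of any particular vision system.

```python
# Toy illustration: the same red surface under bluish light, before and
# after a crude "gray world" correction. All numbers are made up.

CHANNELS = ("red", "green", "blue")

# Hypothetical surface reflectances: fraction of R, G, B light reflected.
SURFACES = {
    "stop_sign": (0.8, 0.1, 0.1),
    "grass": (0.1, 0.8, 0.1),
    "blue_poster": (0.1, 0.1, 0.8),
    "road": (0.5, 0.5, 0.5),
}


def render(reflectance, illuminant):
    """Camera pixel = reflectance times light color, channel by channel."""
    return tuple(r * l for r, l in zip(reflectance, illuminant))


def dominant_channel(pixel):
    """Name of the channel with the largest value in this pixel."""
    return CHANNELS[max(range(3), key=lambda i: pixel[i])]


def gray_world_correct(pixels):
    """Divide each channel by its mean over the whole scene."""
    means = [sum(p[i] for p in pixels.values()) / len(pixels) for i in range(3)]
    return {
        name: tuple(p[i] / means[i] for i in range(3))
        for name, p in pixels.items()
    }


bluish_light = (0.1, 0.4, 1.0)  # hypothetical illuminant, heavy on blue
raw = {name: render(refl, bluish_light) for name, refl in SURFACES.items()}
corrected = gray_world_correct(raw)

print("raw pixel looks:", dominant_channel(raw["stop_sign"]))
print("after correction:", dominant_channel(corrected["stop_sign"]))
```

Under this light the raw stop sign pixel has more blue in it than red. The red only comes back once the pixel is interpreted relative to the rest of the scene, and this heuristic only works here because I chose the surfaces to average out to gray. Real scenes are not so cooperative, which is part of why the real computation is complex.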

Look at the synthetic image of a 5×5 checkerboard below, produced by Professor Ted Adelson at MIT. We can see it is and say it is a checkerboard because it is made up of squares that alternate between black and white, or at least relatively darker and lighter. But wait, they are not squares in the image at all. They are squished. Our brain is extracting three dimensional structure from this two dimensional image, and guessing that it is really a flat plane of squares that is at a non-orthogonal angle to our line of sight–that explains the consistent pattern of squishing we see. But wait, there is more. Look closely at the two squished squares that are marked “same” in this image. One is surely black and one is surely white. Our brains will not let us see the truth, however, so I have done it for your brain.

Here I grabbed a little piece of image from the top (black) square on the left and the bottom (white) square in the middle.

            

In isolation neither is clearly black nor white. Our vision system sees a shadow being cast by the green cylinder and so lightens up our perception of the one we see as a white square. And it is surrounded by even darker pixels in the shadowed black squares, so that adds to the effect. The third patch above is from the black square between the two labeled as the same and is from the part of that square which falls in the shadow. If you still don’t believe me print out the image and then cover up all but the regions inside the two squares in question. They will then pop into being the same shade of grey.

For more examples like this see the blue (but red) strawberries from my post last year on what is it like to be a robot?.

This is just one of perhaps a hundred little (or big) tricks that our perceptual system has built for us over evolutionary time scales. Another one is extracting prosody from people’s voices, compensating automatically for background noise, our personal knowledge of that person and their speech patterns, and more generally from simply knowing their gender, age, what their native language is, and perhaps knowing where they grew up. It is effortless for us, but it is something that lets us operate in the world with other people, and limits the extent of our stupid social errors. Another is how we are able to estimate space from sound, even when listening over a monaural telephone channel–we can tell when someone is in a large empty building, when they are outside, when they are driving, when they are in wind, just from qualities of the sound as they speak. Yet another is how we can effortlessly recognize people from a picture of their face, less than 32 pixels on a side, including often a younger version of them that we never met, nor have seen in photos before. We are incredibly good at recognizing faces, and despite recent advances we are still better than our programs. The list goes on.

Until ECW and SLP have the same hundred or so tricks up their sleeves they are not going to understand the world in the way that we do, and that will be critically important as they are not going to be able to relate to our world in the way that we do, and so neither of them will be able to do their assigned tasks. They will come off as blatant doofuses. When doddering Rodney, struggling for a noun that he can’t retrieve, says to ECW “That red one, over there!” it will not do ECW much good unless it can map red to something that may not appear red at all in terms of pixels.

2. Real Manipulation

I can reach my hand into my pants pocket and pull out my car keys blindly and effortlessly. I am not letting a robot near my pants pocket any time soon.

Dexterous manipulation has turned out to be fiendishly hard, and making dexterous hands no easier. People always ask me what would it take to make significant progress. If I knew that I would have tried it long ago.

Soon after I arrived at the Stanford Artificial Intelligence Laboratory in 1977 I started programming a couple of robot arms. Below is a picture of the “Gold Arm”, one of the two that I programmed, in a display case at one of the entrances to the Computer Science Department building at Stanford. Notice the “hand”, parallel fingers that slide together and apart. That was all we had for hands back then.

And below is a robot hand that my company was selling forty years later, in 2017. It is the same fundamental mechanical design (a ball screw moving the two fingers of a parallel jaw gripper together and apart, with some soft material on the inside of the fingers (it has fallen off one finger in the 1977 robot above)). That is all we have now. Not much has happened practically with robot hands for the last four decades.

Beyond that, however, we can not make our robot hands perform anywhere near the tasks that a human can do. In fulfillment centers, the places that pack our orders for online commerce, the movement to a single location of all the items to be packed for a given order has been largely solved. Robots bring shelves full of different items to one location. But then a human has to pick the correct item off of each shelf, and a human has to pack them into a shipping box, deciding what packing material makes sense for the particular order. The picking and packing has not been solved by automation. Despite the fact that there is economic motivation, as there was for turning lead into gold, that is pushing lots of research into this area.

Even more so the problem of manipulating floppy materials, like fabrics for apparel manufacture, or meat to be carved, or humans to be put to bed, has had very little progress. Our robots just can not do this stuff. That is alright for SLP but a big problem for ECW.

By the way, I always grimace when I see a new robot hand being shown off by researchers, and rather than being on the end of a robot arm, the wrist of the robot hand is in the hands of a human who is moving the robot hand around. You have probably used a reach grabber, or seen someone else use one. Here is a random image of one that I grabbed (with my mouse!) off an e-commerce website:

If you have played around with one of these, with its simple plastic two fingers and only one grasping motion, you will have been much more dexterous than any robot hand in the history of robotics. So even with this simple gripper, and a human brain behind it, and with no sense of touch on the distal fingers, we get to see how far off we are with robot grasping and manipulation.

3. Read a Book

Humans communicate skills and knowledge through books and more recently through “how to” videos. Although you will find recent claims that various “robots”, or AI systems can learn from a video or from reading a book, none of these demonstrations have the level of capability of a child, and the approaches people are taking are not likely to generalize to human level competence. We will come back to this point shortly.

But in the meantime, here is what an AI system would need to be able to do if it were to have human level competence at reading books in general. Or truly learn skills from watching a video.

Books are not written as mathematical proofs where all the steps are included. Actually mathematical proofs are not written that way either. We humans fill in countless steps as we read, incorporating our background knowledge into the understanding process.

Why does this work? It is because humans wrote the books and implicitly know what background information all human readers will have. So they write with the assumption that they understand what the humans reading the book will have as background knowledge. So surely an AI system reading a book will need to have that same background.

“Hold on”, the machine learning “airplanes not birds” fanboys say! We should expect Super Intelligences to read books written for Super Intelligences, not those ones written for measly humans. But that claim, of course, has two problems. First, if it really is a Super Intelligence it should be able to understand what mere humans can understand. Second, we need to get there from here, so somehow we are going to have to bootstrap our Super progeny, and the ones writing the books for the really Super ones will first need to learn from books written for measly humans.

But now, back to this background knowledge. It is what we all know about the world and can expect one another to know about the world. For instance, I don’t feel the need to explain to you right now, dear reader, that the universe of intelligent readers and discussants of ideas on Earth at this moment are all members of the biological species Homo Sapiens. I figure you already know that.

This could be called “common sense” knowledge. It is necessary for so much of our (us humans) understanding of the world, and it is an assumed background in all communications between humans. Not only that, it is an enabler of how we make plans of action.

Two NYU professors, Ernest Davis (computer science) and Gary Marcus (psychology and neural science) have recently been highlighting just how much humans rely on common sense to understand the world, and what is missing from computers. Besides their recent opinion piece in the New York Times on Google Duplex they also had a long article2 about common sense in a popular computer science magazine. Here is the abstract:

Who is taller, Prince William or his baby son Prince George? Can you make a salad out of a polyester shirt? If you stick a pin into a carrot, does it make a hole in the carrot or in the pin? These types of questions may seem silly, but many intelligent tasks, such as understanding texts, computer vision, planning, and scientific reasoning require the same kinds of real-world knowledge and reasoning abilities. For instance, if you see a six-foot-tall person holding a two-foot-tall person in his arms, and you are told they are father and son, you do not have to ask which is which. If you need to make a salad for dinner and are out of lettuce, you do not waste time considering improvising by taking a shirt out of the closet and cutting it up. If you read the text, “I stuck a pin in a carrot; when I pulled the pin out, it had a hole,” you need not consider the possibility “it” refers to the pin.

As they point out, so called “common sense” is important for even the most mundane tasks we wish our AI systems to do for us. They enable both Google and Bing to do this translation: “The telephone is working. The electrician is working.” in English, becomes “Das Telefon funktioniert. Der Elektriker arbeitet.” in German. The two meanings of “working” in English need to be handled differently in German, and an electrician works in one sense, whereas a telephone works in another sense. Without this common sense, somehow embedded in an AI system, it is not going to be able to truly understand a book. But this example is only a tiny one step version of common sense.
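A toy version of that one step can be written out explicitly in Python. To be clear, this is not how Google or Bing translate. It is a hand-built sketch in which I have typed in the relevant piece of common sense myself, and the tables `ANIMATE` and `NOUN_DE` and the function `translate_is_working` are hypothetical names invented for the illustration.

```python
# Hand-written "common sense": which subjects are people who do jobs.
# This table is the whole trick, and a human had to supply it.
ANIMATE = {"electrician", "teacher", "plumber"}

# A tiny hypothetical English-to-German noun table (with articles).
NOUN_DE = {
    "telephone": "Das Telefon",
    "electrician": "Der Elektriker",
}


def translate_is_working(subject):
    """Translate 'The <subject> is working.' into German.

    People work in the sense of doing a job (arbeiten); devices work
    in the sense of functioning (funktionieren).
    """
    verb = "arbeitet" if subject in ANIMATE else "funktioniert"
    return f"{NOUN_DE[subject]} {verb}."


print(translate_is_working("telephone"))
print(translate_is_working("electrician"))
```

The sketch picks the right verb for both sentences, but only because one fact about one verb was written down ahead of time. Nothing in it scales to the next ambiguous word, let alone to a paragraph.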

Correctly translating even 20 or 30 words can require a complex composition of little common sense atoms. Douglas Hofstadter pointed out in a recent Atlantic article places where things in short order can get just too complicated for Google translate, despite deep learning having enabled the process. In his examples it is context over many sentences that gets the systems into trouble. Humans handle these cases effortlessly. Even four year olds (see Part IV of this post).

He says, when comparing how he translates to how Google translates:

Google Translate is all about bypassing or circumventing the act of understanding language.

I am not, in short, moving straight from words and phrases in Language A to words and phrases in Language B. Instead, I am unconsciously conjuring up images, scenes, and ideas, dredging up experiences I myself have had (or have read about, or seen in movies, or heard from friends), and only when this nonverbal, imagistic, experiential, mental “halo” has been realized—only when the elusive bubble of meaning is floating in my brain—do I start the process of formulating words and phrases in the target language, and then revising, revising, and revising.

In the second paragraph he touches on the idea of gaining meaning from running simulations of scenes in his head. We will come back to this in the next item of hardness for AI.  And elsewhere in the article he even points out how when he is translating he uses Google search, a compositional method that Google translate does not have access to.

Common sense lets a program, or a human, prune away irrelevant considerations. A program may be able to exhaustively come up with many many options about what a phrase or a situation could mean, all the realms of possibility. What common sense can do is quickly reduce that large set to a much smaller set of plausibility, and beyond that narrow things down to those cases with significant probability. From possibility to plausibility to probability. When my kids were young they used to love to tease dad by arguing for possibilities as explanations for what was happening in the world, and tie me into knots as I tried to push back with plausibilities and probabilities. It was a great game.
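The three stage narrowing can be sketched as a pair of filters in Python. The scenario (a wet kitchen floor), the candidate explanations, the plausibility flags, and the probabilities are all made up for illustration, and the function name `prune` is hypothetical. In a real system the flags and numbers are exactly the part that common sense would have to supply.

```python
# Candidate explanations for "the kitchen floor is wet".
# Each entry: (explanation, plausible?, made-up probability).
CANDIDATES = [
    ("someone spilled a drink", True, 0.60),
    ("the dishwasher leaked", True, 0.30),
    ("a pipe burst inside the wall", True, 0.05),
    ("it rained indoors", False, 0.0),
    ("the floor is sweating", False, 0.0),
]


def prune(candidates, min_probability=0.10):
    """Narrow from possibility to plausibility to probability."""
    possible = list(candidates)                     # everything imaginable
    plausible = [c for c in possible if c[1]]       # not ruled out by common sense
    probable = [c for c in plausible if c[2] >= min_probability]
    return possible, plausible, probable


possible, plausible, probable = prune(CANDIDATES)
print(len(possible), "possible")
print(len(plausible), "plausible")
print([name for name, _, _ in probable], "worth acting on")
```

My kids' game was to keep arguing from the first list. The filtering code is the easy part; filling in the second and third columns for an arbitrary situation is the unsolved one.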

This common sense has been a long standing goal for symbolic artificial intelligence. Recently the more rabid Deep Learners have claimed that their systems are able to learn aspects of common sense, and that is sometimes a little bit true. But unfortunately it does not come out in a way that is compositional–it usually requires a human to interpret the result of an image or a little movie that the network generates in order for the researchers to demonstrate that it is common sense. The onus, once again, is on the human interpreter. Without composition, it is not likely to be as useful or as robust as the human capabilities we see in quite small children.

The point here is that simply reading a book is very hard, and requires a lot of what many people have called “common sense”. How that common sense should be engendered in our AI systems is a complex question that we will return to in Part IV.

Now back to claims that we already have AI systems that can read books.

Not too long ago an AI program outperformed MIT undergraduates on the exam for Freshman calculus. Some might think that that means that soon AI programs will be doing better on more and more classes at MIT and that before too long we’ll have an AI program fulfilling the degree requirements at MIT. I am confident that it will take more than fifty years. Supremely confident, and not just because an MIT undergraduate degree requires that each student pass a swimming test. No, I am supremely confident on that time scale because the program, written by Jim Slagle1 for his PhD thesis with Marvin Minsky, outperformed MIT students in 1961. 1961! That is fifty seven years ago already. Mainframe computers back then were way less than what we have now in programmable light switches or in our car key frob. But an AI program could beat MIT undergraduates at calculus back then.

When you see an AI program touted as having done well on a Japanese college entrance exam, or passing a US 8th grade science test, please do not think that the AI is anywhere near human level and going to plow through the next few tests. Again this is one of the seven deadly sins, that of mistaking performance on a narrow task, taking the test, for competence at a general level. A human who passes those tests does it in a human way that means that they have a general competence around the topics in the test. The test was designed for humans, and inherent in the way it is designed is that it extracts information about the competence of a human who took the test. And the test designers did not even have to think about it that way. It is just the way they know how to design tests. (Although we have seen how “teaching to the test” degrades that certainty even for human students, which is why any human testing regime eventually needs to get updated or changed completely.) But that test is not testing the same thing for an AI system. Just like a stop sign with a few pieces of tape on it may not look at all like a stop sign to a Deep Learning system that is supposed to drive your car.

At the same time the researchers, and their institutional press offices, are committing another of the seven deadly sins. They are trying to demonstrate that their system is able to “read” or “understand” by demonstrating performance on a human test (despite my argument above that the tests are not valid for machines), and then they claim victory and let the press grossly overgeneralize.

4. Diagnose and Repair Stuff

If ECW is going to be a useful elder care robot in a home it ought to be able to figure out when something has gone wrong with the house. At the very least it should be able to know which specialist to call to come and fix it. If all it can do is say “something is wrong, something is wrong, I don’t know what”, we will hardly think of it as Super Intelligent. At the very least it should be able to notice that the toilet is not flushing so the toilet repair person should be called. Or that a light bulb is out so that the handy person should be called. Or that there is no electricity at all in the house so that should be reported to the power company.

We have no robots that could begin to do these simple diagnosis tasks. In fact I don’t know of any robot that would realize when the roof had blown off a house that it was in and be able to report that fact. At best today we could expect a robot to detect that environmental conditions were anomalous and shut itself down. But in reality I think it is more likely that it would continue trying to operate (as a Roomba might after it has run over a dog turd with its rapidly spinning brushes–bad…) and fail spectacularly.

But more than what we referred to as common sense in the previous section, it seems that when humans diagnose even simple problems they are running some sort of simulation of the world in their heads, looking at possibilities, plausibilities, and probabilities. It is not exactly the accurate 3D models that traditional robotics uses to predict the forces that will be felt as a robot arm moves along a particular trajectory (and thereby notice when it has hit something unexpected and the predictions are not borne out by the sensors). It is much sloppier than that, although geometry may often be involved. And it is not the simulation as a 2D movie that some recent papers in Deep Learning suggest is the key, but instead is very compositional across domains. And it often uses metaphor. This simulation capability will be essential for ECW to provide full services as a trusted guardian of the home environment for an elderly person. And SLP will need such generic simulations to check out how its design for people flow will work in its design of the dialysis ward.

Again, our AI systems and robots may not have to do things exactly the way we do them, but they will need to have the same general competence as, or more than, humans if we are going to think of them as being as smart as us.

Right now there are really no systems that have either common sense or this general purpose simulation capability. That is not to say that people have not worked on these problems for a long long time. I was very impressed by a paper on this topic at the very first AI conference I ever went to, IJCAI 77 held at MIT in August 1977. The paper was by Brian Funt, and was WHISPER: A Problem-Solving System Utilizing Diagrams and a Parallel Processing Retina. Funt was a post doc at Stanford with John McCarthy, the namer of Artificial Intelligence and the instigator of the foundational 1956 workshop at Dartmouth. And McCarthy’s first paper on “Programs with Common Sense” was written in 1958. We have known these problems are important for a long long time. People have made lots of worthwhile progress on them over the last few decades. They still remain hard and unsolved and not ready for prime time deployment in real products.

“But wait”, you say.  You have seen a news release about a robot building a piece of IKEA furniture. Surely that requires common sense and this general purpose simulation. Surely it is already solved and Super Intelligence is right around the corner. Again, don’t hold your breath–fifty years is a long time for a human to go without oxygen. When you see such a demo it is with a robot and a program that has been worked on by many graduate students for many months. The pieces were removed from the boxes by the graduate students (months ago). They have run the programs again, and again, and again, and finally may have one run where it puts some parts of the furniture together. The students were all there, all making sure everything went perfectly. This is completely different from what we might expect from ECW, taking delivery of some IKEA boxes at the door, carrying them inside (with no graduate students present), opening the boxes and taking out the famous IKEA instructions and reading them. And then putting the furniture together.

It would be very helpful if ECW could do these things. Any robot today put in this situation will fail dismally on many of the following steps (and remember, this is a robot in a house that the researchers have never seen).

  • realizing there is a delivery being made at the house
  • getting the stuff up any steps and inside
  • actually opening the boxes without knowing exactly what is inside and without damaging the parts
  • finding the instructions, and manipulating the paper to see each side of each page
  • understanding the instructions
  • planning out where to place the pieces so that they are available in the right order
  • manipulating two or three pieces at once when they need to be joined
  • finding and retrieving the right tools (screwdrivers, hammers to tap in wooden dowels)
  • doing that finely skilled manipulation

Not one of these subtasks can today be done by a robot in some unknown house with a never before seen piece of IKEA furniture, and without a team of graduate students having worked for months on the particular instance of that subtask in the particular environment.

When academic researchers say they have solved a problem, or demonstrated a robot capability, that is a long long way from the level of performance we will expect from ECW.

Here is a little part of a short paper that just came out3 in the AAAI’s (Association for the Advancement of Artificial Intelligence) AI Magazine this summer, written by Alexander Kleiner about his transition from being an AI professor to working in AI systems that had to work in the real world, every day, every time.

After I left academe in 2014, I joined the technical organization at iRobot. I quickly learned how challenging it is to build deliberative robotic systems exposed to millions of individual homes. In contrast, the research results presented in papers (including mine) were mostly limited to a handful of environments that served as a proof of concept.

Academic demonstrations are important steps towards solving these problems. But they are demonstrations only. Brian Funt demonstrated a program that could imagine the future few seconds, forty one years ago, before computer graphics existed (his 1977 paper uses line printer output of fixed width characters to produce diagrams). That was a good early step. But despite the decades of hard work we are still not there yet, by a long way.

5. Relating Human and Robot Behavior to Maps

As I pointed out in my  what is it like to be a robot? post, our home robots will be able to have a much richer set of sensors than we do. For instance they can have built in GPS, listen for Bluetooth and Wifi, and measure people’s breathing and heartbeat a room away4 by looking for subtle changes in Wifi signals propagating through the air.

Our self-driving cars (to the extent that they are really self driving yet) rely heavily on GPS for navigation. But GPS now gets spoofed as a method of attack, and worse, some players may decide to bring down one or more GPS satellites in a state sponsored act of terrorism.

Things will be really bad for a while if GPS goes down. For one thing the electrical grid will need to be partitioned into much more local supplies as GPS is used to synchronize the phase of AC current in distant parts of the network. And humans will be lost quite a bit until paper maps once again get printed for all sorts of applications. E-commerce deliveries will be particularly badly hit for a while, as well as flight and boat navigation (early 747’s had a window in the roof of the cockpit for celestial navigation across the Pacific; the US Naval Academy brought back into its curriculum navigation by the stars in 2016).

Whether it is spoofing, an attack on satellites, or just lousy reception, we would hope that our elder care robots, our ECWs, are not taken offline. They will be, unless they get much better at visual and other navigation without relying at all on hints from GPS. This will also enable them to work in rapidly changing environments where maps may not be consistent from one day to the next, nor necessarily be available.

But this is just the start. Maps, including terrain and 3D details will be vital for ECW to be able to decide where it can get its owner to walk, travel in a wheel chair, or move within a bathroom. This capability is not so hard for current traditional robotics approaches. But for SLP, the Services Logistics Planner, it will need to be a lot more generic. It will need to relate 3D maps that it builds in its plans for a dialysis ward to how a hypothetical human patient, or a group of hypothetical staff and patients, will together and apart navigate around the planned environment. It will need to build simulations, by itself, with no human input, of how groups of humans might operate.

This capability, of projecting actions through imagined physical spaces is not too far off from what happens in video games. It does not seem as far away as all the other items in this blog post. It still requires some years of hard work to make systems which are robust, and which can be used with no human priming–that part is far away from any current academic demonstrations.

Furthermore, being able to run such simulations will probably contribute to aspects of “common sense”, but it all has to be much more compositional than the current graphics of video games, and much more able to run with both plausibility and probability, rather than just possibility.

This is not unlike the previous section on diagnosis and repair, and indeed there is much commonality. But here we are pushing deeper on relating the three dimensional aspects of the simulation to reality in the world. For ECW it will be the actual world as it is. For SLP it will be the world as it is designing it, for the future dialysis ward, and constraints will need to flow in both directions so that after a simulation, the failures to meet specifications or desired outcomes can be fed back into the system.

6. Write Or Debug a Computer Program

OK, I admit I am having a little fun with this section, although it is illustrative of human capabilities and forms of intelligence. But feel free to skip it, it is long and a little technical.

Some of the alarmists about Super Intelligence worry that when we have it, it will be able to improve itself by rewriting its own code. And then it will exponentially grow smarter than us, and so, naturally, it will kill us all. I admit to finding that last part perplexing, but be that as it may.

You may have seen headlines like “Learning Software Learns to Write Learning Software”. No it doesn’t. In this particular case there was a fixed human written algorithm that went through a process of building a particular form of Deep Learning network. And a learning network that learned how to adjust the parameters of that algorithm which ended up determining the size, connectivity, and number of layers. It didn’t write a single line of computer code.

So, how do we find our way through such a hyped up environment and how far away are we from AI systems which can read computer code, debug it, make it better, and write new computer code? Spoiler alert: about as far away as it is possible to be, like not even in the same galaxy, let alone as close as orbiting hundreds of millions of miles apart in the same solar system.

Each of today’s AI systems is many millions of lines of code. They have been written by many, many people through shared libraries, along with, for companies delivering AI based systems, perhaps a few million lines of custom and private code bases. They usually span many languages such as C, C++, Python, Javascript, Java, and others. The languages used often have only informal specifications, and in the last few years new languages have been introduced with alarming frequency and different versions of the languages have different semantics. It’s all a bit of a mess, to everyone except the programmers whose lives these details are.

On top of this we have known since Turing introduced the halting problem in 1936 that it is not possible for computers to know certain rather straightforward things about how any given program might perform over all possible inputs. In 1967 Minsky warned that even for computers with relatively small amounts of memory (about what we expect in a current car key frob) figuring out some things about their programs would take longer than the life of the Universe, even with all the Universe doing the computing in parallel.

Humans are able to write programs with some small amount of assuredness that they will work by using heuristics in analyzing what the program might do. They use various models and experiments and mental simulations to prove to themselves that their program is doing what they want. This is different from proof.

When computers were first developed we first needed computer software. We quickly went from programmers having to enter the numeric codes for each operation of the machine, to assemblers where there is a one to one correspondence between what the programmers write and that numeric code, but at least they get to write it in human readable text, like “ADD”. Then quickly after that came compilers where the language expressing the computation was at a higher level model of an abstract machine and the compiler turned that into the assembler language for any particular machine. There have been many attempts, really since the 1960s, to build AI systems which are a level above that, and can generate code in a computer language from a higher level description, in English say.

The reality is that these systems can only generate fairly generic code and have difficulty when complex logic is needed. The proponents of these systems will argue about how useful they are, but the reality is that the human doing the specifying has to move from specifying complex computer code to specifying complex mathematical relationships.

Real programmers tend to use spatial models and their “simulating the world” capabilities to reason through what code should be produced, and which cases should be handled in which way. Often they will write long lists of cases, in pseudo English so that they can keep track of things, and (if the later person who is to maintain the code is lucky) put that in comments around the code. And they use variable names and procedure names that are descriptive of what is going to be computed, even though that makes no difference to the compiler. For instance they might use StringPtr for a pointer to a string, where the compiler would have been just as happy if they had used M, say. Humans use the name to give themselves a little help in remembering what is what.

People have also attempted to write AI systems to debug programs, but they rarely try to understand the variable names, and simply treat them as anonymous symbols, just as the compiler does.

An upshot of this has been “formal” programming methods which require humans to write mathematical assertions about their code, so that automated systems can have a chance at understanding it. But this is even more painful than writing computer code, and even more buggy than regular computer code, and so it is hardly ever done.

So our Super Intelligence is going to deal with existing code bases, and some of the stuff in there will be quite ugly.

Just for fun I coded up a little library routine in C–I use a library routine with the exact same semantics in another language that I regularly program in. And then I got rid of all the semantics in the variable, procedure and type names. Here is the code.  It is really only one line. And, it compiles just fine using the GCC compiler and works completely correctly.

a*b(a*c) {a*d; a*e;
  for(d=NULL;c!=NULL;e=(a*)*c,*c=(a)d,d=c,c=e);return d;}

I sent it to two of my colleagues who are used to groveling around in build systems and open source code in libraries, asking if they could figure out what it was. I had made it a little hard by not giving them a definition of “a”.

They both figured out immediately that “a” must be a defined type. One replied that he had some clues, and started out drawing data structures and simulating the code, but then moved to experimenting by compiling it (after guessing at a definition for “a”) and writing a program that called it. He got lots of segmentation violations (i.e., the program kept crashing), but guessed that it was walking down a linked list. The second person said that he stared at the code and realized that “e” was a temporary variable whose use was wrapped around assignments of two others, which suggested some value swapping going on. And the end condition for the loop, being when “c” became NULL, suggested to him that it was walking down a list “c”, but that list itself was getting destroyed. So he guessed it might be doing an in place list reversal, and was able to set up a simulation in his head and on paper of that and verify that it was the case.

When I gave each of them the equivalent and original form of the code with the  informative names (though I admit to a little bit of old fashioned use of equivalences in the type definition) restored, along with the type definition for “a”, now called “address”, they both said it was straightforward to simulate on paper and verify what was going on.

#define address unsigned long long int

address *reverse(address *list) {
 address *rev;
 address *temp;
 for(rev=NULL;list!=NULL;temp=(address *)*list,
                         *list=(address)rev,
                         rev=list,list=temp);
 return rev;}

The reality is that variable names and comments, though irrelevant to the actual operation of code, are where a lot of the semantic explanation of what is going on is encoded. Simply looking at the code itself is unlikely to give enough information about how it is used. And if you look at the total system then any sort of reasoning process about it soon becomes intractable.

If anyone had already built an AI system which could understand either of the two versions of my procedure above it would be an unbelievably useful tool for  every programmer alive today. That is what makes me confident we have nothing that is close–it would be in everyone’s IDE (Integrated Development Environment) and programmer productivity would be through the roof.

But you might think my little exercise was a bit too hard for our poor Super Intelligence (the one whose proponents think will be wanting to kill us all in just a few years–poor Super Intelligence). But really you should not underestimate how badly written are the code bases on which we all rely for our daily life to proceed in an ordered way.

So I did a different, second experiment, this time just on myself.

Here is a piece of code I just found on my Macintosh, under a directory named TextEdit, in a file named EncodingManager.m. I wasn’t sure what a file extension of “.m” meant in terms of language, but it looked like C code to me. I looked only at this single procedure within that file, nothing else at all, but I can tell a few things about it, and the general system of which it is part. Note that the only words here that are predefined in C are static, int, const, void, if, and return. Everything else must be defined somewhere else in the program, but I didn’t look for the definitions, just stared at this little piece of code in isolation. I guarantee that there is no AI program today which could deduce what I did, in just a few minutes, in the italic text following the code.

/* Sort using the equivalent Mac encoding as the major key. Secondary key is the actual encoding value, which works well enough. We treat Unicode encodings as special case, putting them at top of the list.
*/
static int encodingCompare(const void *firstPtr, const void *secondPtr) {
    CFStringEncoding first = *(CFStringEncoding *)firstPtr;
    CFStringEncoding second = *(CFStringEncoding *)secondPtr;
    CFStringEncoding macEncodingForFirst = CFStringGetMostCompatibleMacStringEncoding(first);
    CFStringEncoding macEncodingForSecond = CFStringGetMostCompatibleMacStringEncoding(second);
    if (first == second) return 0; // Should really never happen
    if (macEncodingForFirst == kCFStringEncodingUnicode || macEncodingForSecond == kCFStringEncodingUnicode) {
        if (macEncodingForSecond == macEncodingForFirst) return (first > second) ? 1 : -1; // Both Unicode; compare second order
        return (macEncodingForFirst == kCFStringEncodingUnicode) ? -1 : 1; // First is Unicode
    }
    if ((macEncodingForFirst > macEncodingForSecond) || ((macEncodingForFirst == macEncodingForSecond) && (first > second))) return 1;
    return -1;
}

First, the comment at the top is slightly misleading as this is not a sort routine, rather it is a predicate which is used by some sorting procedure to decide whether any two given elements are in the right order. It takes two arguments and returns either 1 or -1, depending on which order they should be in the sorted output from that sorting procedure which we haven’t seen yet. We have to figure out what those two possibilities mean. I know that TextEdit is a simple text file editor that runs on the Macintosh. It looks like there are a bunch of possible encodings for elements of strings inside TextEdit, and on the Macintosh there is a non-identical set of possible encodings. I’m guessing that TextEdit must run on other systems too! This particular predicate takes the encoding values for the general encodings and says which of the ones closest to each of them on the Macintosh is better to use. And it prefers encodings where only a single byte per character is used. The encodings themselves, both for the general case, and for the Macintosh are represented by an integer. Based on the third sentence in the first comment, and on the return value where the comment is “First is Unicode” it looks like this predicate returning -1 means its first argument should precede (i.e., appear closer to the “top of the list”–an inference I am making from “top” being used to refer to the end of a list that precedes all the other elements of the list; whether it is actually represented elsewhere as a classical list as in my first example of code above, or it is a sorted array is immaterial and this piece of code does not depend on that) the second argument in the sort, otherwise if it returns 1, then the second argument should precede the first argument. If the integer for the Macintosh encoding is smaller that means it should come first, and if they are equal for the Macintosh, then whether the integer representing the general case encoding is smaller should determine the order. All this subject to single byte representations always winning out.

That is a lot of things to infer about what is actually a pretty short piece of code. But it is the sort of thing that makes it so that humans can build complex systems, in the way that all our current software is built.

It is the sort of thing that any Super Intelligence bent on self improvement through code level introspection is going to need in order to understand the code that has been cobbled together by humans to produce it. Without understanding its own code it will not be able to improve itself by rewriting its own code.

And we do not have any AI system which can understand even this tiny, tiny little bit of code from a simple text editor.

7. Bond With Humans

Now we get to the really speculative place, as this sort of thing has only been worked on in AI and robotics for around 25 years. Can humans interact with robots in a way in which they have true empathy for each other?

In the 1990’s my PhD student Cynthia Breazeal used to ask whether we would want the then future robots in our homes to be “an appliance or a friend”. So far they have been appliances. For Cynthia’s PhD thesis (defended in the year 2000) she built a robot, Kismet, an embodied head, that could interact with people. She tested it with lab members who were familiar with robots and with dozens of volunteers who had no previous experience with robots, and certainly not a social robot like Kismet.

I have put two videos (cameras were much lower resolution back then) from her PhD defense online.

In the first one Cynthia asked six members of our lab group to variously praise the robot, get its attention, prohibit the robot, and soothe the robot. As you can see, the robot has simple facial expressions, and head motions. Cynthia had mapped out an emotional space for the robot and had it express its emotion state with these parameters controlling how it moved its head, its ears and its eyelids. A largely independent system controlled the direction of its eyes, designed to look like human eyes, with cameras behind each retina–its gaze direction is both emotional and functional in that gaze direction determines what it can see. It also looked for people’s eyes and made eye contact when appropriate, while generally picking up on motions in its field of view, and sometimes attending to those motions, based on a model of how humans seem to do so at the preconscious level. In the video Kismet easily picks up on the somewhat exaggerated prosody in the humans’ voices, and responds appropriately.

In the second video, a naïve subject, i.e., one who had no previous knowledge of the robot, was asked to “talk to the robot”. He did not know that the robot did not understand English, but instead only detected when he was speaking along with detecting the prosody in his voice (and in fact it was much better tuned to prosody in women’s voices–you may have noticed that all the human participants in the previous video were women). Also he did not know that Kismet only uttered nonsense words made up of English language phonemes but not actual English words. Nevertheless he is able to have a somewhat coherent conversation with the robot. They take turns in speaking (as with all subjects he adjusts his delay to match the timing that Kismet needed so they would not speak over each other), and he successfully shows it his watch, in that it looks right at his watch when he says “I want to show you my watch”. It does this because instinctively he moves his hand to the center of its visual field and makes a motion towards the watch, tapping the face with his index finger. Kismet knows nothing about watches but does know to follow simple motions. Kismet also makes eye contact with him, follows his face, and when it loses his face, the subject re-engages it with a hand motion. And when he gets close to Kismet’s face and Kismet pulls back he says “Am I too close?”.

Note that when this work was done most computers only ran at about 200MHz, a tiny fraction of what they run at today, and with only about one thousandth of the RAM we expect on even our laptops today.

One of the key takeaways from Cynthia’s work was that with just a few simple behaviors the robot was able to engage humans in human like interactions. At the time this was the antithesis of symbolic Artificial Intelligence which took the view that speech between humans was based on “speech acts” where one speaker is trying to convey meaning to another. That is the model that Amazon Echo and Google Home use today. Here it seemed that social interaction, involving speech was built on top of lower level cues on interaction. And furthermore that a human would engage with a physical robot if there were some simple and consistent cues given by the robot.

This was definitely a behavior-based approach to human speech interaction.

But is it possible to get beyond this? Are the studies correct that try to show an embodied robot is engaged with better by people than a disembodied graphics image, or a listening/speaking cylinder in the corner of the room?

Let’s look at the interspecies interaction that people engage in more than any others.

This photo was in a commentary in the issue of Science that published a paper5 by Nagasawa et al, in 2015. The authors show that as oxytocin concentration rises for whatever reason in a dog or its owner then the one with the newly higher level engages more in making eye contact. And then the oxytocin level in the other individual (dog or human) rises. They get into a positive feedback loop of oxytocin levels mediated by the external behavior of each in making sustained eye contact.

Cynthia Breazeal did not monitor the oxytocin levels in her human subjects as they made sustained eye contact with Kismet, but even without measuring it I am quite sure that the oxytocin level did not rise in the robot. The authors of the dog paper suggest that in their evolution, while domesticated, dogs stumbled upon a way to hijack an interaction pattern that is important for human nurturing of their young.

So robots, and Kismet was a good start, could certainly be made to hijack that same pathway and perhaps others. It is not how cute they look, nor how similar they look to a human (Kismet is very clearly non-human); it is how easy it is to map their behaviors to ones for which we humans are primed.

Now here is a wacky thought. Over the last few years we have learned how many species of bacteria we carry in our gut (our microbiome), on our skin, and in our mouths. Recent studies suggest all sorts of effects of just what bacterial species we have and how that influences and is influenced by sexual attraction and even non-sexual social compatibility. And there is evidence of transfer of bacterial species between people. What if part of our attraction to dogs is related to or moderated by transfer of bacteria between us and them? We do not yet know if it is the case. But if it is, that may doom our social relationships with robots to never becoming as strong as with dogs. Or people. At least, that is, until we start producing biological replicants as our robots, and by then we will have plenty of other moral pickles to deal with.

With that, we move to the next installment of our quest to build Super Intelligence, Part IV, things to work on now.



1 “A Heuristic Program that Solves Symbolic Integration Problems in Freshman Calculus”, James R. Slagle, in Computers and Thought, Edward A. Feigenbaum and Julian Feldman, McGraw-Hill, New York, NY, 1963, 191–206, adapted from his 1961 PhD thesis in mathematics at MIT.

2 “Commonsense Reasoning and Commonsense Knowledge in Artificial Intelligence”, Ernest Davis and Gary Marcus, Communications of the ACM, (58)9, September 2015, 92–103.

3 “The Low-Cost Evolution of AI in Domestic Floor Cleaning Robots”, Alexander Kleiner, AI Magazine, Summer 2018, 89–90.

4 See Dina Katabi’s recent TED talk from 2018.

5 “Oxytocin-gaze positive loop and the coevolution of human-dog bonds”, Miho Nagasawa, Shouhei Mitsui, Shiori En, Nobuyo Ohtani, Mitsuaki Ohta, Yasuo Sakuma, Tatsushi Onaka, Kazutaka Mogi, and Takefumi Kikusui, Science, volume 348, 17th April, 2015, 333–336.

[FoR&AI] Steps Toward Super Intelligence II, Beyond the Turing Test

rodneybrooks.com/forai-steps-toward-super-intelligence-ii-beyond-the-turing-test/

[This is the second part of a four part essay–here is Part I.]

As we (eventually…) start to plot out how to build Artificial General Intelligence there is going to be a bifurcation in the path ahead. Some will say that we must choose which direction to take. I argue that the question is complex and it may be felicitous to make a Schrödingerian compromise.

I get people making two arguments to me about analogies between AGI and heavier than air flight.  And the arguments are made to me with about equal frequency. As I write these sentences I can recall at least one of the two arguments being made to me in each of the last six weeks.

One argument is that we should not need to take into account how humans come to be intelligent, nor try to emulate them, as heavier than air flight does not emulate birds. That is only partially true as there were multiple influences on the Wright brothers from bird flight1. Certainly today the appearance of heavier than air flight is very different from that of birds or insects, though the study of the flight of those creatures continues to inform airplane design. This is why over the last twenty years or so jet aircraft have sprouted winglets at the ends of primary wings.

Airplanes can fly us faster and further than something that more resembled birds would. On the other hand our airplanes have not solved the problem of personal flight. We can no more fly up from the ground and perch in a tall tree than we could before the Wright brothers. And we are not able to take off and land wherever we want without large and extremely noisy machines. A little more bird would not be all bad.

I accept the point that to build a human level intelligence it may well not need to be much at all like humans in how it achieves that. However, for now at least, it is the only model we have and there is most likely a lot to learn still from studying how it is that people are intelligent. Furthermore, as we will see below, having a lot of commonality between humans and intelligent agents will let them be much more understandable partners.

This is the compromise that I am willing to make. I am willing to believe that we do not need to do everything like humans do, but I am also convinced that we can learn a lot from humans and human intelligence.

The second argument that I get is largely trying to cast me as a grumpy old guy, and that may well be fair!

People like to point out that very late in the nineteenth century, less than a decade before the Wright brothers’ first flight, Lord Kelvin and many others had declared that heavier than air flight was impossible. Kelvin, and others, could not have meant that literally as they knew that birds and insects were both heavier than air and could fly. So they were clearly referring to something more specific. If they were talking about human powered heavier than air flight, they were sort of right as it was more than sixty years away2, and then only for super trained athletes.

More likely, they were discouraged by the failures of so many attempts with what had seemed like enough aerodynamic performance and engine power to fly. What they were missing understanding was that it was the control of flight which needed to be developed, and that is what made the Wright brothers successful.

So… are we, by analogy, just one change in perspective away from getting to AGI? And really, is deep learning that change in perspective? And don’t I know that it is already solved, and that the train is accelerating down the track?

I, old(ish) guy for sure, grumpy or not, think that the analogy breaks down. I think it is not just one analog of flight’s “control” that we are missing, but that in fact there are at least a hundred such things we are currently missing. Intelligence is a much bigger deal than flight. A bird brain does not cut it.

So now, with the caveat of the second argument perhaps identifying a personal failing, I want to proclaim that there is no need to come down on one side or the other, airplane vs bird, doing it a completely different way AI vs emulating how a human does it.

Instead I think we should use the strengths wherever we can find them, in engineering, or in biology. BUT, if we do not make our AI systems understandable to us, if they can not relate to the quirks that make us human, and if they are so foreign that we can not treat them with empathy, then they will not be adopted into our lives.

Underneath the shiny exterior they may not be flesh and blood, but instead may be silicon chips and wires, and yes, even the occasional deep learning network, but if they are to be successful in our world we are going to have to like them.

TWO INSTANCES OF AGI, AS AGENTS THAT DO REAL WORK

Just what an AGI agent or robot must be able to do to qualify as being generally intelligent is, I think, a murky question. And my goal of endowing them with human level intelligence is even murkier to define. To try to give it a little coherence I am going to choose two specific applications for AGI agents, and then discuss what this means in terms of research that awaits in order to achieve that goal. I could just as well have chosen different applications for them, but these two will make things concrete. And since we are talking about Artificial GENERAL Intelligence, I will push these application agents to be as good as one could expect a person to be in similar circumstances.

The first AGI agent will be a physically embodied robot that works in a person’s home providing them care as they age. I am not talking about companionship but rather physical assistance that will enable someone to live with dignity and independence as they age in place in their own home. For brevity’s sake we will refer to this robot, an eldercare worker, as ECW for the rest of this post.

ECW may come with lots of pre-knowledge about the general task of helping an elderly person in their home, and a lot of fundamental knowledge of what a home is, the sorts of things that will be found there, and all sorts of knowledge about people, both the elderly in general, and a good range of what to expect of family members and family dynamics, along with all sorts of knowledge about the sorts of people that might come into the home be it to deliver goods, or to carry out maintenance on the home. But ECW will also need to quickly adapt to the peculiarities of the particular elderly person and their extended social universe, the particular details of the house, and be able to adapt over time as the person ages.

The second AGI agent we will consider need not be embodied for the purposes of this discussion.  It will be an agent that can be assigned a complex planning task that will involve a workplace, humans as workers, machines that have critical roles, and humans as customers to get services at this workplace. For instance, and this is the example we will work through here, it might one particular day be asked to plan out all the details of a dialysis ward, specifying the physical layout, the machines that will be needed, the skillsets and the work flow for the people or robots to be employed there, and to consider how to make the experience for patients one that they will rank highly. We will refer to this services logistics planner as SLP.

Again SLP may come to this task with lots of prior knowledge about all sorts of things in the world, but there will be peculiarities about the particular hospital building, its geographical location and connection to transportation networks, the insurance profiles of the expected patient cohort, and dozens of other details. Although there have been many dialysis wards already designed throughout the world, for the purpose of this blog post we are going to assume that SLP has to design the very first one. Thus a dialysis ward, as used here, is a proxy for some task entirely new to humanity. This sort of logistical planning is not at all uncommon, and senior military officers, below the flag level, can assume that they will be handed assignments of this magnitude in times of unexpected conflicts. If we are going to be able to build an AGI agent that can work at human level, we surely should expect it to be able to handle this task.

We will see that both these vocations are quite complex and require much subtlety in order to be successful. I believe that the same is true of most human occupations, so I could equally well have chosen other AGI based agents, and the arguments below would be very similar.

My goal here is to replace the so-called “Turing Test” with something which tests the intelligence of a system much more broadly. Stupid chat bots have managed to do well at the Turing Test. Being an effective ECW or SLP will be a much more solid test, and will guarantee a more general competence than what is required for the Turing Test.

Some who read this might argue that all the detailed problems and areas crying out for research that I give below will be irrelevant for a “Super Intelligence”, as it will do things its own way and that it will be beyond our understanding.  But such an argument would be falling into one of the traps that I identified in my blog post about seven common forms of mistake that are made in predicting the future of Artificial Intelligence. In this case it is the mistake of attributing magic to a sufficiently advanced technology. Super Intelligence is so super that we can not begin to understand it so there is no point making any rational arguments about it. Well, I am not prepared to sit back and idly wait for magic to appear. I prefer to be actively working to make magic happen, and that is what this post is about.

Oh, and one last meta-point on this rhetorical device that I have chosen to make my arguments. Yes, both ECW and SLP will have plenty of opportunity to kill people should they want. I made sure of that so that the insatiable cravings of the Super Intelligence alarmists will have a place for expression concerning the wickedness that is surely on its way.

What Does ECW (ELDER CARE WORKER) Need To Do?

Let us suppose that the Elder Care Worker (ECW) robot is permanently assigned to work in a house with an older man, named Rodney. We will talk about the interactions between ECW and Rodney, and how Rodney may change over time as he gets older.

There are some basic things that almost go without saying. ECW should not harm Rodney, or guests at his house. It should protect Rodney from all things bad, should do what Rodney asks it to do, unless in some circumstances it should not (see below), etc.

One might think that Asimov’s three laws will cover this base necessity. In a 1942 short story titled “Runaround” the science fiction writer Isaac Asimov stated them as:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Then in subsequent stories the intricate details of how these rules might be in conflict, or have inherent ambiguity in how they should be interpreted, became a vehicle for new story after new story for Asimov.

It turns out that no robot today has a remote chance of successfully obeying these rules, because the necessary perception is difficult, and because using common sense and predicting how people are going to behave are beyond current AI capabilities. We will talk at more length about these challenges in Part III of this essay.

But somehow ECW should have a code of conduct concerning the sorts of things it should do and should not do. That might be explicitly represented and available for introspection by ECW, or it may be built in due to inherent limitations in the way ECW is built. For instance any dishwashing machine that you can buy today just inherently can not wander into the living room during a wash cycle and pour out its water on the floor!

But ECW does need to be a lot more proactive than a dishwasher. To do so it must know who Rodney is, have access to relevant medical data about Rodney and be able to use that information in all sorts of decisions it must make on a daily basis. At the very least it must be able to recognize Rodney when he comes home, and track his appearance as he ages. ECW needs to know who is Rodney, who are family members, and needs to track who is in the house and when. It also needs to know the difference between a family member and a non family member, and individually know each of those family members. It needs to know who is a friend, who is just an acquaintance, who is a one time visitor, who is a regular visitor but a service provider of some sort, when the person at the door is a delivery person, and when the person at the door is someone who should not be allowed in.

As I pointed out in my post titled What Is It Like to Be a Robot? our domestic robots in the coming decades will be able to have a much richer set of sensory inputs than do we humans. They will be able to detect the presence of humans in all sorts of ways that we can not, but they will still need to be very good at quickly recognizing who is who, and it will most likely need to be done visually. But it won’t do to demand a full frontal face on view. Instead they will, like a person, need to piece together little pieces of information quickly, and make inferences. For example, if some person leaves a room to go to another room full of people, and a minute later a person comes from that room, with similar clothes on, perhaps that will be enough of a clue. Without knowing who is who the ECW could get quite annoying. And it should not be annoying. Especially not to Rodney, who will have to live with the robot for ten or twenty years.

Here is an example of an annoying machine. I have had an account with a particular bank for over thirty years and I have used its ATM card to get cash for that whole time. Every single time I insert my card it asks me whether I want to communicate in Spanish today. I have never said yes. I will never say yes. But it asks me anew every single interaction.

ECW should not be that sort of annoying with any of the people that it interacts with. It will need to understand what sorts of things are constant about a person, what sorts of things change often about a person, and what sort of changes in circumstances might precipitate a previously unlikely way of interacting. On the other hand, if a known service person shows up after months of absence, and ECW did not summon that person, it would probably be reasonable of ECW to ask why they are there. A human care giver knows all these sorts of social interaction things about people. ECW will need to also, or otherwise it will fall into the same class as an annoying ATM.

ECW will need to model family dynamics in order to not be annoying and to not make any sort of social faux pas. As Rodney deteriorates it may need to step in to the social dynamic to smooth things out, as a good human caretaker would. ECW might need to whisper to Rodney’s daughter when she arrives some day that Rodney seems a little cranky today, and then explain why–perhaps a bad reaction to some regular medicine, or perhaps he has been raving about someone claiming that one of his 25 year old blog predictions about future technology was overly pessimistic and he doesn’t buy it, but sure is upset about it. Other details may not matter at all. ECW will need to give the appropriate amount of information to the appropriate person.

In order to do this ECW will need to have a model of who knows what, who should know what, how they might already have the right information, or how their information may be wrong or out of date.
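As a toy illustration only, the bookkeeping side of such a model can be sketched in a few lines of Python. Everything here is an assumption made up for the example: the `Fact` record, the day counter, the `what_to_do` function, and the labels it returns. The hard parts, perceiving what people actually know and judging who should know what, are simply taken as inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    value: str  # what is true, or what a person believes to be true
    day: int    # day on which it became true, or was last learned


def what_to_do(truth: Fact, belief: Optional[Fact], should_know: bool) -> str:
    """Decide how to treat one person's view of one fact (hypothetical labels)."""
    if not should_know:
        return "say nothing"
    if belief is None:
        return "inform"
    if belief.value == truth.value:
        return "already knows"
    if belief.day < truth.day:
        # Their information was learned before the world changed.
        return "update (out of date)"
    # They learned something recently, but it does not match the truth.
    return "correct (wrong)"


# Hypothetical example: Rodney's medication changed on day 10,
# but his daughter last heard about it on day 3.
truth = Fact("new medication", day=10)
daughter = Fact("old medication", day=3)
print(what_to_do(truth, daughter, should_know=True))
```

Even in this tiny form the sketch shows why the problem is hard: every argument to `what_to_do` is something ECW would have to infer about people, not something it is handed.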

Rodney will likely change over ECW’s tenure, getting frailer, and needing more help from ECW. ECW will need to adjust its services to match Rodney’s changing state. That may include changing who it listens to primarily for instructions.

While the adults around will stay adults, Rodney’s grandchild may change from being a helpless baby to being a college graduate or even a medical doctor. If ECW is going to be as seamless as a person would be in accommodating those changes in its relationship with the grandchild over time then it is going to need to understand a lot about children and their levels of competence and how it should change its interaction. A college graduate is not going to appreciate being interacted with as though a baby.

But what does ECW need to actually do?

As Rodney declines over time, ECW will need to take over more and more responsibility for the normal aspects of living. It will need to clean up the house, picking up things and putting them away. It will need to distinguish things that are on the floor, understanding that disposing of a used tissue is different from dealing with a sock found on the floor.

It will need to start reaching for the things that are stored high up in the kitchen and hand them to Rodney. It may need to start cooking meals and be able to judge what Rodney should be eating to stay healthy.

When a young child shows something to an adult they know to use motion cues to draw the adult’s attention to the right thing. They know to put it in a place where the adult’s line of sight will naturally intersect. They know to glance at the adult’s eyes to see where their attention lies, and they know to use words to draw the adult’s attention when the person has not yet focused on the object being shown.

In order to do this well, even with all its super senses, ECW will still need to be able to “imagine” how the human sees the world so that it can ascertain what cues will most help the human understand what it is trying to convey.

When ECW first starts to support Rodney in his home Rodney may well be able to speak as clearly to ECW as he does to an Amazon Echo or a Google Home, but over time his utterances will become less well organized. ECW will need to use all sorts of context and something akin to reasoning in order to make sense of what Rodney is trying to convey. It likely will be very Rodney-dependent and likely not something that can be learned from a general purpose large data set gathered across many elderly people.

As Rodney starts to have trouble with noun retrieval ECW will need to follow the convoluted ways that Rodney tries to get around missing words when he is trying to convey information to ECW. Even a phrase as simple sounding as “the red thing over there” may be complex for ECW to understand. Current Deep Learning vision systems are terrible at color constancy (we will talk about that in the what is hard section next). Color constancy is something that is critical to human based ontologies of the world, the formal naming systems any group that shares a spoken language assumes that all other speakers will understand. It turns out that “red” is actually quite complex in varied lighting conditions–humans handle it just fine, but it is one of many tens of sub-capabilities that we all have without even realizing it.

ECW will have to take turns with Rodney on some tasks, encouraging him to do things for himself as he gets older–it will be important for Rodney to remain active, and ECW will have to judge when pushing Rodney to do things is therapeutic  and when it is unsupportive.

Eventually ECW will need to help Rodney do all the things that happen in bathrooms. At some point it will need to start going into the bathroom with him, observing whether he is unsteady or not and providing verbal coaching. Over time ECW will need to stick closer to Rodney in the bathroom providing physical support as needed. Eventually it may need to help him get on to and off of the oval office, perhaps eventually providing wiping assistance. Even before this, ECW may need to help Rodney get into and out of bed–once a person loses the ability to do that on their own they often need to leave their home and go into managed care. ECW will be able to stave off that day, perhaps for years, just by helping with this twice daily task.

Coming into contact with, and supplying support and even lifting a frail and unsteady human body, easily damaged, and under control of a perhaps not quite rational degraded human intelligence is a complex physical interaction. It can often be eased by verbal communication between the two participants. Over the years straightforward language communication will get more and more difficult in both directions, as split second decisions will need to be made just at the physical level alone on what motions are appropriate, effective, and non-threatening.

As ECW comes into contact with Rodney there will be more opportunities for diagnostics. A human caregiver helping Rodney out of bed would certainly notice if his pajamas were wet. Or if some worse accident had happened. Now we get to an ethical issue. Suppose Rodney notices that ECW noticed, and says, “please don’t tell my children/doctor”. Under what conditions should ECW honor Rodney’s request, and when will it be in Rodney’s better interest to disobey Rodney and violate his privacy?

Early versions of ECW, before such robots are truly intelligent, will most likely rely on a baked in set of rules which require only straightforward perceptual inputs in order to make these decisions–they will appear rigid and sometimes inhuman. When we get ECW to the level of intelligence of a person we will all expect it to make such decisions in much more nuanced ways, and be able to explain how it weighed the competing factors arguing for the two possible outcomes–tell Rodney’s children or not.

All along ECW will need to manage its own resources, including its battery charge, its maintenance schedule, its software upgrades, its network access, its cloud accounts, and perhaps figuring out how to pay its support bills if Rodney’s finances are in disarray. There will be a plethora of ECW’s other cyber physical needs that will need to be met while working around Rodney’s own schedule. ECW will have to worry about its own ongoing existence and health and weigh that against its primary mission of providing support to Rodney.

What Does SLP (SERVICES LOGISTICS PLANNER) Need To Do?

The Services Logistics Planner (SLP) does not need to be embodied, but it will need to at least appear to be cerebral, even as it lives entirely in the cloud. It will need to be well grounded in what it is to be human, and what the human world is really like if it is to be able to do its task.

The clients, the people who want the facility planned, will communicate with SLP through speech as we all do with the Amazon Echo or Google Home, and through sending it documents (Word, PowerPoint, Excel, PDFs, scanned images, movie files, etc.). SLP will reply with build out specifications, organizational instructions, lists of equipment that will be needed and where it will be placed, staffing needs, consumable analysis, and analysis of expected human behaviors within the planned facility. And then the client and other interested parties will engage in an interactive dialog with SLP, asking for the rationale for various features, suggesting changes and modifying their requirements. If SLP is as good as an intelligent person then this dialog will appear natural and have give and take3.

Consider the task of designing a new dialysis ward for an existing hospital.

Most likely SLP will be given a fixed space within the floor plan, and that will determine some aspects of people flow in and out, utility conduits, etc. SLP will need to design any layout changes of walls within the space, and submit petitions to the right hospital organizations to change entrances and exits and how they will affect people flow in other parts of the hospital. SLP will need to decide what equipment will be where in the newly laid out space, what staffing requirements there will be, what hours the ward should be accepting outpatients, what flows there should be for patients who end up having problems during a visit and need to get transferred to other wards in the hospital, and the layout of the beds and chairs for dialysis (and even which should be used). SLP will need to decide, perhaps through consulting other policy documents, how many non-patients will be allowed to accompany a patient, where they will be allowed to sit while the patient is on dialysis, and what the layout of the waiting room will be.

SLP will need to consider boredom for people in the waiting areas. But it won’t be told this in the specification. It will have to know enough about people to consider the different sorts of boredom for patients and supporting friends and relatives, access to snacks (appropriate for support visitors and for pre and post dialysis patients), access to information on when someone’s turn will come, and when someone will be getting out of active dialysis, etc., etc. At some level this will require an understanding of humans, their fears, their expected range of emotions, and how people will react to stressful situations.

It will have to worry about the width of corridors, the way doors open, and the steps or slopes within the facility. It will need to consider these things from many different angles; how to get new equipment in and out, how the cleaning staff will work, what the access routes will be in emergencies for doctors and nurses, how patients will move around, how visitors will be isolated to appropriate areas without them feeling like they are locked in, etc., etc.

SLP will need to worry about break rooms, the design of check in facilities, the privacy that patients can and should expect, and how to make it so that the facility itself is calming, rather than a source of stress, for the staff, the patients, and the visitors.

Outside of the ward itself, SLP will need to look into what the profile of the expected patients is for this ward, and how they will travel to and from the hospital, for this particular city, for their outpatient visit. It will need to predict the likelihood of really bad travel days where multiple patients get unexpectedly delayed. It will need both to determine a policy that tries to push everyone to be on time and to have a backup system in place, as it will not be acceptable to tell such patients that it is just too bad that they missed this appointment through no fault of their own–this is a matter of life and death, and SLP will need to analyze what are the acceptable backup delays and risks to patients.

SLP will not be told any of these things when given the task to design the new dialysis ward. It will need to be smart enough about people to know that these will all be important issues. And it will need to be able to push back on the really important aspects during the “value engineering” (i.e., reduction in program) that the humans reviewing the task will be promoting. And remember that for the purposes of this demonstration of human level general intelligence we are going to assume that this is the very first dialysis ward ever designed.

There is a lot of knowledge that SLP will need to access. And a lot of things that will need to feed into every decision, and a lot of tradeoffs that will need to be made, as surely there will be many compromises, as in any human endeavor.

Most importantly SLP will need to be able to explain itself. An insurance company that is bidding on providing insurance services for the facility that SLP has just designed will want to ask specific questions about what considerations went into certain aspects of the design. Representatives from the company will push for models of all sorts of aspects of the proposed design in terms of throughput considerations, risks to people quite apart from their illness and their dialysis requirements, how it was determined that all aspects of the design meet fire code, what sort of safety certifications exist for the chosen materials, etc., etc., etc.

But “Wait!” the Deep Learning fan boy says. “It does not need to do all those humanny sorts of things. We’ll just show it thousands of examples of facilities throughout the world and it will learn! Then it will design a perfect system. It is not up to us mere humans to question the magic skill of the Super Intelligent AI.”

But that is precisely the point of this example. For humans to be happy with a system designed by SLP it must get details incredibly correct. And by making this the first dialysis ward ever designed  it means that there just will not be much in the way of data on which to train it. If something really is Super it will have to handle tasks à nouveau, since people have been doing that throughout history.

The Two New Tests

I said above that I was proposing these two test cases, ECW and SLP, as a replacement for the Turing Test. Some may be disappointed that I did not give a simple metric on them.

In other tests there is a metric. In the Turing Test it is what percentage of human judges it fools into making a wrong binary decision. In the ever popular robot soccer competitions it is which team wins. In the DARPA Grand Challenge it was how long it took an autonomous vehicle to finish the course.
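To make the contrast concrete, here is a minimal Python sketch of how simple such metrics are to compute. The function names and the sample inputs are hypothetical, invented purely for illustration; they are not results from any real competition.

```python
def fooled_fraction(verdicts):
    """Turing Test style metric: fraction of judges who made the wrong call.

    `verdicts` is a list of booleans, True meaning that the judge
    mistook the machine for a human.
    """
    if not verdicts:
        raise ValueError("need at least one judge")
    return sum(verdicts) / len(verdicts)


def fastest_finisher(times):
    """Grand Challenge style metric: who finished the course in the least time.

    `times` maps a vehicle name to its finishing time in hours,
    or to None if the vehicle did not finish.
    """
    finished = {name: t for name, t in times.items() if t is not None}
    if not finished:
        return None
    return min(finished, key=finished.get)


# Made-up inputs, for illustration only.
print(fooled_fraction([True, False, False, True]))
print(fastest_finisher({"car_a": 7.5, "car_b": None, "car_c": 6.9}))
```

Each of these boils a whole performance down to a single number or a single name, which is exactly what the two tests proposed here do not allow.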

ECW and SLP have much more nuanced tasks. Just as there is no single multiple choice test that one must pass to receive a PhD, we will only know if ECW or SLP are doing a good job by continuously challenging them and evaluating them over many years.

Welcome to the real world. That is where we want our Artificial General Intelligences and our Super Intelligences to live and work.

Next up: Part III, some things that are hard for AI today.



1 The Wright Brothers were very much inspired by the gliding experiments of Otto Lilienthal. He eventually died in an accident while gliding in 1896 after completing more than 2,000 flights in the previous five years in gliders of his own design. Lilienthal definitely studied and was inspired by birds. He had published a book in 1889 titled Der Vogelflug als Grundlage der Fliegekunst, which translates to Birdflight as the Basis of Aviation. According to Wikipedia, James Tobin, on page 70 of his 2004 book To Conquer The Air: The Wright Brothers and the Great Race for Flight, says something to the effect of “[o]n the basis of observation, Wilbur concluded that birds changed the angle of the ends of their wings to make their bodies roll right or left”.

2 The first human powered heavier than air flight had to wait until 1961 in Southampton, UK, and it was not until 1977 that the human powered heavier than air Gossamer Condor flew for a mile in a figure eight, showing both duration and controllability. In 1979 the Gossamer Albatross flew 22 miles across the English Channel, powered by Bryan Allen flying at an average height of 5 feet. And in 1988 MIT’s Daedalus powered by Kanellos Kanellopoulos flew a still record 72 miles from Crete to Santorini in a replay of ancient Greek mythology.

3 Humans are quite used to having conversations involving give and take where the two participants are not equals–parent and three year old, teacher and student, etc. Each can respect the other and be open to insights from the “lower ranked” individual while being respectful and each side letting the other feel like they belong in the conversation. Even when we get to Super Intelligence there is no reason to think we will immediately go to no respect for the humans. And if our earlier not quite Super systems start to get that way we will change them.

[FoR&AI] Steps Toward Super Intelligence I, How We Got Here

rodneybrooks.com/forai-steps-toward-super-intelligence-i-how-we-got-here/

God created man in his own image.

Man created AI in his own image.


Once again, with footnotes.

God created man in his¹ own image.²

Man³ created AI in his³ own image.


At least that is how it started out. But figuring out what our selves are, as machines, is a really difficult task.

We may be stuck in some weird Gödel-like incompleteness world–perhaps we are creatures below some threshold of intelligence which stops us from ever understanding or building an artificial intelligence at our  level. I think most people would agree that that is true of all non-humans on planet Earth–we would be extraordinarily surprised to see a robot dolphin emerge from the ocean, one that had been completely designed and constructed by living dolphins. Like dolphins, and gorillas, and bonobos, we humans may be below the threshold as I discussed under “Hubris and Humility”.

Or, perhaps, as I tend to believe, it is just really hard, and will take a hundred years, or more, of concerted effort to get there. We still make new discoveries in chemistry, and people have tried to understand that, and turn it from a science into engineering, for thousands of years. Human level intelligence may be just as, or even more, challenging.

In my most recent blog post on the origins of Artificial Intelligence, I talked about how all the founders of AI, and most researchers since have really been motivated by producing human level intelligence. Recently a few different and small groups have tried to differentiate themselves by claiming that they alone (each of the different groups…) are interested in producing human level intelligence, and beyond, and have each adopted the name Artificial General Intelligence (AGI) to distinguish themselves from mainstream AI research. But mostly they talk about how great it is going to be, and how terrible it is going to be, and very often both messages at the same time coming out of just one mouth, once AGI is achieved. And really none of these groups have any idea how to get there. They are too much in love with the tingly feelings thinking about it to waste time actually doing it.

Others, who don’t necessarily claim to be AI researchers but merely claim to know all about AI and how it is proceeding (ahem…), talk about the exquisite dangers of what they call Super Intelligence, AI that is beyond human level. They claim it is coming any minute now, especially since as soon as we get AGI, or human level intelligence, it will be able to take over from us, using vast cloud computational resources, and accelerate  the development of AI even further. Thus there will be a runaway development of Super Intelligence. Under the logic of these hype-notists this super intelligence will naturally be way beyond our capabilities, though I do not know whether they believe it will supersede God…  In any case, they claim it will be dangerous, of course, and won’t care about us (humans) and our way of life, and will likely destroy us all. I guess they think a Super Intelligence is some sort of immigrant. But these heralds who have volunteered their clairvoyant services to us also have no idea about how AGI or Super Intelligence will be built. They just know that it is going to happen soon, if not already. And they do know, with all their hearts, that it is going to be bad. Really bad.

It does not help the lay perception at all that some companies claiming to have systems based on Artificial Intelligence are often using humans to handle the hard cases for their online systems, unbeknownst to the users. This can seriously confuse public perception of just where today’s Artificial Intelligence stands, not to mention the inherent lack of privacy since I think we humans are somehow more willing sometimes to share private thoughts and deeds with our machines than we are with other people.

In the interest of getting to the bottom of all this, I have been thinking about what research we need to do, what problems we need to solve, and how close we are to solving all of them in order to get to Artificial General Intelligence entities, or human intelligence level entities. We have been actively trying for 62 years, but apparently it is only just right now that we are about to make all the breakthroughs that will be necessary. That is what this blog is about, giving my best guess at all the things we still don’t know, that will be necessary to know for us to build AGI agents, and then how they will take us on to Super Intelligence. And thus the title of this post: Steps Toward Super Intelligence.

And yes, this title is an homage to Marvin Minsky’s Steps Toward Artificial Intelligence from 1961. I briefly reviewed that paper back in 1991 in my paper Intelligence Without Reason, where I pointed out the five main areas he identifies for research into Artificial Intelligence were search, three ways to control search (for pattern-recognition, learning, and planning), and a fifth topic, induction. Pattern recognition and learning have today clearly moved beyond search. Perhaps my prescriptions for research towards Super Intelligence will also turn out to be wrong before very long. But I am pretty confident that the things that will date my predictions are not yet known by most researchers, and certainly are not the hot topics of today.

This started out as a single long essay, but it got longer and longer. So I split it into four parts, but they are also all long. In any case, it is the penultimate essay in my series on the Future of Robotics and Artificial Intelligence.

BUT SURELY WE (YES, US HUMANS!) CAN DO IT SOON!

Earlier I said this endeavor may take a hundred years? For techno enthusiasts, of which I count myself as one, that sounds like a long time. Really, is it going to take us that long? Well, perhaps not, perhaps it is really going to take us two hundred, or five hundred. Or more.

Einstein predicted gravitational waves in 1916. It took ninety nine years of people looking before we first saw them in 2015. Rainer Weiss, who won the Nobel prize for it, sketched out the successful method in 1967, fifty one years after the prediction. And by then the key technologies needed, lasers and computers, were in widespread commercial use. It just took a long time.

Controlled nuclear fusion has been forty years away for well over sixty years now.

Chemistry took millennia, despite the economic incentive of turning lead into gold (and it turns out we still can’t do that in any meaningful way).

P=NP? has been around in its current form for forty seven years and its solution would guarantee whoever did it to be feted as the greatest computer scientist in a generation, at least. No one in theoretical computer science is willing to guess when we might figure that one out. And it doesn’t require any engineering or production. Just thinking.

Some things just take a long time, and require lots of new technology, lots of time for ideas to ferment, and lots of Einstein and Weiss level contributors along the way.

I suspect that human level AI falls into this class, but that it is much more complex than detecting gravitational waves, controlled fusion, or even chemistry, and that it will take hundreds of years.

Being filled with hubris about how tech (and the Valley) can do whatever they put their mind to may just not be nearly enough.

Four Previous Attempts at General AI

Referring again to my blog post of April on the origins of Artificial Intelligence, people have been actively working on a subject explicitly called “Artificial Intelligence” since the summer of 1956. There were precursor efforts for the previous twenty years, but that name had not yet been invented or assigned–and once again I point to my 1991 paper Intelligence Without Reason for a history of the prior work and the first 35 years of Artificial Intelligence.

I count at least four major approaches to Artificial Intelligence over the last sixty two years. There may well be others that some would want to include.

As I see it, the four main approaches have been, along with approximate start dates:

  1. Symbolic (1956)
  2. Neural networks (1954, 1960, 1969, 1986, 2006, …)
  3. Traditional robotics (1968)
  4. Behavior-based robotics (1985)

Before explaining the strengths and weaknesses of these four main approaches I will justify the dates that I have given above.

For Symbolic I am using the 1956 date of the famous Dartmouth workshop on Artificial Intelligence.

Neural networks have been investigated, abandoned, and taken up again and again. Marvin Minsky submitted his Ph.D. thesis at Princeton in 1954, titled Theory of Neural-Analog Reinforcement Systems and its Application to the Brain-Model Problem; two years later Minsky had abandoned this approach and was a leader in the symbolic approach at Dartmouth. Dead. In 1960 Frank Rosenblatt published results from his hardware Mark I Perceptron, a simple model of a single neuron, and tried to formalize what it was learning. In 1969 Marvin Minsky and Seymour Papert published a book, Perceptrons, analyzing what a single perceptron could and could not learn. This effectively killed the field for many years. Dead, again. After years of preliminary work by many different researchers, in 1986 David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper Learning Representations by Back-Propagating Errors, which re-established the field using a small number of layers of neuron models, each much like the Perceptron model. There was a great flurry of activity for the next decade until most researchers once again abandoned neural networks. Dead, again. Researchers here and there continued to work on neural networks, experimenting with more and more layers, and coining the term deep for those many more layers. They were unwieldy and hard to make learn well, and then in 2006 Geoffrey Hinton (again!) and Ruslan Salakhutdinov published Reducing the Dimensionality of Data with Neural Networks, where an idea called clamping allowed the layers to be trained incrementally. This made neural networks undead once again, and in the last handful of years this deep learning approach has exploded into practical use in machine learning. Many people today know Artificial Intelligence only from this one technical innovation.

I trace Traditional Robotics, as an approach to Artificial Intelligence, to the work of Donald Pieper, The Kinematics of Manipulators Under Computer Control, at the Stanford Artificial Intelligence Laboratory (SAIL) in 1968.  In 1977 I joined what had by then become the “Hand-Eye” group at SAIL, working on the “eye” part of the problem for my PhD.

As for Behavior-based robotics, I trace this to my own paper, A Robust Layered Control System for a Mobile Robot, which was written in 1985, but appeared in a journal in 1986⁴, when it was called the Subsumption Architecture. This later became the behavior-based approach, and eventually through technical innovations by others morphed into behavior trees. I am perhaps lacking a little humility in claiming this as one of the four approaches to AI. On the other hand it has led to more than 20 million robots in people’s homes, numerically more robots by far than any other robots ever built, and behavior trees are now underneath the hood of two thirds of the world’s video games, and many physical robots from UAVs to robots in factories. So it has at least been a commercial success.

Now I attempt to give some cartoon level descriptions of these four approaches to Artificial Intelligence. I know that anyone who really knows Artificial Intelligence will feel that these descriptions are grossly inadequate. And they are. The point here is to give just a flavor for the approaches. These descriptions are not meant to be exhaustive in showing all the sub approaches, nor all the main milestones and realizations that have been made in each approach by thousands of contributors. That would require a book length treatment. And a very thick book at that. These descriptions are meant to give just a flavor.

Now to the four types of AI. Note that for the first two, there has usually been a human involved somewhere in the overall usage pattern. This puts a second intelligent agent into the system and that agent often handles ambiguity and error recovery.  Often, then, these sorts of AI systems have had to deliver much less reliability than autonomous systems will demand in the future.

1. Symbolic Artificial Intelligence

The key concept in this approach is one of symbols. In the straightforward (every approach to anything usually gets more complicated over a period of decades) symbolic approach to Artificial Intelligence, a symbol is an atomic item which only has meaning from its relationship to other meanings. To make it easier to understand, the symbols are often represented by a string of characters which corresponds to a word (in English perhaps), such as cat or animal. Then knowledge about the world can be encoded in relationships, such as instance of and is.

Usually the whole system would work just as well, and consistently, if the words were replaced by, say, g0537 and g0028. We will come back to that.

Meanwhile, here is some encoded knowledge:

  • Every instance of a cat is an instance of a mammal.
  • Fluffy is an instance of cat.
  • Now we can conclude that Fluffy is an instance of a mammal.
  • Every instance of a mammal is an instance of an animal.
  • Now we can conclude that every instance of a cat is an instance of an animal.
  • Every instance of an animal can carry out the action of walking.
  • Unless that instance of  animal is in the state of being dead.
  • Every instance of an animal is either in the state of being alive or in the state of being dead — unless the time of now is before the time of (that instance of animal carrying out the action of birth).

While what we see here makes a lot of sense to us, we must remember that as far as an AI program that uses this sort of reasoning is concerned, it might as well have been:

  • Every instance of a g0537 is an instance of a g0083.
  • g2536 is an instance of g0537.
  • Now we can conclude that g2536 is an instance of a g0083.
  • Every instance of a g0083 is an instance of an g0028.
  • Now we can conclude that every instance of a g0537 is an instance of an g0028.
  • Every instance of an g0028 can carry out the action of g0154.
  • Unless that instance of  g0028 is in the state of being g0253.
  • Every instance of an g0028 is either in the state of being g0252 or in the state of being g0253 — unless the value(the-computer-clock) < time of (that instance of g0028 carrying out the action of g0161).

In fact it is worse than this. Above, the relationships are still described by English words. As far as an AI program that uses this sort of reasoning is concerned, it might as well have been:

  • For every x where r0002(x, g0537) then r0002(x, g0083).
  • r0002(g2536, g0537).
  • Now we can conclude that r0002(g2536, g0083).
  • For every x where r0002(x, g0083) then r0002(x, g0028).
  • Now we can conclude that for every x where r0002(x, g0537) then r0002(x, g0028).
  • For every x where r0002(x, g0028) then r0005(x, g0154).
  • Unless r0007(x, g0253).
  • For every x where r0002(x, g0028) then either r0007(x, g0252) or r0007(x, g0253) — unless the value(the-computer-clock) < p0043(a0027(g0028, g0161)).

Here the relationships like “is an instance of” have been replaced by anonymous symbols like r0002, and the symbol < replaces “before”, etc. This is what it looks like inside an AI program, but even with this the AI program never looks at the names of the symbols, rather just when one symbol in an inference or statement is the same symbol as in another inference or statement. The names are only there for humans to interpret, so when g0537 and g0083 were cat and mammal, a human looking at the program⁵ or its input or output could put an interpretation on what the symbols might “mean”.
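To make the point about anonymous symbols concrete, here is a minimal sketch in Python of a toy forward-chaining reasoner. It is an illustration only, not how any particular AI system is built: the relation names instance_of and subclass_of, the generated names such as s0001, and the helper functions are all assumptions made for this example. The reasoner only ever tests whether two symbols are the same, so replacing every name with an opaque token yields exactly the same conclusions.

```python
from itertools import count

# Facts as (relation, subject, object) triples; every name is just a token.
FACTS = {
    ("instance_of", "Fluffy", "cat"),
    ("subclass_of", "cat", "mammal"),
    ("subclass_of", "mammal", "animal"),
}

def infer(facts, instance_rel, subclass_rel):
    """Apply two rules repeatedly until no new facts appear.

    Rule 1: subclass(a, b) and subclass(b, c) => subclass(a, c)
    Rule 2: instance(x, a) and subclass(a, b) => instance(x, b)
    Symbols are only compared for equality, never inspected.
    """
    known = set(facts)
    while True:
        new = set()
        for (r1, a, b) in known:
            for (r2, c, d) in known:
                if r2 == subclass_rel and b == c:
                    if r1 == subclass_rel:
                        new.add((subclass_rel, a, d))
                    elif r1 == instance_rel:
                        new.add((instance_rel, a, d))
        if new <= known:
            return known
        known |= new

def anonymize(facts):
    """Replace every symbol with an opaque generated name (hypothetical scheme)."""
    counter = count(1)
    names = {}

    def rename(sym):
        if sym not in names:
            names[sym] = f"s{next(counter):04d}"
        return names[sym]

    renamed = {(rename(r), rename(a), rename(b)) for (r, a, b) in sorted(facts)}
    return renamed, names

readable = infer(FACTS, "instance_of", "subclass_of")
anon_facts, names = anonymize(FACTS)
opaque = infer(anon_facts, names["instance_of"], names["subclass_of"])

# Translating the readable conclusions gives exactly the opaque conclusions.
translated = {(names[r], names[a], names[b]) for (r, a, b) in readable}
print(("instance_of", "Fluffy", "animal") in readable)  # True
print(translated == opaque)                             # True
```

Nothing in the program ties any of these tokens to cats, mammals, or anything else in the world; any such reading is supplied by the person looking at the output.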

And this is the critical problem with symbolic Artificial Intelligence: how the symbols that it uses are grounded in the real world. This requires some sort of perception of the real world, some way to get to and from symbols that connects them to things and events in the real world.

For many applications it is the humans using the system that do the grounding. When we type in a query to a search engine it is we who choose the symbols to make our way into what the AI system knows about the world. It does some reasoning and inference, and then produces for us a list of Web pages that it has deduced match what we are looking for (without actually having any idea that we are something that corresponds to the symbol person that it has in its database). Then it is we who look at the summaries that it has produced of the pages and click on the most promising one or two pages, and then we come up with some new or refined symbols for a new search if it was not what we wanted. We, the humans, are the symbol grounders for the AI system. One might argue that all the intelligence is really in our heads, and that really all the AI powered search engine provides us with is a fancy index and a fancy way to use it.

To drive home this point consider the following thought experiment.

Imagine you are a non Korean speaker, and that the AI program you are interacting with has all its input and output in Korean. Those symbols would not be much help. But suppose you had a Korean dictionary, with the definitions of Korean words written in Korean. Fortunately modern Korean has a finite alphabet and spaces between words (though the rules are slightly different from those of English text), so it will be possible to extract “symbols” from looking at the program output. And then you could look them up in the dictionary, and perhaps eventually infer Korean grammar.

Now it is just possible that you could use your extensive understanding of the human world, to which the Korean dictionary must be referring for many entries, to guess at some of the meanings of the symbols. But if you were a Heptapod from the movie Arrival and it was before (uh-oh…) Heptapods had ever visited Earth then you would not even have this avenue for grounding these entirely alien symbols.

So it really is the knowledge in people’s heads that does the grounding in many situations. In order to get the knowledge into an AI program it needs to be able to relate the symbols to something outside its self consistent Korean dictionary (so to speak). Some hold out hope that our next character in the pantheon of pretenders to the throne of general Artificial Intelligence, neural networks, will play that role. Of course, people have been working on making that so for decades. We’re still a long way off.

To see the richness of sixty plus years of symbolic Artificial Intelligence work I recommend the AI Magazine, the quarterly publication of the Association for the Advancement of Artificial Intelligence. It is behind a paywall, but even without joining the association you can see the tables of contents of all the issues and that will give a flavor of the variety of work that goes on in symbolic AI. And occasionally there will also be an article there about neural networks, and other related types of machine learning.

2.0, 2.1, 2.2, 2.3, 2.4, … Neural networks

These are loosely, very loosely, based on a circa 1948 understanding of neurons in the brain. That is to say they do not bear very much resemblance at all to our current understanding of the brain, but that does not stop the press talking about this approach as being inspired by biology. Be that as it may.

Here I am going to talk about just one particular kind of artificial neural network and how it is trained, namely a feed forward layered network under supervised learning. There are many variations, but this example gives an essential flavor.

The key element is an artificial neuron that has n inputs, flowing along which come n numbers, all between zero and one, namely x1, x2, … xn. Each of these is multiplied by a weight, w1, w2, … wn, and the results are summed, as illustrated in this diagram from Wikimedia Commons.

(And yes, that w3 in the diagram should really be wn.) These weights can have any value (though are practically limited to what numbers can be represented in the computer language in which the system is programmed) and are what get changed when the system learns (and recall that learn is a suitcase word as explained in my seven deadly sins post). We will return to this in just a few paragraphs.

The sum can be any sized number, but a second step compresses it down to a number between zero and one again by feeding it through a logistic or sigmoid function, a common one being:

f(x) = \frac{1}{1+e^{-x}}

I.e., the sum gets fed in as the argument to the function and the expression on the right is evaluated to produce a number that is strictly between zero and one, closer and closer to those extremes as the input gets extremely negative or extremely positive. Note that this function preserves the order between possible inputs x fed to it. I.e., if y < z then f(y) < f(z). Furthermore the function is symmetric about an input of zero, and an output of 0.5.
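As a minimal sketch of the artificial neuron just described, the following Python computes the weighted sum and passes it through the sigmoid. The particular input and weight values are made up for illustration, and the function names are assumptions of this example. The two assertions check the properties stated above: order preservation and symmetry about an input of zero and an output of 0.5.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights):
    """Weighted sum of the inputs, squashed to a value strictly between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(total)

# Hypothetical values: inputs lie between zero and one, weights can be anything.
inputs = [0.2, 0.9, 0.5]
weights = [1.5, -2.0, 0.75]
out = neuron_output(inputs, weights)
print(out)

# Order is preserved: y < z implies f(y) < f(z).
assert sigmoid(-3.0) < sigmoid(0.0) < sigmoid(3.0)
# Symmetry about the point (0, 0.5): f(-x) + f(x) equals 1.
assert abs(sigmoid(-2.0) + sigmoid(2.0) - 1.0) < 1e-12
```

The weighted sum here is negative, so the output lands below 0.5; a large positive sum would push it toward one.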

This particular function is very often used as it has the property that it is easy to compute its derivative for any given output value without having to invert the function to find the input value. In particular, if you work through the normal rules for derivatives and use algebraic simplification you can show that

\dfrac{\mathrm{d}}{\mathrm{d}x}f(x) = \frac{e^x}{(1+e^x)^2} = f(x)(1-f(x))
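The identity can be checked numerically. The sketch below, in Python with function names chosen for this example, compares three ways of getting the slope of the sigmoid: from the output value alone, from the closed form in the middle of the equation, and from a finite difference that serves only as an independent check.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def derivative_from_output(y):
    """Slope of the sigmoid written purely in terms of its output y = f(x)."""
    return y * (1.0 - y)

def derivative_closed_form(x):
    """The middle expression of the equation: e^x / (1 + e^x)^2."""
    return math.exp(x) / (1.0 + math.exp(x)) ** 2

def derivative_numeric(x, h=1e-6):
    """Central finite difference, used only as an independent check."""
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

for x in (-4.0, -1.0, 0.0, 0.5, 3.0):
    y = sigmoid(x)
    assert abs(derivative_from_output(y) - derivative_closed_form(x)) < 1e-12
    assert abs(derivative_from_output(y) - derivative_numeric(x)) < 1e-8
```

The first function never sees x at all: the slope comes directly from the output value, with no need to invert the sigmoid.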

This turns out to be very useful for the ways these artificial neurons started to be used in the 1980’s resurgence. They get linked together in regular larger networks such as the one below, where each large circle corresponds to one of the artificial neurons above. The outputs on the right are usually labelled with symbols, for instance cat, or car. The smaller circles on the left correspond to inputs to the network which come from some data source.

For instance, the source might be an image, and there might be many thousands of little patches of the image sampled on the left, perhaps with a little bit of processing of the local pixels to pick out local features in the image, such as corners, or bright spots. Once the network has been trained, the output labelled cat should put out a value close to one when there is a cat in the image and close to zero if there is no cat in the image, and the one labelled car should have a value similarly saying whether there is a car in the image. One can think of these numbers as the network saying what probability it assigned to there being a cat, a car, etc. This sort of network thus classifies its input into a finite number of output classes.

But how does the network get trained? Typically one would show it millions of images (yes, millions), for which there was ground truth known about which images contained what objects. When the output lines with their symbols did not get the correct result, the weights on the inputs of the offending output neuron would be adjusted up or down in order to next time produce a better result. The amount to update those weights depends on how much difference a change in weight can make to the output. Knowing the derivative at the point on the sigmoid function where the output is coming from is thus critical. During training the proportional amount, or gain, of how much the weights are modified is reduced over time. And a 1980’s invention allowed the detected error at the output to be propagated backward through multiple layers of the network, usually just two or three layers at that time, so that the whole system could learn something from a single bad classification. This technique is known as back propagation.

One immediately sees that the most recent image will have a big impact on the weights, so it is necessary to show the network all the other images again, and gradually decrease how much weights are changed over time. Typically each image is shown to the network thousands, or even hundreds of thousands of times, interspersed amongst millions of other images also each being shown to the network hundreds of thousands of times.
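Here is a minimal sketch of that training loop in Python, cut down to a single sigmoid neuron so that there is nothing to propagate backward; it is the weight update for one output neuron, not full back propagation. The four toy examples, the number of passes, and the schedule for shrinking the gain are all assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy labelled data standing in for images with known ground truth:
# two inputs plus a constant 1.0 so that one weight can act as an offset.
EXAMPLES = [
    ([0.0, 0.0, 1.0], 0.0),
    ([0.0, 1.0, 1.0], 1.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 1.0], 1.0),
]

def predict(weights, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

def train(examples, passes=2000, initial_gain=1.0):
    weights = [0.0] * len(examples[0][0])
    for p in range(passes):
        # The gain shrinks over time so later examples disturb the weights less.
        gain = initial_gain / (1.0 + p / 200.0)
        # Every example is shown again on every pass.
        for inputs, target in examples:
            out = predict(weights, inputs)
            # Output error scaled by the sigmoid slope out * (1 - out).
            delta = (target - out) * out * (1.0 - out)
            weights = [w + gain * delta * x for w, x in zip(weights, inputs)]
    return weights

weights = train(EXAMPLES)
for inputs, target in EXAMPLES:
    print(inputs[:2], target, round(predict(weights, inputs), 3))
```

Even in this tiny case each example is presented thousands of times, and the slope term is what tells the update how much a change in a weight can move the output.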

That this sort of training works as well as it does is really a little fantastical. But it does work in many cases. Note that a human designs how many layers there are in the network, for each layer how the connections to the next layer of the network are arranged, and what the inputs are for the network. And then the network is trained, using a schedule of what to show it when, and how to adjust the gains on the learning over time as chosen by the human designer. And if after lots of training the network has not learned well, the human may adjust the way the network is organized, and try again.

This process has been likened to alchemy, in contrast to the science of chemistry. Today’s alchemists who are good at it can command six or even seven figure salaries.

In the 1980’s when back propagation was first developed and multi-layered networks were first used, it was only practical from both a computational and algorithmic point of view to use two or three layers. Twenty years later the Deep Learning revolution of 2006 included algorithmic improvements, new incremental training techniques, of course lots more computer power, and enormous sets of training data harvested from the fifteen year old World Wide Web. Soon there were practical networks of twelve layers–that is where the word deep comes in–it refers to lots of layers in the network, and certainly not “deep introspection”…

Over the more than a decade since 2006 lots of practical systems have been built.

The biggest practical impact for most people recently, and likely over the next couple of decades, is the impact on speech transcription systems. In the last five years we have moved from speech systems over the phone that felt like “press or say `two’ for frustration”, to continuous speech transcription of voice messages, and home appliances, starting with the Amazon Echo and Google Home, but now extending to our TV remotes, and more and more appliances built on top of the speech recognition cloud services of the large companies.

Getting the right words that people are saying depends on two capabilities. The first is detecting the phonemes, the sub pieces of words, with very different phonemes for different languages, and the second is partitioning a stream of those phonemes, some detected in error, into a stream of words in the target language. With earlier neural networks the feature detectors that were applied to raw sound signals to provide low level clues for phonemes were programs that engineers had built by hand. With Deep Learning, techniques were developed where those earliest features were also learned by listening to massive amounts of speech from different speakers all talking in the target language. This is why today we are starting to think it natural to be able to talk to our machines. Just like Scotty did in Star Trek 4: The Voyage Home.

A new capability was unveiled to the world in a New York Times story on November 17, 2014 where the photo below appeared along with a caption that a Google program had automatically generated: “A group of young people playing a game of Frisbee”.

I think this is when people really started to take notice of Deep Learning. It seemed miraculous, even to AI researchers, and perhaps especially to researchers in symbolic AI, that a program could do this well. But I also think that people confused performance with competence (referring again to my seven deadly sins post). If a person had this level of performance, and could say this about that photo, then one would naturally expect that the person had enough competence in understanding the world, that they could probably answer each of the following questions:

  • what is the shape of a Frisbee?
  • roughly how far can a person throw a Frisbee?
  • can a person eat a Frisbee?
  • roughly how many people play Frisbee at once?
  • can a 3 month old person play Frisbee?
  • is today’s weather suitable for playing Frisbee?

But the Deep Learning neural network that produced the caption above can not answer these questions. It certainly has no idea what a question is, and can only output words, not take them in, but it doesn’t even have any of the knowledge that would be needed to answer these questions buried anywhere inside what it has learned. It has learned a mapping from colored pixels, with a tiny bit of spatial locality, to strings of words. And that is all. Those words only rise up a little beyond the anonymous symbols of traditional AI research, to have a sort of grounding, a grounding in the appearance of nearby pixels. But beyond that those words or symbols have no meanings that can be related to other things in the world.

Note that the medium in which learning happens here is selecting many hundreds of thousands, perhaps millions, of numbers or weights. The way that the network is connected to input data is designed by a human, the layout of the network is designed by a human, the labels, or symbols, for the outputs are selected by a human, and the set of training data has previously been labelled by a human (or thousands of humans) with these same symbols.
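To give a feel for how quickly the count of weights grows, here is a small sketch in Python that counts the connection weights in a fully connected feed forward network. The layer sizes are hypothetical, picked only for illustration, and the count ignores any per-neuron offset terms since the neuron described earlier has none.

```python
def count_weights(layer_sizes):
    """Number of connection weights in a fully connected feed forward network.

    Each neuron in a layer has one weight per output of the previous layer.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layout: 1024 input patches, two hidden layers, 10 labelled outputs.
layer_sizes = [1024, 256, 64, 10]
total = count_weights(layer_sizes)
print(total)  # 279168
```

Even this small made-up network has hundreds of thousands of numbers to select, and a human chose every entry in layer_sizes.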

3. Traditional Robotics

In the very first decades of Artificial Intelligence, the AI of symbols, researchers sought to ground AI by building robots. Some were mobile robots that could move about and perhaps push things with their bodies, and some were robot arms fixed in place. It was just too hard then to have both, a mobile robot with an articulated arm.

The very earliest attempts at computer vision were then connected to these robots, where the goal was to, first, deduce the geometry of what was in the world, and then to have some simple mapping to symbols, sitting on top of that geometry.

In my post on the origins of AI I showed some examples of how perception was built up by looking for edges in images, and then working through rules on how edges might combine in real life to produce geometric models of what was in the world. I used this example of a complex scene with shadows:

In connecting cameras to computers and looking at the world, the lighting and the things allowed in the field of view often had to be constrained for the computer vision, the symbol grounding, to be successful. Below is a really fuzzy picture of the “copy-demo” at the MIT Artificial Intelligence Laboratory in 1970. Here the vision system looked at a stack of blocks and the robot tried to build a stack that looked the same.

At the same time a team at SRI International in Menlo Park, California, were building the robot Shakey, which operated in a room with large blocks, cubes and wedges, with each side painted in a different matte color, and with careful control of lighting.

 

By 1979 Hans Moravec at the Stanford Artificial Intelligence Lab had an outdoor capable robot, “The Cart”, in the center of the image here (a photograph that I took), which navigated around polyhedral objects, and other clutter. Since it took about 15 minutes to move one meter it did get a little confused by the high contrast moving shadows.

And here is the Freddy II robot at the Department of Artificial Intelligence at Edinburgh University in the mid 1970’s, stacking flat square and round blocks and inserting pegs into them.

These early experiments combined image to symbol mapping, along with extracting three dimensional geometry so that the robot could operate, using symbolic AI planning programs from end to end.

I think it is fair to say that those end to end goals have gotten a little lost over the years. As the reality of the complexities due to uncertainties when real objects are used has been realized, the tasks that AI robotics researchers focus on have been largely driven by a self defined research agenda, with proof of concept demonstrations as the goal.

And I want to be clear here. These AI based robotics systems are not used at all in industry. All the robots you see in factories (except those from my company, Rethink Robotics) are carefully programmed, in detail, to do exactly what they are doing, again, and again, and again. Although the lower levels of modeling robot dynamics and planning trajectories for a robot arm are shared with the AI robotics community, above that level it is very complete and precise scripting. The last forty years of AI research applied to factory robots has had almost no impact in practice.

On the other hand there has been one place from traditional robotics with AI that has had enormous impact. Starting with robots such as The Cart, above, people tried to build maps of the environment so that the system could deliberatively plan a route from one place to another which was both short in time to traverse and which would avoid obstacles or even rough terrain. So they started to build programs that took observations as the robot moved and tried to build up a map. They soon realized that because of uncertainties in how far the robot actually moved, and even more importantly what angle it turned when commanded, it was impossible to put the observations into a simple coordinate system with any certainty, and as the robot moved further and further the inaccuracies relative to the start of the journey just got worse and worse.
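To make that compounding concrete, here is a minimal Python sketch of dead reckoning. It is an illustration only, not code from any of the robots mentioned: the function name `dead_reckon` and the `turn_bias` parameter are assumptions, and the one degree error per step is an arbitrary choice. The robot believes it drives in a straight line, while every commanded heading is actually off by a small amount, so the gap between believed and actual position widens with every step.

```python
import math

def dead_reckon(commands, turn_bias=0.0):
    """Integrate (turn_radians, distance_m) commands into a list of (x, y) positions.

    turn_bias is a hypothetical systematic error: on each step the robot
    actually turns by turn + turn_bias rather than by the commanded turn.
    """
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for turn, dist in commands:
        heading += turn + turn_bias
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
        path.append((x, y))
    return path

# Twenty commands of "no turn, move one meter".
commands = [(0.0, 1.0)] * 20
believed = dead_reckon(commands)                                # what the robot thinks it did
actual = dead_reckon(commands, turn_bias=math.radians(1.0))    # what really happened

# Distance between believed and actual position after each step.
errors = [math.hypot(ax - bx, ay - by)
          for (ax, ay), (bx, by) in zip(actual, believed)]

for step in (1, 5, 10, 20):
    print(f"after {step:2d} m: position error {errors[step]:.2f} m")
```

Because the heading error is never corrected, each new observation gets placed on the map relative to an already wrong pose, which is why a single global coordinate system could not be trusted.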

In late 1984 both Raja Chatila from Toulouse, and I, newly a professor at MIT, realized that if the robot could recognize when it saw a landmark a second time after wandering around for a while it could work backwards through the chain of observations made in between, and tighten up all their uncertainties. We did not need to see exactly the same scene as before; all we needed was to locate one of the things that we had earlier labeled with a symbol, and to be sure that the new thing we saw, labeled by the same symbol, was in fact the same object in the world. This is now called “loop closing” and we independently published papers with this idea in March 1985 at a robotics conference held in St Louis (IEEE ICRA). But neither of us had very good statistical models, and mine was definitely worse than Raja’s.
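The idea of working backwards through the chain can be sketched in a few lines of Python. This is a deliberately crude stand-in, not the method of either 1985 paper and not modern SLAM: it assumes the drift accumulated evenly, and simply spreads the discrepancy found at the second sighting linearly over the poses in between. The function name `close_loop` and the example coordinates are made up for illustration.

```python
def close_loop(path, first_seen, seen_again):
    """Correct a dead-reckoned path once a landmark is recognized a second time.

    path is a list of estimated (x, y) positions. The robot was at the same
    landmark at indices first_seen and seen_again, so those two estimates
    should coincide; any difference between them is accumulated drift.
    The drift is removed in proportion to how far along the loop each pose is
    (a simplifying assumption; real systems weight by statistical uncertainty).
    """
    fx, fy = path[first_seen]
    sx, sy = path[seen_again]
    drift_x, drift_y = sx - fx, sy - fy
    n = seen_again - first_seen
    corrected = list(path)
    for k in range(first_seen, len(path)):
        # Poses after the second sighting inherit the full correction.
        frac = min(1.0, (k - first_seen) / n)
        px, py = path[k]
        corrected[k] = (px - frac * drift_x, py - frac * drift_y)
    return corrected

# Hypothetical estimates while driving around a 1 m square and back to the start.
estimated = [(0.0, 0.0), (1.0, 0.1), (0.9, 1.2), (-0.2, 1.2), (-0.4, 0.4)]
corrected = close_loop(estimated, first_seen=0, seen_again=4)

for before, after in zip(estimated, corrected):
    print(before, "->", tuple(round(c, 2) for c in after))
```

After the correction the last pose lands back on the first, and the intermediate poses are pulled toward the square the robot actually drove.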

By 1991 Hugh Durrant-Whyte and John Leonard, then both at Oxford, had come up with a much better formalization, which they originally called “Simultaneous Map Building and Localisation” (Oxford English spelling), which later turned into “Simultaneous Localisation and Mapping” or SLAM. Over the next fifteen years, hundreds, if not thousands, of researchers refined the early work, enabled by newly low cost and plentiful mobile robots (my company iRobot was supplying those robots as a major business during the 1990’s). With a formalized well defined problem, low cost robots, adequate computation, and researchers working all over the world, competing on performance, there was rapid progress. And before too long the researchers managed to get rid of symbolic descriptions of the world and do it all in geometry with statistical models of uncertainty.

The SLAM algorithms became part of the basis for self-driving cars, and subsystems derived from SLAM are used in all of these systems. Likewise the navigation and data collection from quadcopter drones is powered by SLAM (along with inputs from GPS).

4. Behavior-Based Robotics

By 1985 I had spent a decade working in computer vision, trying to extract symbolic descriptions of the world from images, and in traditional robotics, building planning systems for robots to operate in simulated or actual worlds.

I had become very frustrated.

Over the previous couple of years as I had tried to move from purely simulated demonstrations to getting actual robots to work in the real world, I had become more and more buried in mathematics that was all trying to estimate the uncertainty in what my programs knew about the real world. The programs were trying to measure the drift between the real world, and the perceptions that my robots were making of the world. We knew by this time that perception was difficult, and that neat mapping from perception to certainty was impossible. I was trying to accommodate that uncertainty and push it through my planning programs, using a mixture of traditional robotics and symbolic Artificial Intelligence. The hope was that by knowing how wide the uncertainty was the planners could accommodate all the possibilities in the actual physical world.

I will come back to the implicit underlying philosophical position that I was taking in the last major blog post in this series, to come out later this year.

But then I started to reflect on how well insects were able to navigate in the real world, and how they were doing so with very few neurons (certainly fewer than the number of artificial neurons in modern Deep Learning networks). In thinking about how this could be I realized that the evolutionary path that had led to simple creatures probably had not started out by building a symbolic or three dimensional modeling system for the world. Rather it must have begun by very simple connections between perceptions and actions.

In the behavior-based approach that this thinking has led to, there are many parallel behaviors running all at once, trying to make sense of little slices of perception, and using them to drive simple actions in the world. Often behaviors propose conflicting commands for the robot’s actuators and there has to be some sort of conflict resolution. But not wanting to get stuck going back to the need for a full model of the world, the conflict resolution mechanism is necessarily heuristic in nature. Just as one might guess, the sort of thing that evolution would produce.
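As a rough illustration of what heuristic conflict resolution can look like, here is a minimal Python sketch using a fixed priority ordering. It is an assumption-laden toy, not the architecture of any particular robot: the behaviors, sensor names, thresholds, and command format are all invented for the example, and the behaviors are polled one after another on each tick rather than truly running in parallel.

```python
# Each behavior looks at a small slice of the sensor readings and either
# proposes a command (forward_speed_m_s, turn_rate_deg_s) or returns None.

def avoid_obstacle(sensors):
    # Hypothetical threshold: anything closer than 0.3 m is an emergency.
    if sensors["front_distance_m"] < 0.3:
        return (0.0, 45.0)  # stop and turn away
    return None

def seek_light(sensors):
    bearing = sensors["light_bearing_deg"]
    if bearing is not None:
        return (0.1, bearing)  # creep forward while turning toward the light
    return None

def wander(sensors):
    return (0.2, 0.0)  # always has an opinion: just drive forward

# Highest priority first. The ordering itself is the heuristic: no world
# model is consulted to decide which behavior wins.
BEHAVIORS = [avoid_obstacle, seek_light, wander]

def arbitrate(sensors):
    """Return (behavior_name, command) for the highest priority behavior that proposes one."""
    for behavior in BEHAVIORS:
        command = behavior(sensors)
        if command is not None:
            return behavior.__name__, command
    return None, (0.0, 0.0)

print(arbitrate({"front_distance_m": 0.1, "light_bearing_deg": 20.0}))
print(arbitrate({"front_distance_m": 2.0, "light_bearing_deg": 20.0}))
print(arbitrate({"front_distance_m": 2.0, "light_bearing_deg": None}))
```

When the obstacle and the light disagree, the obstacle behavior simply wins; nothing in the system reasons about why.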

Behavior-based systems work because the demands of physics on a body embedded in the world force the ultimate conflict resolution between behaviors, and the interactions. Furthermore by being embedded in a physical world, as a system moves about it detects new physical constraints, or constraints from other agents in the world. For synthetic characters in video games under the control of behavior trees, the demands of physics are replaced by the demands of the simulated physics needed by the rendering engine, and other agents in the world are either the human player or yet more behavior-based synthetic characters.

Just in the last few weeks there has been a great example of this that has gotten a lot of press. Here is the original story about MIT Professor Sangbae Kim’s Cheetah 3 robot. The press was very taken with the robot blindly climbing stairs, but if you read the story you will see that the point of the research is not to produce a blind robot per se. Computer vision, even 3-D vision, is not completely accurate. So any robot that tries to climb rough terrain using vision, rather than feel, needs to be very slow, carefully placing its feet one at a time, as it does not know exactly where the solid support in the world is. In this new work, Kim and his team have built a collection of low level behaviors which sense when things have gone wrong and quickly adapt individual legs. To prove the point, they made the robot completely blind–the performance of their robot will only increase as vision gives some high level direction to where the robot should aim its feet, but even so, having these reactive behaviors at the lowest levels makes it much faster and more sure footed.

The behavior-based approach, which leaves the model out in the world rather than inside the agent, has allowed robots to proliferate in number. Unfortunately, I often get attacked by people outside the field, saying in effect, we were promised super intelligent robots and all you have given us is robot vacuum cleaners. Sorry, it is a work in progress. At least I gave you something practical…

Comparing The Four Approaches to AI

In my 1990 paper Elephants Don’t Play Chess, in the first paragraph of the second page I mentioned that the “holy grail” for both classical symbolic AI research, and my own research was “general purpose human level intelligence”–the youngsters today saying that the goal of AGI, or Artificial General Intelligence is a new thing are just plain wrong. All four of the approaches I outlined above have been motivated by eventually getting to human level intelligence, and perhaps beyond.

None of them are yet close by themselves, nor have any combinations of them turned into something that seems close. But each of the four approaches has somewhat unique strengths. And all of them have easily identifiable weaknesses.

The use of symbols in AI research allows one to use them as the currency of composition between different aspects of intelligence, passing the symbols from one reasoning component to another. For neural networks fairly weak symbols appear only as outputs and there is no way to feed them back in, or really to other networks. Traditional robotics trades on geometric relationships and coordinates, which means they are easy to compose, but they are very poor in semantic content. And behavior-based systems are sub-symbolic, although there are ways to have some sorts of proto symbols emerge.

Neural networks have been the most successful approach to getting meaningful symbols out of perceptual inputs. The other approaches either don’t try to do so (traditional robotics) or have not been particularly successful at it.

Hard local coordinate systems, with solid statistical relationships between them, have become the evolved modern approach to traditional robotics. Both symbolic AI and behavior-based systems are weak in having different parts of the systems relate to common, or even well understood relative, coordinate systems. And neural networks simply suck (yes, suck) at spatial understanding.

Of the four approaches to AI discussed here, only the behavior-based approach makes a commitment to an ongoing existence of the system; the others, especially neural networks, are much more transactional in nature. And the behavior-based approach reacts to changes in the world on millisecond timescales, as it is embedded, and “living” in the real world. Or in the case of characters in video games, it is well embedded in the matrix. This ability to be part of the world, and to have agency within it is at some level an artificial sentience. A messy philosophical term to be sure. But I think all people who ever utter the phrase Artificial General Intelligence, or utter, or mutter, Super Intelligence, are expecting some sort of sentience. No matter how far from reality the sentience of behavior-based systems may be, it is the best we have got. By a long shot.

I have attempted to score the four approaches on where they are better and where worse. The scale is one to three with three being a real strength of the approach. Notice that besides the four different strengths I have added a column for how well the approaches deal with ambiguity.

These are very particular capabilities that one or the other of the four approaches does better at. But for general intelligence I think we need to talk about cognition. You can find many definitions of cognition but they all have to do with thinking. And the definitions talk about thinking variously in the context of attention, memory, language understanding, perception, problem solving and others. So my scores are going to be a little subjective, I admit.

If we think about a Super Intelligent AI entity, one might want it to act in the world with some sort of purpose. For symbolic AI and traditional robotics there has been a lot of work on planners, programs that look at the state of the world and try to work out a series of actions that will get the world (and the embedded AI system or robot) into a more desirable state. These planners, largely symbolic, and perhaps with a spatial component for robots, started out relying on full knowledge of the state of the world. In the last couple of decades that work has concentrated on finessing the impossibility of knowing the state of the world in detail. But such planners are quite deliberative in working out what is going to happen ahead of time. By contrast the behavior-based approaches started out as purely reactive to how the world was changing. This has made them much more robust in the real world, which is why the vast majority of deployed robots in the world are behavior-based. With the twenty year old innovation of behavior trees these systems can appear much more deliberative, though they lack the wholesale capability of dynamically re-planning that symbolic systems have. This table summarizes:

Note that neural nets are neither. There has been a relatively small amount of non-mainstream work of getting neural nets to control very simple robots, mostly in simulation only. The vast majority of work on neural networks has been to get them to classify data in some way or another. They have never been a complete system, and indeed all the recent successes of neural networks have had them embedded as part of symbolic AI systems or behavior-based systems.

To end Part I of our Steps Towards Super Intelligence, let’s go back to our comparison of the four approaches to Artificial Intelligence. Let’s see how well we are really doing (in my opinion) by comparing them to a human child.

Recall the scale here is one to three. I have added a column on the right on how well they do at cognition, and a row on the bottom on how well a human child does in comparison to each of the four AI approaches.  One to three.

Note that under this evaluation a human child scores six hundred points whereas the four AI approaches score a total of eight or nine points each. As usual, I think I may have grossly overestimated the current capabilities of AI systems.

Next up: Part II, beyond the Turing Test.



1 This pronoun is often capitalized in this quote, but in my version of the King James Bible, which was presented to my grandmother in 1908, it is just plain “his” without capitalization. Genesis 1:27.

2 In the dedication of this 1973 PhD thesis at the MIT Artificial Intelligence Lab, to the Maharal of Prague–the creator of the best known Golem, Gerry Sussman points out that the Rabbi had noticed that this line was recursive. That observation has stayed with me since I first read it in 1979, and it inspired my first two lines of this blog post.

3 I am using the male form here only for stylistic purposes to resonate with the first sentence.

4 It appeared in the IEEE Journal of Robotics and Automation, Vol. 2, No. 1, March 1986, pp 14–23. Both reviewers for the paper recommended against publishing it, but the editor, Professor George Bekey of USC, used his discretion to override them and to go ahead and put it into print.

5 I chose this form, g0047, for anonymous symbols as that is exactly the form in which they are generated in the programming language Lisp, which is what most early work in AI was written in, and is still out there being used to do useful work.

Bothersome Bystanders and Self Driving Cars

rodneybrooks.com/bothersome-bystanders-and-self-driving-cars/

A story on how far away self-driving cars are just came out in The Verge.  It is more pessimistic than most on when we will see truly self-driving cars on our existing roads. For those of you who have read my blog posts on the unexpected consequences and the edge cases for self-driving cars or my technology adoption predictions, you will know that I too am pessimistic about when they will actually arrive. So, I tend to agree with this particular story and about the outstanding problems for AI that are pointed out by various people interviewed for the story.

BUT, there is one section that stands out for me.

Drive.AI founder Andrew Ng, a former Baidu executive and one of the industry’s most prominent boosters, argues the problem is less about building a perfect driving system than training bystanders to anticipate self-driving behavior. In other words, we can make roads safe for the cars instead of the other way around. As an example of an unpredictable case, I asked him whether he thought modern systems could handle a pedestrian on a pogo stick, even if they had never seen one before. “I think many AV teams could handle a pogo stick user in pedestrian crosswalk,” Ng told me. “Having said that, bouncing on a pogo stick in the middle of a highway would be really dangerous.” 

“Rather than building AI to solve the pogo stick problem, we should partner with the government to ask people to be lawful and considerate,” he said. “Safety isn’t just about the quality of the AI technology.”

Now I really hope that Andrew didn’t say all this stuff.  Really, I hope that.  So let’s assume someone else actually said this.  Let’s call him Professor Confused, whoever he was, just so we can reference him.

The quoted section above is right after two paragraphs about recent fatal accidents involving self-driving cars (though probably none of them should have been left unattended by the person in the driver’s seat in each case). Of the three accidents, only one involves an external person, the woman pushing a bicycle across the road in Phoenix this last March, killed by an experimental Uber vehicle driving itself.

In the first sentence Professor Confused seems to be saying that he is giving up on the promise of self-driving cars seamlessly slotting into the existing infrastructure. Now he is saying that every person, every “bystander”, is going to be responsible for changing their behavior to accommodate imperfect self-driving systems. And they are all going to have to be trained! I guess that means all of us.

Whoa!!!!

The great promise of self-driving cars has been that they will eliminate traffic deaths. Now Professor Confused is saying that they will eliminate traffic deaths as long as all humans are trained to change their behavior?  What just happened?

If changing everyone’s behavior is on the table then let’s change everyone’s behavior today, right now, and eliminate the annual 35,000 fatalities on US roads, and the 1 million annual fatalities world-wide. Let’s do it today, and save all those lives.

Professor Confused suggests having the government ask people to be lawful. Excellent idea! The government should make it illegal for people to drive drunk, and then ask everyone to obey that law. That will eliminate half the deaths in the US immediately.  Let’s just do that today!

Oh, wait…

I don’t know who the real Professor Confused is that the reporter spoke to. But whoever it is just completely upended the whole rationale for self-driving cars. Now the goal, according to Professor Confused, as reported here, is self-driving cars, right or wrong, über alles (so to speak). And you people who think you know how to currently get around safely on the street better beware, or those self-driving cars are licensed to kill you and it will be your own damn fault.



PS This is why the world’s relative handful of self-driving train systems have elaborate safeguards to make sure that people can never get on to the tracks. Take a look next time you are at an airport and you will see the glass wall and doors that keep you separated from the track at all times when you are on the platform. And the track does not intersect with any pedestrian or other transport route. The track is routed above and under them all. We are more likely to geofence self-driving cars than accept poor safety from them in our human spaces.

PPS Dear Professor Confused, first rule of product management. If you need the government to coerce a change in behavior of all your potential customers in order for them to become your actual customers, then you don’t got no customers for what you are trying to sell. Hmmm. In this case I guess they are not your customers. They are just the potential literal roadkill in the self-satisfaction your actual customers will experience knowing that they have gotten just the latest gee whiz technology all for themselves.