
Predictions Scorecard, 2024 January 01

rodneybrooks.com/predictions-scorecard-2024-january-01/

[You can follow me on social media: @rodneyabrooks.bsky.social]

This is my sixth annual update on how my dated predictions from January 1st, 2018 concerning (1) self driving cars, (2) robotics, AI, and machine learning, and (3) human space travel, have held up. I promised then to review them at the start of the year every year until 2050 (right after my 95th birthday), thirty-two years in total. The idea is to hold myself accountable for those predictions. How right or wrong was I?

The acronyms I used for predictions in my original post were as follows.

NET year means it will not happen before that year (No Earlier Than)
BY year means I predict that it will happen by that year.
NIML, Not In My Lifetime, i.e., not before 2050.

As the years mentioned in the predictions pass, I color them as accurate, too pessimistic, or too optimistic.

I only change the text in the fourth column of the prediction tables, to say what actually happened.  This year I have removed most of the old comments from the prediction tables to make them shorter; you can go back to last year’s update to see previous comments.  And I highlight any new dates, as in 20240103 for January 3rd, 2024.

Overview of changes this year

First, remember that a lot of good things happened in the world this year, and here are 66 of them. The world is getting better in terms of global health, clean energy, economic and social justice, and conservation. There is much more to life than LLMs.

There has been a lot of activity in both self driving cars (Cruise and Waymo human assisted deployments) and in AI (the arrival of the indisputable next big thing, ChatGPT and friends).  The human spaceflight endeavor has crawled along and largely stretched out dates that were probably too optimistic in the first place.

Self Driving Cars

There are no self driving cars deployed (despite what companies have tried to project to make it seem it has happened), and arguably the prospects for self driving taxi services being deployed at scale took a beating.

First a reminder of why I made predictions in this field.

Back in 2017 the hubris about the imminent arrival of self driving cars was at a similar level to the hubris in 2023 about ChatGPT being a step towards an AGI (Artificial General Intelligence) that is just around the corner. Here is the same version of a slide that I showed last year:

This was a snapshot of predictions for when level 4 or level 5 self driving cars would be available from various car manufacturers (e.g., Tesla or Ford), automobile subsystem providers (e.g., Continental or NVIDIA), and ride service companies (e.g., Uber).

The dates in parentheses are when the prediction on that line was made. The dates in blue are the year that was predicted for delivery. I have highlighted the dates that have already passed in pink. None of them were delivered then or now. 2023 did not appear in anyone’s prediction. Next up, 2024 for Jaguar and Land-Rover (it won’t happen…). The orange arrows are for companies that I noticed retracted their statements or pushed them out further sometime after this snapshot. But in my original predictions on January 1st, 2018, I was reacting to these predictions, not one of which I thought would come to pass by the predicted dates. I’m batting 17 out of 17, with only six predictions left.

(Not Really) Deployed Autonomous Ride Services, 2023

In the last year both Waymo and Cruise launched “driverless” ride services in San Francisco. They had both previously had empty vehicles cruising the streets, and had limited availability for certain people to ride in them, for free, as though they were a ride service. Then during 2023 both companies made them available to people (on a waiting list) who signed up for an app which let them pay for rides, 24 hours per day. I took almost forty rides in Cruise vehicles under these programs. In a series of short blog posts I describe, in reverse order of writing, those experiences, what it was like in my last ride where I thought for a moment I was going to be in a terrible accident, and a little history of self driving technology.

I was by no means the only one watching their safety, and it took some good reporters to uncover some of the problems. The CEO of Cruise responded that the pushback against his company really wasn’t fair, and was just “anti-robot bias”. I have spent my whole professional life developing robots and my companies have built more of them than anyone else, but I can assure you that as a driver in San Francisco during the day I was getting pretty frustrated with driverless Cruise and Waymo vehicles doing stupid things that I saw and experienced every day. On my second-to-last commute home from my robot company office in 2023, on December 20th, an empty Waymo with lights not flashing blocked an entrance to an intersection, and I had to temporarily move into oncoming traffic to continue on my way.

But things were worse. There were a number of accidents with people inside Cruise vehicles. It seemed like when a Cruise was in an intersection and a car was heading right for it, the vehicle would stop dead. I, and others, speculated that this was based on the idea that if the self driving car was not moving when there was an accident then it could not be blamed for causing the accident.

On my last ride in a Cruise this almost happened to me, and I did for a moment fear for my life.  See the description of this October 19th 2023 event in this short blog post. And see a video here of the same sort of thing happening in August, where a bad collision did occur and the Cruise occupant ended up in hospital.

An event in early October finally caught up with Cruise. A pedestrian had been hit by another vehicle and thrown into the path of a Cruise vehicle, which was unable to stop before driving over the person. What Cruise did not reveal at the time was that its vehicle then decided to pull over, dragging the person trapped underneath it for twenty feet. At last report the person was still in hospital.

There were conflicting reports on whether Cruise initially withheld from investigators the part of a video recording that shows that dragging of the pedestrian. But by late October the California Department of Motor Vehicles suspended Cruise from all driverless operations.

GM had bought Cruise for a reported $1 billion in 2016. By November 14th this year, there were rumblings that GM was stepping in, pushing changes in safety, and would reduce its support for Cruise, having given it an additional $8.2 billion since 2017, with $1.9 billion just in 2023.  It also bought out Softbank’s shares in Cruise for $3.4 billion in 2022. On November 16th, Cruise suspended a share buy-back program for Cruise employees, which let them cash out on their stock options. The company said it needed to revalue the shares. That was an ominous sign. By November 19th the CEO (and co-founder) of Cruise, Kyle Vogt, stepped down.

GM did indeed pull back on November 29th.

G.M.’s chief financial officer, Paul Jacobson, said spending at Cruise would fall by “hundreds of millions of dollars” in 2024, and would probably fall further as the company reviewed the division’s operations.

GM also stopped work on a new custom vehicle, without conventional controls, intended for Cruise to use in large scale taxi services.

After earlier layoffs of temporary workers who maintained their fleet, in mid-December Cruise had large scale layoffs. First, “nine key leaders” were fired as a byproduct of ongoing safety reviews. And then 900 of their 3800 employees were laid off.

As of the end of the year 2023, none of Cruise’s 950 “autonomous” vehicles, even when driven by humans, venture anywhere on roads in the United States.

Now let’s backtrack about three weeks. The kicker is that although Cruise had made it sound like their vehicles were completely self driving, there had been people at the ready to steer them through difficult situations remotely. They were not operating in the way they presented themselves. The CEO had argued that they were safer than human drivers. But they had human drivers to handle situations their “robots” could not.

In a NYTimes story about the whole mess on November 3rd, I noticed a detail that I had not previously seen.

Half of Cruise’s 400 cars were in San Francisco when the driverless operations were stopped. Those vehicles were supported by a vast operations staff, with 1.5 workers per vehicle. The workers intervened to assist the company’s vehicles every 2.5 to five miles, according to two people familiar with its operations. In other words, they frequently had to do something to remotely control a car after receiving a cellular signal that it was having problems.

Whoa!  Driverless means that there is no human involved in the actual driving. Here the story says that there is an army of people, 1.5 persons per car, who intercede remotely every 2.5 to 5 miles of travel.  I thought I had been taking Cruise vehicles that were driving themselves.

In fact, everyone I talked to in San Francisco thought that the Cruise and Waymo vehicles were fully autonomous as they were so bad in certain ways. I would routinely see vehicles stopped and blocking traffic for 30 minutes. Or three vehicles together blocking an intersection with no visible progress to untangling themselves. And the SF Fire Department was very frustrated with Cruise vehicles wandering into active fire areas, then stopping with their wheels on a fire hose, refusing to move on.

On November 4th then-CEO Kyle Vogt posted a statement:

Cruise CEO here. Some relevant context follows.

Cruise AVs are being remotely assisted (RA) 2-4% of the time on average, in complex urban environments. This is low enough already that there isn’t a huge cost benefit to optimizing much further, especially given how useful it is to have humans review things in certain situations.

The stat quoted by nyt is how frequently the AVs initiate an RA session. Of those, many are resolved by the AV itself before the human even looks at things, since we often have the AV initiate proactively and before it is certain it will need help. Many sessions are quick confirmation requests (it is ok to proceed?) that are resolved in seconds. There are some that take longer and involve guiding the AV through tricky situations. Again, in aggregate this is 2-4% of time in driverless mode.

In terms of staffing, we are intentionally over staffed given our small fleet size in order to handle localized bursts of RA demand. With a larger fleet we expect to handle bursts with a smaller ratio of RA operators to AVs. Lastly, I believe the staffing numbers quoted by nyt include several other functions involved in operating fleets of AVs beyond remote assistance (people who clean, charge, maintain, etc.) which are also something that improve significantly with scale and over time.

Cruise was not doing autonomous driving after all. They were routinely relying on remote human interventions.  But they were doing even that badly judging by all the failures I and others routinely saw.
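
As a back-of-the-envelope check, the NYT’s intervention frequency and Vogt’s 2-4% figure are not necessarily in conflict. Here is a rough sketch in Python (the average city speed is my own assumption, not a reported number):

```python
# Rough reconciliation of "an intervention every 2.5 to 5 miles" (NYT)
# with "remote assistance 2-4% of driving time" (Vogt's reply).
# The average speed below is my assumption for SF surface streets,
# not a reported figure.
AVG_CITY_SPEED_MPH = 12

for miles_per_session in (2.5, 5.0):
    minutes_between = miles_per_session / AVG_CITY_SPEED_MPH * 60
    for ra_fraction in (0.02, 0.04):
        session_seconds = minutes_between * 60 * ra_fraction
        print(f"one session per {miles_per_session} mi "
              f"(every {minutes_between:.1f} min); at {ra_fraction:.0%} "
              f"of time, sessions average ~{session_seconds:.0f} s")
```

On those assumptions both numbers can be true at once: a human touches the system every 12 to 25 minutes, in sessions averaging 15 to 60 seconds. Either way, there is a human in the driving loop.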

Why is fully autonomous driving important?

In one word: economics.  The whole point of driverless cars has, for over a decade, been to produce a taxi service where human drivers are not needed.  The business model, the model all the car companies have been going after, is that the cars can drive themselves so no human needs to be paid as part of a taxi service.

We were told by the companies that their vehicles were safer than human driven vehicles, but in fact they routinely needed humans to monitor and to control them.  At one level I’m shocked, shocked, I tell you.  At another level I am embarrassed that they fooled me.  I had thought they were driving with no person monitoring them.

The reason I posted my predictions, and talk about them every year, is to try to overcome the hype that fools people about how far along technology is. The hucksters beat me in this case.

There is one other company in the US providing so-called autonomous taxi rides. I don’t know whether or not to believe them. I just don’t know. Fool me once…

And about that Fully Self Driving for consumer cars

In December there was a harsh but fair story in Rolling Stone about Tesla’s non-stop hype about its self-driving cars. That hype is, to put it plainly, a complete lie, and it has been every year since 2014, when the CEO of Tesla first announced that full self driving would be here “this year”. We now have ten years of the same promise, and ten years of it not being true.  [For many people that is “fool me ten times”.]

There is a reference in that story to a research report from earlier in the year by Noah Goodall at the Virginia Transportation Research Council. He disentangles some of Tesla’s misleading statistics:

Although Level 2 vehicles were claimed to have a 43% lower crash rate than Level 1 vehicles, their improvement was only 10% after controlling for different rates of freeway driving. Direct comparison with general public driving was impossible due to unclear crash severity thresholds in the manufacturer’s reports, but analysis showed that controlling for driver age would increase reported crash rates by 11%.
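
To see how controlling for road mix can shrink a headline safety number that much, here is a minimal sketch with invented illustrative rates (not Goodall’s data):

```python
# Illustrative only: invented numbers, not Goodall's data.
# Freeway driving has a much lower crash rate for everyone, so a fleet
# that does more of its miles on freeways looks safer even if it isn't.
CRASH_RATE = {"freeway": 0.5, "city": 2.0}  # crashes per million miles

def fleet_rate(freeway_fraction, improvement=1.0):
    """Overall crash rate given the fleet's share of freeway miles.
    `improvement` scales the per-road rates (1.0 = average driver)."""
    return improvement * (freeway_fraction * CRASH_RATE["freeway"]
                          + (1 - freeway_fraction) * CRASH_RATE["city"])

# Level 1 fleet: mostly city miles. Level 2 fleet: mostly freeway miles,
# and genuinely 10% safer per mile on like-for-like roads.
level1 = fleet_rate(freeway_fraction=0.3)
level2 = fleet_rate(freeway_fraction=0.8, improvement=0.9)
print(f"raw comparison: {1 - level2 / level1:.0%} fewer crashes")

# Controlling for road mix (give both fleets the same 30% freeway share)
# leaves only the genuine 10% per-mile improvement.
adjusted = fleet_rate(freeway_fraction=0.3, improvement=0.9)
print(f"adjusted: {1 - adjusted / level1:.0%} fewer crashes")
```

With these made-up numbers the raw comparison shows a 54% reduction, but only 10% of it is real; the rest is the road mix.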

Predictions [Self Driving Cars]

Prediction: A flying car can be purchased by any US resident if they have enough money.
Date: NET 2036
2018 comment: There is a real possibility that this will not happen at all by 2050.

Prediction: Flying cars reach 0.01% of US total cars.
Date: NET 2042
2018 comment: That would be about 26,000 flying cars given today's total.

Prediction: Flying cars reach 0.1% of US total cars.
Date: NIML

Prediction: First dedicated lane where only cars in truly driverless mode are allowed on a public freeway.
Date: NET 2021
2018 comment: This is a bit like current day HOV lanes. My bet is the left most lane on 101 between SF and Silicon Valley (currently largely the domain of speeding Teslas in any case). People will have to have their hands on the wheel until the car is in the dedicated lane.

Prediction: Such a dedicated lane where the cars communicate and drive with reduced spacing at higher speed than people are allowed to drive.
Date: NET 2024
Update 20240101: This didn't happen in 2023 so I can call it now. But there are no plans anywhere for infrastructure to communicate with cars, though some startups are finally starting to look at this idea--it was investigated and prototyped by academia 20 years ago.

Prediction: First driverless "taxi" service in a major US city, with dedicated pick up and drop off points, and restrictions on weather and time of day.
Date: NET 2021
2018 comment: The pick up and drop off points will not be parking spots, but like bus stops they will be marked and restricted for that purpose only.
Update 20240101: People may think this happened in San Francisco in 2023, but it didn't. Cruise has now admitted that there were humans in the loop intervening a few percent of the time. THIS IS NOT DRIVERLESS. Without a clear statement from Waymo to the contrary, one must assume the same for them. Smoke and mirrors.

Prediction: Such "taxi" services where the cars are also used with drivers at other times and with extended geography, in 10 major US cities.
Date: NET 2025
2018 comment: A key predictor here is when the sensors get cheap enough that using the car with a driver and not using those sensors still makes economic sense.

Prediction: Such "taxi" service as above in 50 of the 100 biggest US cities.
Date: NET 2028
2018 comment: It will be a very slow start and roll out. The designated pick up and drop off points may be used by multiple vendors, with communication between them in order to schedule cars in and out.

Prediction: Dedicated driverless package delivery vehicles in very restricted geographies of a major US city.
Date: NET 2023
2018 comment: The geographies will have to be where the roads are wide enough for other drivers to get around stopped vehicles.

Prediction: A (profitable) parking garage where certain brands of cars can be left and picked up at the entrance and they will go park themselves in a human free environment.
Date: NET 2023
2018 comment: The economic incentive is much higher parking density, and it will require communication between the cars and the garage infrastructure.

Prediction: A driverless "taxi" service in a major US city with arbitrary pick up and drop off locations, even in a restricted geographical area.
Date: NET 2032
2018 comment: This is what Uber, Lyft, and conventional taxi services can do today.
Update 20240101: Looked like it was getting close until the dirty laundry came out.

Prediction: Driverless taxi services operating on all streets in Cambridgeport, MA, and Greenwich Village, NY.
Date: NET 2035
2018 comment: Unless parking and human drivers are banned from those areas before then.

Prediction: A major city bans parking and cars with drivers from a non-trivial portion of the city so that driverless cars have free rein in that area.
Date: NET 2027, BY 2031
2018 comment: This will be the starting point for a turning of the tide towards driverless cars.

Prediction: The majority of US cities have the majority of their downtown under such rules.
Date: NET 2045

Prediction: Electric cars hit 30% of US car sales.
Date: NET 2027
Update 20240101: This one looked pessimistic last year, but now looks at risk. There was a considerable slowdown in the second derivative of adoption this year in the US.

Prediction: Electric car sales in the US make up essentially 100% of the sales.
Date: NET 2038

Prediction: Individually owned cars can go underground onto a pallet and be whisked underground to another location in a city at more than 100mph.
Date: NIML
2018 comment: There might be some small demonstration projects, but they will be just that, not real, viable mass market services.

Prediction: First time that a car equipped with some version of a solution for the trolley problem is involved in an accident where it is practically invoked.
Date: NIML
2018 comment: Recall that a variation of this was a key plot aspect in the movie "I, Robot", where a robot had rescued the Will Smith character after a car accident at the expense of letting a young girl die.

Electric Cars

I bought my first electric car this year.  I love it.

But it also made me realize how hard it is for many people to own an electric car. I have my own garage under my house. I charge my car in there.  A large portion of car owners in my city, San Francisco, have no private parking space. How does charging work for them?  They need to go to a public recharge station. And wait to charge. Having an electric car is an incredible time tax on people who do not have their own parking spot with access to electricity.  I had not fully appreciated how this will slow down adoption of electric cars until I owned one myself and could reflect on my own level of privilege in this regard.

Manufacturers have cut back on their sales forecasts for electric vehicles over the next couple of years, and are even reducing production. Ford did this after reporting that it lost $36,000 on every EV it sold in Q3.

See an analysis of why the scaling is hard just from a supply chain point of view.

Year over year electric car sales in Q3 were up by 49.8%. But overall car sales were also up significantly, so the electric share of the market did not grow as much as that number suggests. This report says that the electric share of US car sales by year is:

2021: 3.2%

2022: 5.8%

2023: 7.2%

So my estimate of 30% electric sales by 2027 is very much at risk, as that requires two more full doublings of the percentage just as manufacturers are slowing things down. I was abused heavily on Twitter for being so pessimistic back in 2018. Right now I think my prediction was accurate or even optimistic.
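
Here is that arithmetic, as a quick sketch using the share figures quoted above:

```python
import math

# What does 30% EV share of US sales by 2027 require, starting at 7.2% in 2023?
share_2023 = 7.2      # percent of US car sales, from the report above
target_2027 = 30.0    # the threshold in my prediction
years = 4             # 2023 -> 2027

doublings = math.log2(target_2027 / share_2023)
annual_growth = (target_2027 / share_2023) ** (1 / years) - 1

print(f"doublings of share needed: {doublings:.2f}")             # about 2.06
print(f"implied compound growth: {annual_growth:.0%} per year")  # about 43%
```

A shade over two doublings in four years means the share must grow at roughly 43% compounded every year, at exactly the moment manufacturers are cutting back.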

At year end there was a well researched article in the Wall Street Journal. Electric vehicle purchase rates are geographically lumpy, and the rate of increase has slowed in many places.

Flying Cars

When I made my predictions about flying cars back in 2018, flying cars were cars that could drive on roads and that could also fly. Now the common meaning has changed to largely be eVTOLs, electric Vertical Take Off and Landing vehicles, which sit statically on the ground when not flying. I.e., electric helicopters. People talk about them as taxis that will whisk riders through three dimensional airspace, avoiding clogged roads, and about them being everywhere.

Spoiler alert.  Not going to happen.

Late in 2022, soon after Larry Page pulled the plug on his twelve year old eVTOL company, I did an analysis of the lack of videos showing any practical looking flights of any potential eVTOL solution, despite companies having multi-billion dollar valuations. If practical eVTOL solutions are around the corner there should certainly be videos of them working. There aren’t.

Late in 2023, one of the “sky high” valuation companies participated in an unveiling of a taxi service in NYC, with breathless commentary, and even a breathless speech from the Mayor of NYC.  They’re coming, they’re coming, we are all going to live in a new world of flying taxis.

Here is the video, from November 14, 2023, of a Joby eVTOL taxi flight in NYC.  It is titled: “First-ever electric air taxi flight takes off in NYC”.

Except that it has no passengers, and it just flies a slow loop out over the water and back. It has “Experimental” painted on the front door.

Not the four passengers and 200mph speed that co-founder JoeBen Bevirt speaks about in the video. It is not an operational flight path at all. You can see that the passenger seats are all empty at the 19 second mark, whereas the pilot’s legs are clearly visible.

In a video from about a month prior titled “Flying Joby’s Electric Air Taxi with a Pilot On Board”, shot in Marina, California, the company says that they have now flown their vehicle with four different test pilots.  And the video shows it off very much as a test flight, with no passengers onboard.

There is no mention of automating out the pilot, which was one of the key fantasies of eVTOL taxis.

Also note the size of this vehicle.  There are many fossil fuel powered helicopters that are much smaller.  This is not going to be a personally owned vehicle for the masses.

Don’t hold your breath. They are not here. They are not coming soon.

Robotics, AI, and Machine Learning

Prolog

It is worth reading this story about the increasing number of CEOs in Silicon Valley ending up in jail for overhyping their businesses to the point of fraud. Many, many more walk that line, some for multiple companies at the same time. From the story:

“Governance got a bit loosey-goosey during the bubble,” said Healy Jones, vice president of financial strategy at Kruze Consulting, a provider of financial services for start-ups. Lately, Mr. Jones said, he has noticed venture firms doing more due diligence on potential investments, but “they probably shouldn’t get a gold star for fulfilling their job description.”

This is what happens when everyone is entitled to their own alternate facts. Current icons, young brash worshipped CEOs, are not immune to forcing their own alternate facts upon, first, eager investors, and second, people willing to set aside critical judgement when promised a magical rosy future.

It is far easier to make stuff up than refute it. It has happened with self driving cars, and flying taxis. It is rampant in the humanoid robotics and AI spaces.

Summary

I only made three comments in the table this year, and only one of them is directly about a predicted milestone being hit.  And, as you might guess, they are all about Generative AI and Large Language Models. No question that 2023 was the year when those topics hit the general consciousness of the scientific, cultural, and political worlds.  I’ve officially been an AI researcher since 1976, and before that I was a high school and undergraduate hobbyist, but this is the first year I have heard politicians throughout the world say the words “Artificial Intelligence”.  And when they have said those words no one has been surprised, and everyone sort of thinks they know what they are talking about.

I had not bothered to predict a rosy future for humanoid robots, as when I made my predictions I had been working in that space for over twenty five years and had built both research humanoids and thousands of humanoid robots that were deployed in factories. The extraordinarily difficult challenges, requiring fundamental research breakthroughs, were clear to me. There are plenty of naive entrepreneurs saying that work will be changed by humanoid robots within a handful of years. They are wrong. My lack of predictions about humanoid robots was based on my expectation that they will not play any significant role for at least another 25 years.

Here are some humanoid robots that I and the teams I have led have built: Cog (large team of graduate students), Kismet (Cynthia Breazeal, in the picture), Domo (Aaron Edsinger and Jeff Weber), and then Baxters (Rethink Robotics).

The prediction that happened this year

I had predicted that the “next big thing” in AI, beyond deep learning, would show up no earlier than 2023, but certainly by 2027. I also said in the table of predictions in my January 1st, 2018 post that for sure someone was already working on that next big thing, and that papers were most likely already published about it. I just didn’t know what it would be; but I was quite sure that of the hundreds or thousands of AI projects that groups of people were already successfully working hard on, one would turn out to be that next big thing that everyone hopes is just around the corner. I was right about both 2023 being when it might show up, and that there were already papers about it before 2018.

Why was I successful in those predictions?  Because it always happens that way and I just found the common thread in all “next big things” in AI, and their time constants.

The next big thing, Generative AI and Large Language Models, started to enter the general AI consciousness last December, and indeed I talked about it a little in last year’s prediction update. I said that it was neither the savior nor the destroyer of mankind, as different camps had started to proclaim right at the end of 2022, and that both sides should calm down. I also said that perhaps the next big thing would be neuro-symbolic Artificial Intelligence.

By March of 2023, it was clear that the next big thing had arrived in AI, and that it was Large Language Models.  The key innovation had been published before 2018, in 2017, in fact.

Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). “Attention is All you Need”. Advances in Neural Information Processing Systems. Curran Associates, Inc. 30.

So I am going to claim victory on that particular prediction, with the bracketed years (OK, so I was a little lucky…), and on a major paper for the next big thing having already been published by the beginning of 2018 (OK, so I was even luckier…).

Predictions [AI and ML]

Prediction: Academic rumblings about the limits of Deep Learning.
Date: BY 2017
2018 comment: Oh, this is already happening... the pace will pick up.

Prediction: The technical press starts reporting about limits of Deep Learning, and limits of reinforcement learning of game play.
Date: BY 2018

Prediction: The popular press starts having stories that the era of Deep Learning is over.
Date: BY 2020

Prediction: VCs figure out that for an investment to pay off there needs to be something more than "X + Deep Learning".
Date: NET 2021
2018 comment: I am being a little cynical here, and of course there will be no way to know when things change exactly.

Prediction: Emergence of the generally agreed upon "next big thing" in AI beyond deep learning.
Date: NET 2023, BY 2027
2018 comment: Whatever this turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out.
Update 20240101: It definitely showed up in 2023. It was in the public mind in December 2022, but was not yet the big thing that it became during 2023. A year ago I thought it would perhaps be neuro-symbolic AI, but clearly it is LLMs, and ChatGPT and its cousins. And, as I predicted in 2018, it was something already being worked on, as the "attention is all you need" paper, the key set of ideas, was published in 2017.

Prediction: The press, and researchers, generally mature beyond the so-called "Turing Test" and Asimov's three laws as valid measures of progress in AI and ML.
Date: NET 2022
2018 comment: I wish, I really wish.
Update 20230101: The Turing Test was missing from all the breathless press coverage of ChatGPT and friends in 2022. Their performance, though not consistent, pushes way past the old comparisons.
Update 20240101: The Turing Test was largely missing from the press in 2023 also, and there was a story in Nature commenting on that. So yes, this has now happened.

Prediction: Dexterous robot hands generally available.
Date: NET 2030, BY 2040 (I hope!)
2018 comment: Despite some impressive lab demonstrations we have not actually seen any improvement in widely deployed robotic hands or end effectors in the last 40 years.

Prediction: A robot that can navigate around just about any US home, with its steps, its clutter, its narrow pathways between furniture, etc.
Date: Lab demo: NET 2026; Expensive product: NET 2030; Affordable product: NET 2035
2018 comment: What is easy for humans is still very, very hard for robots.

Prediction: A robot that can provide physical assistance to the elderly over multiple tasks (e.g., getting into and out of bed, washing, using the toilet, etc.) rather than just a point solution.
Date: NET 2028
2018 comment: There may be point solution robots before that. But soon the houses of the elderly will be cluttered with too many robots.

Prediction: A robot that can carry out the last 10 yards of delivery, getting from a vehicle into a house and putting the package inside the front door.
Date: Lab demo: NET 2025; Deployed systems: NET 2028

Prediction: A conversational agent that both carries long term context, and does not easily fall into recognizable and repeated patterns.
Date: Lab demo: NET 2023; Deployed systems: 2025
2018 comment: Deployment platforms already exist (e.g., Google Home and Amazon Echo) so it will be a fast track from lab demo to wide spread deployment.
Update 20240101: One half of this happened this year. ChatGPT has been connected to microphones and speakers so you can now talk to it, and it does not fall into recognizable patterns. BUT the other half is the half it does not have; it has no updatable memory apart from its token buffer of what it has just said. Long term context may be long term in coming.

Prediction: An AI system with an ongoing existence (no day is the repeat of another day as it currently is for all AI systems) at the level of a mouse.
Date: NET 2030
2018 comment: I will need a whole new blog post to explain this...

Prediction: A robot that seems as intelligent, as attentive, and as faithful, as a dog.
Date: NET 2048
2018 comment: This is so much harder than most people imagine it to be--many think we are already there; I say we are not at all there.

Prediction: A robot that has any real idea about its own existence, or the existence of humans in the way that a six year old understands humans.
Date: NIML

 

What do I think about Generative AI and Large Language Models?

On November 28th I gave a talk at MIT as the opening keynote for MIT’s Generative AI Week. Here is the video of my talk, and here is a part of my talk written up as a blog post.

The title was “Unexpected manna mantra”. I didn’t want to talk about all the wealth or destruction (see salvationists versus doomers) that others talk about, and hence the crossed out “manna” in the title. Instead the talk is about what the existence of these “valuable cultural tools” (a phrase due to Alison Gopnik at UC Berkeley) tells us about deeper philosophical questions about how human intelligence works, and how they are following a well worn hype cycle that we have seen again, and again, during the 60+ year history of AI.

I concluded my talk encouraging people to do good things with LLMs but to not believe the conceit that their existence means we are on the verge of Artificial General Intelligence.

By the way, there are initial signs that perhaps LLMs have already passed peak hype. And the ever interesting Cory Doctorow has written a piece on what will be the remnants after the LLM bubble has burst. He says there was lots of useful stuff left after the dot com bubble burst in 2000, but not much beyond the fraud in the case of the burst crypto bubble.

He tends to be pessimistic about how much will be left to harvest after the LLM bubble is gone. Meanwhile right at year’s end the lawsuits around LLM training are starting to get serious.

Human Spaceflight

Crewed space flight crawled on during 2023. It did not feel like a golden age.

There were only five crewed orbital flights in 2023: two were Russian Soyuz, two were NASA SpaceX Dragons, and one was a commercial Axiom-2 SpaceX Dragon, with three paying customers. All five flights were to the International Space Station (ISS).

There were seven crewed suborbital flights, all by Virgin Galactic. Two were company test flights, and five had at least some paying customers on board. This means that Virgin Galactic has now had a total of six flights which involved more than test pilots (the previous such flight was in 2021).

Blue Origin had a mishap with an uncrewed suborbital vehicle in 2022, and finally flew an uncrewed vehicle again on December 19th, 2023. Perhaps they will be back to crewed flights in 2024.

This, again, was not the year that space tourism really took off. In fact many of the paying customers were from space agencies of countries that do not have their own human launch capability. A market for getting national astronauts into space is starting to develop. The 14 day Axiom-3 mission scheduled for January 2024 will take three paying customers to the ISS, all of whom are national astronauts, from Italy, Turkey, and Sweden. The Italian astronaut, Walter Villadei, flew on one of the suborbital Virgin Galactic flights in 2023.

The bright spot for space in 2023 was the continued unparalleled (ever) success of SpaceX’s Falcon 9 rockets. They had zero failures. There were 91 launches of the single booster version, and every one of those boosters was recovered in a vertical soft landing (though late in December booster B1058, which had successfully landed on a floating barge after its nineteenth launch, was destroyed when it fell over in rough seas while being transported back to land). Three of those launches sent people to the ISS. There were 5 launches of Falcon Heavy, the triple booster version. All four attempts to recover the two side boosters were successful, bringing all eight of them back.

SpaceX Falcons had a total of 31 launches in 2021, 61 in 2022, and now 96 in 2023. There have now been a total of 285 single booster launches with only 2 failures, and nine Falcon Heavy launches with no failures. SpaceX’s Falcon rocket is in a success class of its own.
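
For what it is worth, the cumulative numbers above work out to a success rate like this:

```python
# Falcon family reliability, computed from the cumulative totals above.
falcon9_launches, falcon9_failures = 285, 2
heavy_launches, heavy_failures = 9, 0

launches = falcon9_launches + heavy_launches
failures = falcon9_failures + heavy_failures
print(f"{launches} launches, {failures} failures: "
      f"{(launches - failures) / launches:.2%} success rate")  # 99.32%
```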

It is worth noting, however, that this large scale successful deployment took time. The first launch of a Falcon 9 took place in June 2010, thirteen and a half years ago. The attempted recovery of that booster failed. By the end of 2013 there had been only seven launches total, with no successful booster recoveries, despite four attempts. The first booster to be recovered (but not reflown) was in December 2015.

It wasn’t until March 2017 that there was a reflight of a recovered booster (first flown in April 2016).

Predictions [Space]

Prediction: Next launch of people (test pilots/engineers) on a sub-orbital flight by a private company.
Date: BY 2018

Prediction: A few handfuls of customers, paying for those flights.
Date: NET 2020

Prediction: A regular sub weekly cadence of such flights.
Date: NET 2022, BY 2026
Update 20240101: There were four flights in 2021, three in 2022, and seven, five with customers on board, in 2023--all of them by Virgin Galactic. Blue Origin did not fly in 2023. At this point 2026 is looking doubtful for regular flights every week.

Prediction: Regular paying customer orbital flights.
Date: NET 2027
2018 comment: Russia offered paid flights to the ISS, but there were only 8 such flights (7 different tourists). They are now suspended indefinitely.
Update 20240101: There were three paid flights in 2021, and one each in 2022 and 2023, with the latter being the Axiom 2 mission using SpaceX hardware. So not regular yet, and certainly not common.

Prediction: Next launch of people into orbit on a US booster.
Date: NET 2019; BY 2021; BY 2022 (2 different companies)
2018 comment: Current schedule says 2018.
Update 20240101: Both SpaceX and Boeing were scheduled to have crewed flights in 2018. SpaceX pulled it off in 2020. Boeing's Starliner did not fly at all in 2023, but is scheduled to launch with people onboard for the first time in April 2024.

Prediction: Two paying customers go on a loop around the Moon, launch on Falcon Heavy.
Date: NET 2020
2018 comment: The most recent prediction has been 4th quarter 2018. That is not going to happen.
Update 20240101: Starship launched twice in 2023 but didn't get to orbit either time. This is going to be well over six years later than the original prediction by the CEO of SpaceX.

Prediction: Land cargo on Mars for humans to use at a later date.
Date: NET 2026
2018 comment: SpaceX has said by 2022. I think 2026 is optimistic but it might be pushed to happen as a statement that it can be done, rather than for a pressing practical reason.
Update 20240101: I was way too optimistic, and bought into the overoptimistic hype of the CEO of SpaceX even though I added four years, doubling his estimated time frame.

Prediction: Humans on Mars make use of cargo previously landed there.
Date: NET 2032
2018 comment: Sorry, it is just going to take longer than everyone expects.

Prediction: First "permanent" human colony on Mars.
Date: NET 2036
2018 comment: It will be magical for the human race if this happens by then. It will truly inspire us all.

Prediction: Point to point transport on Earth in an hour or so (using a BF rocket).
Date: NIML
2018 comment: This will not happen without some major new breakthrough of which we currently have no inkling.

Prediction: Regular service of Hyperloop between two cities.
Date: NIML
2018 comment: I can't help but be reminded of when Chuck Yeager described the Mercury program as "Spam in a can".
Update 20240101: Calling this one 26 years early. As of today no-one is still working on this in an operating company.

 

Boeing’s Starliner

First announced in 2010, Boeing’s Starliner was originally scheduled to fly a human crew in 2018. It carried out its second uncrewed flight in May 2022, and is now scheduled to have its first crewed test flight in April 2024.

Thereafter it is expected to fly with a crew once a year. After this dismally long development period, that will give the US its second commercial human-capable orbital spacecraft.

Starship

Starship is SpaceX’s superheavy two stage rocket, designed to put 150 tons of payload into orbit, but also to be able to go to the Moon or Mars. The first stage has 33 Raptor engines, and that stage is to land back on a ship, or on land, as the current Falcon first stages do so successfully. The second stage has a total of six Raptor engines, three optimized to operate in space and three in the atmosphere. The second stage is to return from orbit, burning off kinetic energy using a heat shield to re-enter the atmosphere, and then land vertically back at the launch site.

Over US Thanksgiving in 2021 the CEO of SpaceX urged his workers to abandon their families and come in to work to boost the production rate of Raptor engines. In his email he said:

What it comes down to is that we face genuine risk of bankruptcy if we cannot achieve a Starship flight rate of at least once every two weeks next year.

“Next year” was 2022. There were zero flights in 2022, certainly not one every two weeks. There were two flights total in 2023, both flying the full two-stage stack for the first time. Both flights ended with both stages blowing up. SpaceX has become renowned for moving fast and blowing stuff up. But the US’s plan for returning people to the surface of the Moon in 2025 is now very unlikely. That plan requires 15 Starship launches, all operating flawlessly, for that single mission.

The return to the Lunar surface is going to be significantly delayed, and revenue producing flights of Starship are going to be way behind schedule.

Artemis

The second Artemis mission, using the Orion Capsule, Artemis II, will fly to the Moon with four people aboard, the first crewed Artemis flight. It was scheduled to launch in May 2024, but has been delayed by six months. This will be the first crewed mission beyond low Earth orbit (LEO) since 1972.

Artemis III was scheduled to launch in 2025 with a return to the surface of the Moon. However that relied on using a Starship (itself refueled in LEO by 14 (yes, fourteen!!) other Starship launches) to land there. No one any longer believes that schedule, and it will likely be delayed a few years, given where Starship is in its development and current capability.

Blue Origin Orbital Class Engines and Vehicles

Back in 2022 Blue Origin delivered two BE-4 engines to ULA, a customer, for use in their new Vulcan Centaur rocket, freeing ULA from its reliance on Russian engines. The first launch was supposed to happen in 2023, but in December it was delayed until January 2024. It does look like it will fly soon.

A BE-4 exploded during testing at Blue Origin in June of 2023, but whatever issues were there seem to have been overcome. The engines are designed to fly 100 times each.

Blue Origin’s own first orbital class rocket, New Glenn, was also due to fly in 2023, with four BE-4 engines.  It has been delayed until August 2024.

And finally, hyperloop

My prediction was that hyperloop was not going to happen in my lifetime, i.e., not by 2050, still twenty six years from now. But I called it today in the table. I was right.

For those who don’t remember, the hyperloop concept was hyped as much as generative AI is these days. The idea was that small pods would rush down evacuated tubes (often said to be underground, which was the rationale for starting new tunnel boring companies), at hundreds of miles per hour.  With people in them.  Point to point, LA downtown to San Francisco downtown in an hour.

In 2018 I wrote about what is hard and what is easy, and why, and said:

Building electric cars and reusable rockets is fairly easy. Building a nuclear fusion reactor, flying cars, self-driving cars, or a Hyperloop system is very hard. What makes the difference?

And it turns out it was much harder. As of December 31st, 2023, Hyperloop One, started by a member of the billionaire we-can-do-anything-one-of-us-thinks-about-for-five-minutes-before-telling-the-world-about-my-visionary-idea club, has completely shut down.  It is particularly worth reading the brutal last two paragraphs of that story. And the last sentence is generally worth remembering at all times:

The future, it would seem, is nearly the same as the present.

As I have said many times:

Having ideas is easy. Turning them into reality is hard. Turning them into being deployed at scale is even harder.

Progress inches along. It did with ships, trains, automobiles, airplanes, rockets, and reusable boosters. All of them moved along with many players, inventors, and investors, over at least decades. Big things that involve a lot of kinetic energy, and especially those that also carry people, take somewhere from decades to centuries to develop and deploy at scale.

Looking Forward

Get your thick coats now. There may be yet another AI winter, and perhaps even a full scale tech winter, just around the corner. And it is going to be cold.

Autonomous Vehicles 2023, Part III

rodneybrooks.com/autonomous-vehicles-2023-part-iii/

To finish up this brief series on the reality of autonomous, or self driving, vehicles in 2023, I want to outline my experiences taking almost 40 rides in Cruise vehicles in San Francisco.

I have previously reported on my first three rides in Cruise vehicles back in May of 2022. In those three rides, as with all subsequent rides, there was no driver in the front seat, and the steering wheel turned as though a ghost was sitting there driving.

In 2023 I took roughly three dozen more rides. At first they were part of a special test program that Cruise operated for free. But the last half of them were part of the regular “taxi” service that Cruise started operating, where I paid for them just like one would using Uber or Lyft. For a while Cruise had 300 vehicles operating, but they backed off to 150 after some safety issues.

Here I report what the experience was like.

It was certainly different from using an Uber or a Lyft, and always more uncertain: when and whether I would be picked up, where I would be picked up, and where I would be dropped off. The waits were sometimes over twenty minutes for the vehicle to arrive, and sometimes the ride would be cancelled while my promised vehicle was finally in view, as it just cruised on by me, with no explanation of why I had been dumped.

On pick up location I need to point out that during 2022, and most of 2023, the streets of San Francisco were thick with vehicles from both Waymo and Cruise driving around with no one in the driver’s seat. They were collecting data. This followed years of both companies having drivers in their vehicles collecting data, and mapping the entire city.

I would see them every single time I left my house, day or night — but never on my block itself — it is a very difficult block for human drivers to navigate. Even the human driven data collection Waymos and Cruises never ventured onto my block. On one occasion in November of this year I saw four empty Waymos drive down my hill — perhaps it was a mistake. But overall, I do not think the companies have ever mapped my particular block.

Cruise always told me it could not pick me up at my house. Sometimes it told me where to walk to (sometimes as far as two blocks away) and sometimes it asked me to choose.

Earlier in 2023 Cruise vehicles had a hard time dropping me off on a busy street and would continue around the block searching for a place to pull in, out of the traffic. Towards the end of the year, before Cruise shut them down completely, they would stop in traffic in a crowded restaurant area much as an Uber or Lyft would.

But this ease of drop off did not extend to near my house, in a leafy neighborhood whose streets are mostly deserted at night. There they would insist on finding a place to pull into, out of the non-existent traffic, sometimes leaving a bit of a hike back to my house.

Likewise there was a geographically determined difference in behavior for pick up. Earlier in the year they would keep driving until they found a place to pull into. That continued near my house throughout the year. One time the car stopped so much further on from where it had told me to wait that by the time I caught up with it (running to get there, I might add) a human was speaking from the car, asking me if I was having trouble pushing the unlock button in my app.

However in crowded restaurant areas the Cruise vehicles first became more aggressive about pulling into an empty spot, with stronger braking than a human driver would use, perhaps because humans were picking up on a spot being empty earlier than the Cruise vehicles could. Later in the year the Cruise vehicles started to imitate human Uber and Lyft drivers and would stop in an active traffic lane to enable a pickup.

In summary, the pick up and drop off behavior of Cruise vehicles got incrementally better in crowded areas throughout the year. The behavior did not change in easier, less trafficked areas, and was quite substandard compared to human drivers there. Some blocks have not been mapped and are no-go areas for driverless ride services. Whether one actually gets a ride or not is rather hit and miss, even if you are prepared to wait a long time.

It is not yet a deployed service. And now Cruise has shut down ride services in all six cities that it had begun operating in, while they thoroughly review safety. And it turns out they were not even autonomous. Human operators were intervening in almost 5 percent of the rides.

Meanwhile, Cruise’s owning company, GM, has announced they are pulling back on their investment of operating cash and other resources. Certainly GM’s driverless car service is at risk.

And also it turns out the cars are not autonomous or driverless.  See my upcoming new year report.

Three Things That LLMs Have Made Us Rethink

rodneybrooks.com/three-things-that-llms-have-made-us-rethink/

There are three things that the existence of LLMs, such as ChatGPT-3.5 and ChatGPT-4, makes us have to rethink. At different times, and amongst different communities, they have all had lots of AI researchers talking about them, often with much passion.

Here are three things to note:

  1. The Turing Test has evaporated.
  2. Searle’s Chinese Room showed up, uninvited.
  3. Chomsky’s Universal Grammar needs some bolstering if it is to survive.

We’ll talk about each in turn.

The Turing Test

In a 1950 paper titled Computing Machinery and Intelligence, Alan Turing used a test which involved a human deciding whether an entity that the person was texting with was a human or a computer. Of course, he did not use the term “texting” as that had not yet been invented, rather he suggested that the communication was via a “teleprinter”, which did exist at the time, where the words typed in one location appeared on paper in a remote location. “Texting” is the modern equivalent.

Turing used this setup as a rhetorical device to argue that if you could not accurately and reliably decide whether it was a person or a computer at the other end of the line then you had to grant that a machine could be intelligent. His point was that it was not just simulating intelligence, but that it would actually be intelligent, if people could not tell the difference.

Turing said:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

His number 10⁹ referred to how many bits of program would be needed to achieve this result, which is 125MB, i.e., 125 megabytes. Compare this with ChatGPT-3.5, which has 700GB, or 700 gigabytes, of weights (175 billion 32 bit weights) that it has learned, which is almost 6,000 times as much.
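
The comparison is easy to check; here is a quick sketch of the arithmetic:

```python
# Turing's 1950 storage guess vs ChatGPT-3.5, using the figures above.
turing_bits = 10**9
turing_bytes = turing_bits / 8        # 125,000,000 bytes = 125 MB

gpt_weights = 175_000_000_000         # 175 billion parameters
gpt_bytes = gpt_weights * 4           # 32-bit (4-byte) weights = 700 GB

print(f"Turing: {turing_bytes / 1e6:.0f} MB")      # 125 MB
print(f"ChatGPT-3.5: {gpt_bytes / 1e9:.0f} GB")    # 700 GB
print(f"ratio: {gpt_bytes / turing_bytes:,.0f}x")  # 5,600x -- "almost 6,000"
```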

His paragraph above continues:

The original question, ‘Can machines think!’ I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted. 

Despite his intent that it be purely a rhetorical device to make the question, ‘Can machines think!’ (I assume the punctuation was a typo and was intended to be ‘?’), meaningless, this led to people calling the machine/person discernment test the Turing Test, and it became the default way of thinking about how to determine when general Artificial Intelligence had been achieved. But, of course, it is not that simple. That didn’t stop annual Turing Test competitions being set up, with entrants from mostly amateur researchers who had built chat bots designed not to do any useful work in the world, but simply to try to pass the Turing Test. It was a bit of a circus and mostly not very useful.

Earlier this year I felt like I was not hearing about the Turing Test with regards to all the ChatGPTs, and in fact the scientific press had noticed this too, with a story in Nature in July of this year.

Don’t worry, there are still papers being written on the Turing Test and ChatGPT, for instance this one from October 2023, but the fervor of declaring that it is important has decreased.

We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%).

In general the press has moved away from the Turing Test. ChatGPT seems to have the sort of language expertise that people imagined some system as intelligent as a person would have, but it has become clear that it is not the crystalline indication of intelligence that Turing was trying to elucidate.

Searle’s Chinese Room

In 1980, John Searle, a UC Berkeley philosopher, introduced the idea of a “Chinese Room”, as a way to argue that computers could not be truly intelligent in the way that people are, not truly engaged with the world in the way people are, and not truly sentient in the way people are.

He chose “Chinese” as the language for the room as it was something totally foreign to most people working in Artificial Intelligence in the US at the time. Furthermore, its written form consists of atomic symbols.

Here is what ChatGPT-3.5 said when I asked it to describe Searle’s Chinese Room. I have highlighted the last clause in blue.

Imagine a person who doesn’t understand the Chinese language locked inside a room. This person has a set of instructions (a program) written in English that tells them how to manipulate Chinese symbols based on the symbols that are input to the room through a slot. They have no understanding of Chinese and don’t know the meanings of the symbols.

From the outside, someone passes messages written in Chinese through the slot into the room. The person inside the room follows the instructions in the program and produces responses in Chinese based solely on symbol manipulation, without understanding the content or meaning of the messages.

To an observer outside the room, it may appear as though the person inside understands Chinese, as they can generate coherent responses in Chinese. However, the person inside the room doesn’t actually understand the language. They are merely following a set of syntactic rules or algorithms for manipulating symbols, without grasping the semantics or meanings of those symbols.

To prove that ChatGPT really is the Chinese Room, I typed some Chinese text into version 3.5 and it responded with Chinese text:

艾未未是谁?

艾未未是一位知名的中国艺术家、社会活动家和评论家。他以其大胆的艺术创作和公开对政府审查和人权侵犯的批评而闻名。艾未未的作品涉及雕塑、摄影、行为艺术、装置艺术和社交媒体等多种媒介,展现了对政治、社会和人权议题的关注。他因批评中国政府而多次引起争议,这导致了他与政府当局的冲突,并且在过去曾经被拘留和监禁。他因其独特的艺术作品和公开表达的观点而在国际艺术界和社会活动家圈中备受瞩目。

My question was “Who is Ai Wei Wei?” and it responded in what people tell me is perfectly good Chinese, describing the Chinese artist Ai Wei Wei, his art, and his run-ins with the government.

So here we have Searle’s Chinese Room.  It has been trained on billions of words from the web but has no experience of the world, has never heard Chinese spoken, and has never even read Chinese characters as characters but rather has seen them only as integers along with (I assume) a pre-processing step to map whichever of the five (Unicode, GB, GBK, Big 5, and CNS) common digital Chinese character code standards each document uses to that common set of integers. (The fact that ChatGPT `knows’, in its description of the room, above, that Chinese is written in symbols is not because it has ever seen them, but because it has “read” that Chinese uses symbols.)
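
To make the “seen them only as integers” point concrete, here is a minimal illustration using Python’s built-in encodings (this is not ChatGPT’s actual tokenizer, which maps text to its own subword vocabulary of integers):

```python
# A language model never sees Chinese characters as glyphs, only as integers.
text = "艾未未是谁?"

# One integer per character: the Unicode code points.
print([ord(c) for c in text])

# The UTF-8 byte values: a common raw form fed to a subword tokenizer.
print(list(text.encode("utf-8")))
```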

The fact that GPT is the Chinese Room, the fact that one now exists, means that many of the old arguments for and against Searle’s position that he was staking out with his rhetorical version of the room must now be faced squarely and perhaps re-evaluated.  Searle’s Chinese Room was a topic of discussion in AI for well over 25 years. Everyone had to have an opinion or argument.

In my book Flesh and Machines: How Robots Will Change Us (Pantheon, New York, 2002), I made two arguments that were in opposition to Searle’s description of what his room tells us.

Firstly, I argued (as did many, many others) that indeed Searle was right that the person in the room could not be said to understand Chinese. Instead we argued that it was the whole system, the person, the rule books, and the state maintained in following the rules that was what understood Chinese. Searle was using the person as a stand in for a computer fresh off the production line, and ignoring the impact of loading the right program and data on to it. In the ChatGPT case it is the computer, plus the algorithms for evaluating linear neuron models plus the 175 billion weights that are together what make ChatGPT-3.5 understand Chinese, if one accepts that it does. In my book I said that no individual neuron in a human brain can be said to understand Chinese, it has to be the total system’s understanding that we talk about. ChatGPT-3.5 is an example of a computer doing the sort of thing that Searle was arguing was not possible, or at least should not be spoken about in the same way that we might speak about a person understanding Chinese.

Secondly, I argued (using Searle as the person in the room as he sometimes did):

Of course, as with many thought experiments, the Chinese room is ludicrous in practice. There would be such a large set of rules, and so many of them would need to be followed in detailed order that Searle would need to spend many tens of years slavishly following the rules, and jotting down notes on an enormous supply of paper. The system, Searle and the rules, would run as a program so slowly that it, the system, could not be engaged in any normal sorts of perceptual activity. At that point it does get hard to effectively believe that the system understands Chinese for any usual understanding of `understand’. But precisely because it is such a ludicrous example, slowed down by factors of billions, any conclusions from that inadequacy can not be carried over to making conclusions about whether a computer program running the same program `understands’ Chinese.

Hmm, well my bluff has been called by the existence of ChatGPT. First, note that I was right about the size of the rule set, 175 billion neural weights, and that it would take a person effectively forever to follow them. But every modern laptop can hold all those rules in its file system (they total less than a terabyte), and the algorithm is parallel enough that a chunk of processing in the cloud can make ChatGPT run at human language speeds.
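
A quick back-of-the-envelope check of that storage claim; the precision assumptions in the comments are mine, since how the weights are actually stored is not public.

```python
# Can 175 billion weights fit on a laptop disk? Check the arithmetic.
params = 175_000_000_000  # GPT-3's reported parameter count

bytes_fp32 = params * 4   # if stored as 32-bit floats
bytes_fp16 = params * 2   # if stored as 16-bit floats

print(f"fp32: {bytes_fp32 / 1e12:.2f} TB")  # 0.70 TB
print(f"fp16: {bytes_fp16 / 1e12:.2f} TB")  # 0.35 TB
```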

If I maintain my above argument from 2002, I would have to say that ChatGPT does `understand’ Chinese. But those who have read my writings over the years would guess, rightly, that I don’t think it does. Without grounding in physical reality I don’t think a machine can understand in the same way we do. ChatGPT is just like someone following rules with no semantic understanding of the symbols, but it does so at the speed my argument above said was necessary for real understanding. And yet I am going to say it still doesn’t understand. My 2002 self and my today self are not being intellectually coherent with each other, so I am going to have to think about this some more over the next few years and refine, perhaps rethink, but certainly change in some way what it is I conclude from both Searle and ChatGPT existing.

Other people over the last forty years have argued, and I have agreed, that language in humans is strongly grounded in non-language. So we have argued that a computer program like ChatGPT-3.5 could not have a consistent performance level that would seem like human language. ChatGPT-3.5 certainly seems to have such consistent performance, as long as you don’t poke it too deeply; it certainly performs at a level that would work for most of your daily interactions with strangers. Our arguments are therefore challenged or broken. I don’t yet know how to fix them.

CHOMSKY’S UNIVERSAL GRAMMAR

Noam Chomsky is the most influential linguist ever and he spent most of his career in MIT’s linguistics department. His advocacy of the existence of a universal grammar system in humans, and humans only, has shaped much cognitive science and theories about cognition in both humans and non-humans.

I asked ChatGPT to “explain Chomsky’s universal grammar”, and it replied:

Noam Chomsky’s theory of Universal Grammar (UG) is a hypothesis in the field of linguistics that suggests that the ability to acquire language is innate to humans, and that there is a universal set of grammatical structures that underlies all human languages. Chomsky introduced this concept in the 1950s and has since developed and refined it over the years.

Key points of Chomsky’s Universal Grammar include:

Innate Language Faculty: Chomsky argues that humans are born with an innate capacity for language. He proposes the existence of a “language acquisition device” (LAD) or some innate cognitive structures that enable children to rapidly and effortlessly learn the language(s) to which they are exposed during early childhood.

Universal Grammar Principles: Chomsky suggests that all human languages share a common underlying structure or set of principles. These principles are considered to be part of the Universal Grammar and are hard-wired into the human brain. They provide a framework that constrains the range of grammatical structures and rules that are possible in any language.

Poverty of the Stimulus: Chomsky’s argument for the existence of Universal Grammar includes the “poverty of the stimulus” idea. He contends that the linguistic input that children receive is often insufficient or ambiguous, yet they are able to acquire complex grammatical structures. This implies that there must be an innate mechanism guiding language acquisition.

Language Acquisition as Rule-Based: Chomsky posits that the process of language acquisition involves the internalization of grammatical rules. Children, according to this theory, unconsciously deduce the grammatical structures of their native language(s) based on the linguistic input they receive.

It’s important to note that while Chomsky’s Universal Grammar theory has been influential and has shaped the field of linguistics, it is not without its critics. Some researchers argue for alternative theories of language acquisition that emphasize the role of environmental factors and learning mechanisms. Additionally, the nature and specifics of the supposed universal grammar are still topics of ongoing research and debate within the field of linguistics.

ChatGPT gets it right; this is what Chomsky was arguing for. I highlighted in blue one particular alternate view that ChatGPT brought up. [It is no surprise to me that ChatGPT made such a good summary; that is the sort of thing it is quite good at, having ingested billions of pages of human written material on all sorts of matters.]

Here is a diagram I had someone draw for me to illustrate the idea of Universal Grammar. A child hears words and acquires both the grammar of the language and the language itself.

[Diagram: words heard by a child go in; both grammar and language come out.]

I asked: Does ChatGPT have a universal grammar module?
To which ChatGPT replied:

No, ChatGPT does not have a universal grammar module in the same sense as Chomsky’s Universal Grammar theory. ChatGPT is based on a different approach known as transfer learning, specifically the GPT (Generative Pre-trained Transformer) architecture developed by OpenAI.

ChatGPT learns from a diverse range of internet text during its pre-training phase, but it doesn’t have explicit knowledge or representation of grammatical rules as predefined modules. Instead, it captures patterns, context, and relationships within the data it has been trained on. It doesn’t have a built-in understanding of linguistic structures in the way that the concept of Universal Grammar suggests.

The model relies on statistical patterns and associations in the data it has seen during training to generate contextually relevant responses. While it can generate coherent and contextually appropriate text, it doesn’t possess an explicit understanding of grammar rules or linguistic universals as proposed by Chomsky’s theory.

Again, it got it completely right.

Clearly ChatGPT (and other LLMs) learn human language to a level of performance that we had not previously seen in computational systems, a level that we are forced to compare favorably with humans. And they learn language without any sort of universal grammar knowledge of the kind that Chomsky argues both constrains and enables what humans can learn as they acquire language. They also learn it without any role for the environmental factors that ChatGPT mentioned above in its reference to criticisms of universal grammar, and that I highlighted in blue.

No structures in the human brain that implement a universal grammar system, even quite distributed ones, have been identified. ChatGPT certainly doesn’t have one. Does this put the universal grammar theory at risk?

Maybe.  Maybe not.

Chomsky’s argument about the poverty of the stimulus for human children may be what saves the need for universal grammar in humans. ChatGPT is exposed to a few orders of magnitude more source material (and uses massively more energy to learn) than a human child.
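
Here is the rough arithmetic behind “a few orders of magnitude”, with the assumptions labeled: the child figure is an assumed ballpark of daily word exposure, and the GPT-3 figure of roughly 300 billion training tokens is from OpenAI’s GPT-3 paper. Tokens and words are conflated here, so treat the result as order-of-magnitude only.

```python
import math

words_per_day_child = 20_000                    # assumed ballpark
child_words = words_per_day_child * 365 * 10    # first ten years: ~7.3e7

gpt3_tokens = 300_000_000_000                   # ~3e11, per the GPT-3 paper

ratio = gpt3_tokens / child_words
print(f"ratio ≈ {ratio:,.0f}, about 10^{round(math.log10(ratio))}")
# ratio ≈ 4,110, about 10^4: a few orders of magnitude
```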

On the other hand the ability to learn human grammar with no mechanism for grammar built in is certainly a surprise, at least to time traveling AI researchers from thirty or even twenty years ago.

Universal grammar enthusiasts have long argued that no other biological species can have language that has grammar and recursive composability, because they don’t have universal grammar. Computational LLMs do (have such language) and don’t (have universal grammar).

I personally remember very heated arguments on the MIT campus between Chomsky followers and computer scientists working on statistical models in the early 2000s. The arguments came up in computer science faculty hiring meetings. The Chomskians claimed that because of the need for universal grammar there would never be anything useful that came out of statistical approaches to language. ChatGPT has proved them wrong. (And yes, I personally shouted angrily at Chomskians in some of those hiring meetings.)

The question is whether there is a more narrowly arguable way to push forward on the need for universal grammar in resource- and stimulus-limited environments.

A Preliminary Conclusion

The existence, for one year now, of functional, available LLMs has challenged some previous intellectual positions. Those challenges will be debated and made sharper. The existence of those challenges, however, does not necessarily mean that any, or even many, of the wild predictions around for how important and transformative LLMs will be will come to pass. Things have changed forever, but as with many other forever changes in AI over the last 65 years, there are most likely many more changes to come and much more for us to learn.

I believe that we are still in the uncertain baby step stages. It is worth repeating the last sentence of Alan Turing’s “Computing Machinery and Intelligence” paper that we started off with above. It is still as relevant today as it was in 1950.

We can only see a short distance ahead, but we can see plenty there that needs to be done. 

Autonomous Vehicles 2023, Part II

rodneybrooks.com/autonomous-vehicles-2023-part-ii/

I was going to write this post later this week, filling in my promised account of Thursday, October 19th, 2023, when I experienced real fear that I might be involved in a really bad accident while riding in a Cruise driverless taxi in San Francisco. The reason for rushing it out today is this story that Cruise is no longer allowed to operate driverless taxis in San Francisco.

The story implies that they will no longer be allowed to operate even empty vehicles without a safety driver, which would mean two steps backwards from where they have been in San Francisco. It also says that Cruise misrepresented safety information to regulators.

My Recent Experience

I have taken around 36 Cruise driverless taxi rides over the last handful of months. They have had their ups and downs in user experience, and I had planned to talk about some of those in explaining why I do not think the experience is really what one expects from a deployed taxi service like Uber or Lyft.

But last Thursday night I had a moment where I experienced real fear, where for half a second I thought I might be involved in an extremely bad accident.

It was at night and we were heading west on Filbert, crossing Divisadero. To the left Divisadero climbs steeply uphill for a few blocks. A car was coming down that hill quite fast as we crossed. My Cruise, with nothing at all in front of it, braked hard, really hard, right in the middle of the intersection, harder than I had ever experienced a Cruise taxi braking. That brought us (me and my taxi) to almost a complete stop right in the path of the oncoming vehicle. Fortunately the other vehicle started to slow down and then the Cruise moved on out of its way.

This, above, is my recollection of what happened. When it braked hard a real pang of fear shot through my body. When I saw the car heading right at us a conscious version of that fear kicked in.

A human driver in that situation would most likely continue to drive and not brake at all. Braking was the best possible way to cause a collision. Not a good choice.

In previous accidents that have resulted in collisions, Cruise vehicles have been at a stop. My interpretation, and I have no knowledge of whether this is true or not, is that rather than take the risk of hitting another vehicle while moving, the algorithms were set to freeze when a collision was imminent, as better than running into someone else. A weird hard-wired trolley problem solution which does not protect the Cruise vehicle, but unfortunately for a rider does not protect them either. And in many cases it increases the likelihood of a collision rather than reducing it.

See a Cruise with a passenger freezing in the middle of an intersection back in August, getting hit and sending a passenger to hospital.

More to come…

Autonomous Vehicles 2023, Part I

rodneybrooks.com/autonomous-vehicles-2023-part-i/

My Early Experience with Self Driving Cars

Back in the summer of 1979 (forty four years ago) I was Hans Moravec’s gopher at the Stanford AI Lab, helping test his self driving Cart for his PhD thesis. Every night that summer we would wait until midnight or so, when most people had gone home and the single mainframe computer was relatively unloaded. We would set up an indoor obstacle course for the robot, and then Hans would set it on its way. It would take nine visual images with a left-to-right sliding camera, process them and plan a path over the next fifteen minutes, then blindly drive the first meter of that path. Four meters per hour was its average top speed.

We have come a long way since then.

I went on to build mobile robots at MIT, and from that work have come 50 million home-based mobile robots from a company (iRobot) that I founded with two students in 1990, along with military mobile robots, and part of the path to mobile robots on Mars. My current startup (my sixth) is also building mobile robots, deployed in warehouses; they navigate autonomously with all computation onboard, using cameras as their only source of information, but at speeds measured in meters per second, and they are safe in the presence of humans.

My Recent Writing about Self Driving Cars

I have long been skeptical of the promises made about how soon self driving cars will fill our streets, and have written blog posts about why on this site, in terms of unexpected consequences and edge cases, both in 2017. I made predictions about AVs (along with AI and space) back on Jan 1st, 2018, and have reviewed them every January 1st since then. In my review from this year, January 1st, 2023, I included the following updated graphic from 2017.

These are public predictions from March 2017, made by industry executives, about when driverless cars would be here. The dates in parentheses were the years the predictions were made, the dates highlighted in pink were predicted years that have since passed, and the orange arrows indicate cases where I had seen the predictions later walked back. As you can see they were all wrong, at least so far.

See a collection of prediction headlines from over the years just published by Gary Marcus.

I think these predictions, full of unsupported hubris, have done us a dis-service.

They reinforced the idea that we would have one-for-one replacement of human drivers with driverless cars. In every other case where mankind has changed how we transport people and goods we have had massive changes in infrastructure: the Roman roads that knitted agrarian lands into an empire (still the outline of most major road routes across Europe), wharves in ports, inland canals, railroad tracks, paved roadways, freeways, airports, and world-wide air traffic control.

The tech enthusiasts, used to large scale deployment of software rather than physical objects, assumed that the world would stay the same, and instead we would just have driverless vehicles amongst human driven vehicles. This assumption was the source of my two critiques in 2017.

I have also noted that autonomous trains are still not very widely deployed, and where they are they have different infrastructure than human driven trains, including completely separate tracks. I have ridden them in many airports, and out in, and under, the cities of Toulouse and Tokyo, but they are not widespread. In the US the only significant self-driving trains outside of airports run west of Honolulu on Oahu, still not quite making it into the downtown area.

The dis-service of self driving predictions is that for the last dozen years we stopped talking about how to instrument our roads to make autonomous vehicles safe. I think we could have had self driving cars much more quickly if we had made offboard changes to our infrastructure, rather than imagining that everything would be done onboard.

RIDING IN AUTONOMOUS VEHICLES TODAY

I took my first ride in a self driving car on public roads back in April 2012, in a Google vehicle. That project later became Waymo.

In May 2022 I took three rides in a Cruise vehicle on the streets of San Francisco and blogged about it here.

I now regularly (more than once a week) ride in Cruise vehicles in San Francisco and have tweeted about it.

I often don’t report on individual rides, as they happen without incident. I have reported on particular rides, that I or others have taken, when I think that the vehicles have done something for which a human driver would somehow be sanctioned if they had done the same thing. That sanction might be other drivers or pedestrians expressing censure, legal officials directing them to do something different, or issuance of a legal citation.

My next blog post will be about my experiences in self driving cars, summarized. Spoiler alert: in my last ride, just two days ago, I first experienced real fear, where for half a second I thought I might be involved in an extremely bad accident. The type of fear where you feel a jolt in your heart, and your body readies itself for fight or flight.

 

What Will Transformers Transform?

rodneybrooks.com/what-will-transformers-transform/

Generative Pre-trained Transformer models (GPTs) are now all the rage and have inspired op-eds being written by everyone from Henry Kissinger (WSJ) to Noam Chomsky (NYTimes) in just the last month. That sure is some hype level.

Way back in the early history of GPTs, January 1st this year, I wrote briefly about them and said:

Calm down people. We neither have super powerful AI around the corner, nor the end of the world caused by AI about to come down upon us.

I stick with that advice, but in this post I want to say why, and talk about where these systems will have impact. In short, there will be valuable tools produced, and at the same time lots of damaging misuse.

What triggers me to write here in more detail is the continued hype, the release of GPT-4 during the week of March 13th, and the posting of the “GPT-4 Technical Report” by many hundreds of authors at OpenAI.  [[The linked PDF is 98 pages long and contains two papers, one titled “GPT-4 Technical Report” that fills the first 38 pages of the PDF, and one titled “GPT-4 System Card” which is 60 pages long with its pages numbered 1 to 60, but mapped to PDF pages 39 to 98.]]

In mid-February of this year Stephen Wolfram wrote a very clear (it is long, this is a hard and big topic) post about how and why ChatGPT works. As he says, it is “Just Adding One Word at a Time”. [[Actually, in the couple of days I have been writing my post here, Wolfram’s post has also come out as a printed book…]]

Together, the OpenAI and Wolfram reports give a very good technical understanding of most things GPT.

State of the Art GPTs from Open AI

For the last few months there has been lots of excitement about the 175 billion parameter GPT-3 from the company Open AI. It was set up, under the name ChatGPT, so that people could query it: type in a few words and have it “answer” the question. The words set the context, and then one word at a time it pops out the word that its learned model judges to be a good follow-on to that context, which grows to include the words it has already said. There is some randomness in choosing among competing very good words, so it answers questions differently at different times. Microsoft attached GPT to its search engine Bing at around the same time.
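
To make that generation loop concrete, here is a minimal, illustrative sketch of next-word sampling. Everything here is an assumption for illustration: the scoring function is a stub standing in for the learned model, and temperature plus top-k sampling is one common way such randomness is implemented, not necessarily what Open AI does.

```python
import math
import random

def next_word_scores(context):
    # A real GPT would score every token in its vocabulary given the
    # whole context so far; here the scores are simply made up.
    return {"blue": 5.1, "cloudy": 4.9, "grey": 4.7, "banana": 0.2}

def sample_next_word(context, temperature=0.8, top_k=3):
    scores = next_word_scores(context)
    # Keep only the top-k competing "very good" follow-on words...
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # ...then sample among them, which is why the same question can
    # get different answers at different times.
    weights = [math.exp(scores[w] / temperature) for w in best]
    return random.choices(best, weights=weights, k=1)[0]

context = "the sky today is".split()
for _ in range(3):
    # The context grows to include what has already been said.
    context.append(sample_next_word(context))
print(" ".join(context))
```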

Sometimes the results seem stunningly good, and people of all stripes have jumped to the conclusion that GPT-3 heralded the coming of “Artificial General Intelligence”. [[By the way, ever since the earliest days of AI, the 1955 proposal for the 1956 workshop on AI, the document in which the term AI first appears anywhere, the goal of the researchers was to produce general intelligence. That AGI is a different term than AI now is due to a bunch of researchers a dozen or so years ago deciding to launch a marketing campaign for themselves by using a new buzz acronym. “AGI” is just “AI” as it was known for the first 50+ years of its existence. Hype produced the term “AGI” with which we are now saddled.]]

This inference of AI arriving momentarily is a clear example of how people mistake performance for competence.  I talked about it back in 2017 as one of the seven deadly sins of predicting the future of AI. I said then that:

We [humans] are able to generalize from observing performance at one task to a guess at competence over a much bigger set of tasks. We understand intuitively how to generalize from the performance level of the person to their competence in related areas.

But the skills we have for doing that for a person break down completely when we see a strong performance from an AI program. The extent of the program’s competence may be extraordinarily narrow, in a way that would never happen with a person.  I think that a lot of people, early on certainly, made this mistake with GPT-3. I’ll show some examples of how GPT-3 and GPT-4 fail in unintuitive ways below.

Meanwhile, in the week of March 13th, 2023, GPT-4, with a reported 100 trillion learned parameters (i.e., about 571 times as many as for GPT-3), was introduced. And it turned out, according to Peter Lee at Microsoft, that the GPT version that had been part of Bing for the previous weeks and months was GPT-4 all along.

According to reports GPT-4 felt qualitatively different to many users. It felt “more”, and “better”. This sort of feeling that something has changed was reported in the 1990s when people played against a chess program searching to a previously unachievable depth, or worked with a theorem proving program that could search further than before. (The references, which will take me longer to track down, are to Garry Kasparov and to Bob Constable of Cornell.) I think the “this is better” feeling is natural here, but the presence of that feeling is not in itself a particularly useful indicator.

Some Dumb Things from ChatGPT

GPT-n cannot reason, and it has no model of the world. It just looks at correlations between how words appear in vast quantities of text from the web, without knowing how they connect to the world. It doesn’t even know there is a world.
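
A toy illustration of how far “just correlations” can get you: a bigram model built from a tiny corpus can emit plausible-looking text while knowing nothing about what any word refers to. (GPTs condition on far richer context than one previous word, but the spirit is the same.)

```python
import random
from collections import defaultdict

corpus = ("the table did not fit in the car because the table was "
          "too big and the car was too small").split()

follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)  # record which words follow which words

word, output = "the", ["the"]
for _ in range(8):
    choices = follows[word]
    if not choices:       # dead end at the corpus's final word
        break
    word = random.choice(choices)  # pure correlation, no meaning
    output.append(word)
print(" ".join(output))   # e.g. "the car was too big and the table"
```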

I think it is a surprise to many that it seems as smart as it does given that all it has is these correlations. But perhaps that says something about how we humans relate to language and the world, and that language perhaps has only a tenuous connection to the world in which we live; a scary thought.

There is a veritable cottage industry of individuals showing how LLMs can easily be provoked into showing that they have no capability in spatial reasoning, ordinal numbers, or even small quantities below 10.

Here are examples of troubles with numbers, and other interesting failures, with ChatGPT in the Communications of the ACM from earlier this year. The authors, Gary Marcus and Ernie Davis, are both faculty at NYU. Here is just one of their examples:

[Image: one of the Marcus and Davis examples.]

Here is an example of complete lack of spatial reasoning from an article by Fletcher Jones Professor of Economics Gary N. Smith of Pomona College:

Human: The table didn’t fit in the car because it was too small. What was too small? 

ChatGPT:  The table was too small to fit in the car.

Human: Would it help to get a smaller car? 

ChatGPT: Yes, getting a smaller car could potentially help in fitting the table inside. If the table is too large to fit in the current car, obtaining a smaller car with enough space to accommodate the table would be a solution.

Here is another example which Marcus and Davis write about, where someone asked ChatGPT to write a program.

It writes a Python program without getting the indentation right, and without actually addressing the JSON description part of the specification, but it sure manages to reproduce the sexism and racism contained in its training set. See the guardrails in GPT-4 below.

And here are three examples from the Marcus and Davis paper above (all due to other authors whom they cite), where ChatGPT is happy to make stuff up, because it really doesn’t understand how important many words really are:

[Images: the three examples.]

Two Simple But Amazing Examples

Here are two tweets that random people sent out in which ChatGPT seems to be funny and smart, and where it doesn’t have to reason to get there; rather it has to generate plausible text. And these two simple examples show how plausible it can be. [Images: the two tweets.]

And this one runs up against some guard rails that have been put into the system manually, but which get busted through on the second request.

I think it is easy to see from these two examples that performance/competence confusion is very likely. It makes no sense that a person who could respond in these ways would be as ritualistically dumb about numbers and spatial relations as the previous section reveals ChatGPT to be.

What Does Open AI Say About GPT-4?

The opening section of the GPT-4 Tech Report from Open AI is instructive, as it includes this paragraph (my emphasis):

Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [1, 31, 32]: it is not fully reliable (e.g. can suffer from “hallucinations”), has a limited context window, and does not learn from experience. Care should be taken when using the outputs of GPT-4, particularly in contexts where reliability is important.

Open AI is being quite clear here that GPT-4 has limitations.  However they appear to be agnostic on who should be taking care.  Is it the responsibility of people who use the GPT-4 parameter set in some product, or is it the end user who is exposed to outputs from that product? Open AI does not express an opinion on this matter.

In a second paper in the .pdf, i.e., the “GPT-4 System Card”, they go through the mitigations against dangerous or wrong outputs that they have worked on for the last six months, and give comparisons between what was produced early on by “GPT-4 (early)” and what is produced now by “GPT-4 (launch)”. They have put in a number of guard rails that clearly reduce the amount of both objectionable and dangerous output that can be produced. Nevertheless on page 19 of the System Card (page 57 of the .pdf) they say:

As noted above in 2.2, despite GPT-4’s capabilities, it maintains a tendency to make up facts, to double-down on incorrect information, and to perform tasks incorrectly. Further, it often exhibits these tendencies in ways that are more convincing and believable than earlier GPT models (e.g., due to authoritative tone or to being presented in the context of highly detailed information that is accurate), increasing the risk of overreliance.

This is pretty damning. Don’t rely on outputs from GPT-4.

Earlier in the System Card report (page 7 of the System Card, page 45 of the .pdf):

In particular, our usage policies prohibit the use of our models and products in the contexts of high risk government decision making (e.g, law enforcement, criminal justice, migration and asylum), or for offering legal or health advice.

Here they are protecting themselves by outlawing certain sorts of usage in their license.

This is in the context of their human red team having probed GPT-4 and introduced new training so that often it will refuse to produce harmful text when it matches a class of prompts against which it has been trained.

But their warnings reproduced above say that they are not at all confident that we will not see real problems with some of the things produced by GPT-4. They have not been able to bullet-proof it with six months of work by a large team. This is no surprise. There are many, many long-tail cases to consider and patch up. The same was true for autonomous driving, and the result is that we are now three to five years past the dates by which executives at major automobile companies predicted we would have level 4 driving in consumer cars. That experience should be a cautionary tale for GPT-4 and its brethren, saying that reliance on them will be fraught for many years to come, unless they are very much boxed in as to how they can be used.

On March 21st, 2023, Sundar Pichai, CEO of Google, on the introduction of Bard A.I., Google’s answer to GPT-4, warned his employees that “things will go wrong”.

Always a Person in the Loop in Successful AI Systems

Many successful applications of AI have a person somewhere in the loop.  Sometimes it is a person behind the scenes that the people using the system do not see, but often it is the user of the system, who provides the glue between the AI system and the real world.

This is true of language translation systems, where a person reading the output adapts quickly, just as they do with children, the elderly, and foreigners, to the mistakes the person or system makes, and fills in around the edges to get the meaning, not the literal interpretation.

This is true of speech understanding systems where we talk to Alexa or Google Home, or our TV remote, or our car. We talk to each of them slightly differently, as we humans quickly learn how to adapt to their idiosyncrasies and the forms they can and cannot understand.

This is true of our search engines, where we have learned how to form good queries that will get us the information we actually want, the quickest.

This is true of our smart cameras, where we have learned how to take photographs with them rather than with a film camera (though in this case they are often superhuman in their capabilities).

This is true where we are talking to a virtual agent on a web site, where we will either be left to a frustrating experience or the website has back-up humans who connect in to help with tricky situations.

This is true of driver assist/self driving modes in cars where the human driver must be prepared to take over in an instant in high stress situations.

This is true of mobile robots in hospitals, taking the dirty sheets and dishes to be cleaned, or bringing up prescriptions from the hospital pharmacy, where there is a remote network operations center in which some unseen operator is waiting to take over control when the robot gets confused.

This is true of chess where the best players are human chess experts working with a chess engine, and together they play better than any chess engine by itself.

And this is true of artwork produced by stable diffusion models, where the eye of the beholder always belongs to a human.

Below I predict the future for the next few years with GPTs and point out that their successful deployment will always have a person in the loop in some sense.

Predicting the future is hard

Roy Amara, who died on the last day of 2007, was the president of a Palo Alto based think tank, the Institute for the Future, and is credited with saying what is now known as Amara’s Law:

We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.

This has been a common problem with Artificial Intelligence, and indeed with all of computing. In particular, since I first became conscious of the possibility of Artificial Intelligence around 1963 (and as an eight-year-old proceeded to try to build my own physical and intelligent computers, and have been at it ever since), I have seen these overestimates many, many times.

A few such instances of AI technologies that have induced gross overestimates of how soon we would get to AGI, in roughly chronological order, that I personally remember include:

John McCarthy’s estimate that the computers of the 1960’s were powerful enough to support AGI, Minsky and Michie and Nilsson each believing that search algorithms were the key to intelligence, neural networks (volume 3, perceptrons) [[I wasn’t around for the first two volumes; McCulloch and Pitts in 1943, Minsky in 1953]], first order logic, resolution theorem proving, MacHack (chess 1), fuzzy logic, STRIPS, knowledge-based systems (and revolutionizing medicine), neural networks (volume 4, back propagation), the primal sketch, self driving cars (Dickmanns, 1987), reinforcement learning (rounds 2 and 3), SOAR, qualitative reasoning, support vector machines, self driving cars (Kanade et al, 1997), Deep Blue (chess 2), self driving cars (Thrun, 2007), Bayesian inference, Watson (Jeopardy, and revolutionizing medicine), neural networks (volume 5, deep learning), Alpha GO, reinforcement learning (round 4), generative images, and now large language models. All have heralded the imminence of human level intelligence in machines. All were hyped up to the limit, but mostly in the days when very few people were even aware of AI, so very few people remember the levels of hype. I’m old. I do remember all these, but have probably forgotten quite a few…

None of these things lived up to their early hype. As Amara predicted, at first they were overrated. But at the same time, almost every one of them has had long lasting impact on our world, just not in the particular form that people first imagined. As we twirled them around and prodded them, and experimented with them, and failed, and retried, we remade them in ways different from how they were first imagined, and they ended up having bigger, longer term impacts, but in ways not first considered.

How does this apply to GPT world? As always, the hype is overestimating the utility and the threats. However, much will come from GPT-like systems.

Do I have it wrong?

Ada Lovelace said something similar to Amara’s Law back in 1843. This is from the first paragraph of her “Note G”, one of the notes she wrote to accompany her translation of Luigi Menabrea’s paper on the Analytical Engine. With her emphasis:

In considering any new subject, there is frequently a tendency, first, to overrate what we find to be already interesting or remarkable; and, secondly, by a sort of natural reaction, to undervalue the true state of the case, when we do discover that our notions have surpassed those that were really tenable.

Here the first half matches the first half of Amara’s Law. Her second half touches on something different from Amara’s second half. She says that when we are chastened by discovering we were overly optimistic out of the gate, we pull back too far on our expectations.

Having seen the hype cycle so often, and seen it go a particular way so often, am I now undervaluing the subject of a new hype cycle? If this is hype cycle n, I would have been right to undervalue the hype the previous n-1 times. Am I just pattern matching and assuming it is right to undervalue for time n? Am I suffering from cynicism? Am I just a grumpy old guy who thinks he’s seen it all? Perhaps. We’ll have to see with time.

In General, What Will Happen?

Back in 2010 Tim O’Reilly tweeted out “If you’re not paying for the product then you’re the product being sold.”, in reference to things like search engines and apps on telephones.

I think that GPTs will give rise to a new aphorism (where the last word might vary over an array of synonymous variations):

If you are interacting with the output of a GPT system and didn’t explicitly decide to use a GPT then you’re the product being hoodwinked.

I am not saying everything about GPTs is bad. I am saying that, especially given the explicit warnings from Open AI, you need to be aware that you are using an unreliable system.

Using an unreliable system sounds awfully unreliable, but in August 2021 I had a revelation at TED in Monterey, California, when Chris Anderson (the TED Chris) was interviewing Greg Brockman, the Chairman of Open AI, about an early version of GPT. Brockman said that he regularly asked it questions about code he wanted to write and it very quickly gave him ideas for libraries to use, and that was enough to get him started on his project. GPT did not need to be fully accurate, just to get him into the right ballpark, much faster than without its help, and then he could take it from there.

Chris Anderson (the 3D Robotics one, not the TED one) has likewise opined (as have responders to some of my tweets about GPT) that using ChatGPT will get him the basic outline of a software stack, in a well-trodden area of capabilities, and that he is many, many times more productive than without it.

So there, where a smart person is in the loop, unreliable advice is better than no advice, and the advice comes much more explicitly than from carrying out a conventional search with a search engine.

[[Earlier this year I posted to my facebook friends that I was having trouble converting a software system that I have been working on for 30+ years from running natively on an x86 Mac to running natively on an M1 ARM Mac. The issue was that my old technique, having my compiler write instructions into memory as data and then execute that memory as instructions, was not working. John Markoff suggested that I ask ChatGPT, which I then did. It gave me a perfect multi-paragraph explanation of how to do it, starting off with “…on an M1 Macintosh…”. The problem was that the explanation was completely accurate for an x86 Macintosh, and was exactly what I had been doing for the last 10+ years, but completely wrong for an M1 Macintosh.]]

The opposite of useful can also occur, but again it pays to have a smart human in the loop.  Here is a report from the editor of a science fiction magazine which pays contributors. He says that from late 2022 through February of 2023 the number of submissions to the magazine increased by almost two orders of magnitude, and he was able to determine that the vast majority of them were generated by chatbots. He was the person in the loop filtering out the signal he wanted, human written science fiction, from vast volumes of noise of GPT written science fiction.

Why should he care? Because GPT is an auto-completer and so it is generating variations on well worked themes. But, but, but, I hear people screaming at me. With more work GPTs will be able to generate original stuff. Yes, but it will be some other sort of engine attached to them which produces that originality. No matter how big, and how many parameters, GPTs are not going to do that themselves.

When no person is in the loop to filter, tweak, or manage the flow of information GPTs will be completely bad. That will be good for people who want to manipulate others without having revealed that the vast amount of persuasive evidence they are seeing has all been made up by a GPT.  It will be bad for the people being manipulated.

And it will be bad if you try to connect a robot to GPT. GPTs have no understanding of the words they use, no way to connect those words, those symbols, to the real world. A robot needs to be connected to the real world and its commands need to be coherent with the real world. Classically it is known as the “symbol grounding problem”. GPT+robot is only ungrounded symbols. It would be like you hearing Klingon spoken, without any knowledge other than the Klingon sound stream (even in Star Trek you knew they had human form and it was easy to ground aspects of their world). A GPT telling a robot stuff will be just like the robot hearing Klingonese.

[[And, of course, for those who have read my more obscure writing for the last 30+  years (see Nature (2001), vol 409, page 409), I do have issues with whether the symbol grounding problem is the right way of thinking about things, but for this argument it is good enough.]]

My argument here is that GPTs might be useful, and well enough boxed, when there is an active person in the loop, but dangerous when the person in the loop doesn’t know they are supposed to be in the loop. [This will be the case for all young children.] That person’s intelligence, actively applied, is a key component of making any GPT be successful.

Specific Predictions

Here I make some predictions for things that will happen with GPT types of systems, and sometimes coupled with stable diffusion image generation. These predictions cover the time between now and 2030. Some of them are about direct uses of GPTs and some are about the second and third order effects they will drive.

  1. After years of Wikipedia being derided as not a referable authority, and not being allowed to be used as a source in serious work, it will become the standard rock solid authority on just about everything. This is because it has built a human powered approach to verifying factual knowledge in a world of high frequency human generated noise.
  2. Any GPT-based application that can be relied upon will have to be super-boxed in, and so the power of its “creativity” will be severely limited.
  3. GPT-based applications that are used for creativity will continue to have horrible edge cases that sometimes rear their ugly heads when least expected, and furthermore, the things that they create will often arguably be stealing the artistic output of unacknowledged humans.
  4. There will be no viable robotics applications that harness the serious power of GPTs in any meaningful way.
  5. It is going to be easier to build from scratch software stacks that look a lot like existing software stacks.
  6. There will be much confusion about whether code infringes on copyright, and so there will be a growth in companies that are used to certify that no unlicensed code appears in software builds.
  7. There will be surprising things built with GPTs, both good and bad, that no-one has yet talked about, or even conceived.
  8. There will be incredible amounts of misinformation deliberately created in campaigns for all sorts of arenas from political to criminal, and reliance on expertise will become more discredited, since the noise will drown out any signal at all.
  9. There will be new categories of pornography.

Predictions Scorecard, 2023 January 01

rodneybrooks.com/predictions-scorecard-2023-january-01/

 

On January 1st, 2018, I made predictions about self driving cars, Artificial Intelligence, machine learning, and robotics, and about progress in the space industry. Those predictions had dates attached to them for 32 years up through January 1st, 2050.

As part of self certifying the seriousness of my predictions I promised to review them, as made on January 1st, 2018, every following January 1st for 32 years, the span of the predictions, to see how accurate they were. This is my fifth annual review and self appraisal, following those of 2019, 2020, 2021, and 2022. I am over a seventh of the way there!  Sometimes I throw in a new side prediction in these review notes.

I made my predictions because at the time, just like now, I saw an immense amount of hype about these three topics, and the general press and public drawing conclusions about all sorts of things they feared (e.g., truck driving jobs about to disappear, all manual labor of humans about to disappear) or desired (e.g., safe roads about to come into existence, a safe haven for humans on Mars about to start developing) being imminent. My predictions, with dates attached to them, were meant to slow down those expectations, and inject some reality into what I saw as irrational exuberance.

I was accused of being a pessimist, but I viewed what I was saying as being a realist. In the last couple of years I have started to think that I too, reacted to all the hype, and was overly optimistic in some of my predictions. My current belief is that things will go, overall, even slower than I thought five years ago. That is not to say that there has not been great progress in all three fields, but it has not been as overwhelmingly inevitable as the tech zeitgeist thought on January 1st, 2018.

UPDATE of 2019’s Explanation of Annotations

As I said in 2018, I am not going to edit my original post, linked above, at all, even though I see there are a few typos still lurking in it. Instead I have copied the three tables of predictions below from 2022’s update post, and have simply added comments to the fourth columns of the three tables. I also highlight dates in column two where the time they refer to has arrived.

I tag each comment in the fourth column with a Cyan (#00ffff) colored date tag in the form yyyymmdd such as 20190603 for June 3rd, 2019. As in 2022 I have highlighted the new text put in for the current year in LemonChiffon (#fffacd) so that it is easy to pick out this year’s updates. There are 15 such updates this year in the tables below.

The entries that I put in the second column of each table, titled “Date” in each case, back on January 1st of 2018, have the following forms:

NIML meaning “Not In My Lifetime”, i.e., not until beyond December 31st, 2049, the last day of the first half of the 21st century.

NET some date, meaning “No Earlier Than” that date.

BY some date, meaning “By” that date.

Sometimes I gave both a NET and a BY for a single prediction, establishing a window in which I believe it will happen.

For now I am coloring those statements when it can be determined already whether I was correct or not.

I have started using LawnGreen (#7cfc00) for those predictions which were entirely accurate. For instance a BY 2018 can be colored green if the predicted thing did happen in 2018, as can a NET 2019 if it did not happen in 2018 or earlier. There are 14 predictions now colored green, including 4 new ones this year.

I will color dates Tomato (#ff6347) if I was too pessimistic about them. There is one Tomato, with no new ones this year. If something happens that I said NIML, for instance, then it would go Tomato, or if in 2020 something already had happened that I said NET 2021, then that too would have gone Tomato.

If I was too optimistic about something, e.g., if I had said BY 2018, and it hadn’t yet happened, then I would color it DeepSkyBlue (#00bfff). The first of these appeared this year. And eventually if there are NETs that went green, but years later have still not come to pass I may start coloring them LightSkyBlue (#87cefa). I did that below for one prediction in self driving cars last year.

In summary then: Green splashes mean I got things exactly right. Red means provably wrong and that I was too pessimistic. And blueness will mean that I was overly optimistic.

Self Driving Cars

The dates in parentheses are the years that the predictions were made and the years in blue were the originally predicted times when the capability or deployment would happen. (The orange arrows indicate that I later found revised dates from the same sources.) When a blue year passes without the prediction having been fulfilled I color it pinkish. No new pink this year, but as before not a single one of these predictions has come to fruition, not by its predicted date nor at any time since. These predictions were part of what made me make my own predictions back in 2018, to temper them with reality.

At the end of 2024 and 2025, the next years that show up in blue here, I fully expect to be able to color them pink.

But, I am no longer alone.  In the last year there has been a significant shift, and belief in self driving cars being imminent and/or common has started to have its wheels fall off.

Bloomberg complained that, despite one hundred billion dollars having been spent on their development, self driving cars are going nowhere.

More importantly various efforts are being shut down, despite billions of dollars of sunk cost. After spending $3.6B, Ford and VW shut down their joint venture, Argo, as the story says, a self-driving road to nowhere. As this story says, Ford thought they could bring the technology to market in 2021 (as reported in the graphic above), but now thinks that maintaining 2,000 employees in search of this opportunity is not the best way to serve its customers.

Many other car companies have pulled back from saying they are working on driverless software, or Level 4.  Toyota and Mercedes are two such companies, where they expect a driver to be in the loop, and are not trying to remove them.

Meanwhile the heat is on Tesla for naming their $15,000 software Full Self Driving (FSD), and the state of California is fighting them in court.

See this very careful review of Tesla FSD driving in a Jacksonville, FL, neighborhood, with the car’s owner, airline pilot Chuck Cook. He is a true fan of Tesla and FSD, but tests it rigorously, so much so that the CEO of Tesla has sent engineers to work with him and tune up FSD for his neighborhood. As this review shows it is still not reliable in many circumstances. If it is not reliable sometimes then the human has to be paying attention at all times, and so it is a long way from actually being FSD. People will make excuses for it, but that won’t work with normal people, like me. Paying $15,000 for software which doesn’t do its primary job, namely giving the human full confidence, is not a winning business strategy once you get out of the techy bubble.

Oh, and by the way, the “one million robo-taxis” on the road by 2020, promised by the CEO of Tesla in early 2019, have still not panned out. I believe the actual number is still solidly zero.

Other parts of the great self driving experiment have also proved to be too expensive. This year Amazon shut down its last mile autonomous delivery service project, where the idea was that driverless vehicles would get their packages directly to houses.  Too hard for now.

But I Did Take Self-Driving Taxi Services in 2022

Back in May, on this blog, I reported my experience taking a truly driverless taxi service three times in one evening in San Francisco. It was with Cruise (owned by GM), which now charges money for this service that runs on good-weather nights from around 10:30pm to 5:00am. These are the hours of least traffic in San Francisco. It does not operate in all parts of the city, especially not the crowded congested parts.

Here is the brief summary of my experience. The first ride was to be about 20 blocks due south of my house. The Cruise vehicle did not come to my house but told me to go to another street at the end of my block. My block is quite difficult to drive on for a number of reasons and I never see any Cruise, Waymo, or Zoox vehicles collecting data on it, though they can all be seen just a block away.

Rather than heading due south the Cruise vehicle diverted 10 blocks west and back, most probably to avoid areas with traffic. The few times it was in traffic it reminded me of when I taught my kids to drive, with heavy unnecessary braking when other cars were nearby. It also avoided unprotected left turns and preferred to loop around through a series of three right turns.

On my third ride it elected to pick me up on the other side of the street from where I had requested, right in front of construction that forced me to walk out into active lanes in order to get in. No human taxi driver would have ignored my waves and gestures to be picked up at a safe spot thirty feet further along the road.

My complete report includes many other details. Functionally it worked, but it did so by driving slowly and avoiding any signs of congestion. The result is that it was slower by a factor of two than any human operated ride hailing service. That might work for select geographies, but it is not going to compete with human operated systems for quite a while.

That said, I must congratulate Cruise on getting it so far along. This is much more impressive and real than any other company’s attempts at deployment. But as far as I can tell it is only about 32 vehicles, and Cruise is losing $5M per day.  This is decades away from profitability.

Here is part of a comment on my blog post by a Glenn Mercer, whom I don’t think I know (apologies, Glenn, if I do or should!!). [Chorizo is the name of the car that took me on two of my three rides.]:

Your experience seemed similar: your rides seemed to “try” (sorry about imputing agency here) to be safe by avoiding tricky situations. It would be underwhelming that if all AVs delivered to us was confirmation of the already-known fact that we can cut fatalities in the USA by obeying traffic laws, driving cautiously, not drinking, etc. Of course ANY fatality avoided is a good thing… but it is a little underwhelming (from the perspective of a fan of all things sci-fi) that we may avoid them not because some sentient AI detected via satellite imagery a hurtling SUV 3 blocks away and calculated its trajectory in real time to induce in Chorizo a spectacular life-saving 3-point turn… Rather, that just making sure Chorizo stopped at the stop sign and “looked both ways” may be all that we needed. Ah, well.

I think this is a great comment. It fits well with my prediction that no self driving car is going to be faced with an instance of the trolley problem until beyond 2050. My advice on the so called “trolley problem” is to just stomp on the damn brakes. That fits with Glenn’s recommendation above to just be consistently careful, and you’ll do a whole lot better than most humans.  This may well be achievable.

Prediction
[Self Driving Cars]
Date2018 CommentsUpdates
A flying car can be purchased by any US resident if they have enough money.NET 2036There is a real possibility that this will not happen at all by 2050.
20230101
There is currently frothy hype about coming flying electric taxi services. See the main text for why I think this hype is overblown. And note that these eVTOL taxis are not at all what people used to mean (just five years ago) when they said "flying cars".
Flying cars reach 0.01% of US total cars.NET 2042That would be about 26,000 flying cars given today's total.
Flying cars reach 0.1% of US total cars.NIML
First dedicated lane where only cars in truly driverless mode are allowed on a public freeway.
NET 2021
This is a bit like current day HOV lanes. My bet is the left most lane on 101 between SF and Silicon Valley (currently largely the domain of speeding Teslas in any case). People will have to have their hands on the wheel until the car is in the dedicated lane.20210101 It didn't happen any earlier than 2021, so I was technically correct. But I really thought this was the path to getting autonomous cars on our freeways safely. No one seems to be working on this...
20220101
Perhaps I was projecting my solution to how to get self driving cars to happen sooner than the one for one replacement approach that the Autonomous Vehicle companies have been taking. The left lanes of 101 are being rebuilt at this moment, but only as a toll lane--no special assistance for AVs. I've turned the color on this one to "too optimistic" on my part.
Such a dedicated lane where the cars communicate and drive with reduced spacing at higher speed than people are allowed to driveNET 2024
First driverless "taxi" service in a major US city, with dedicated pick up and drop off points, and restrictions on weather and time of day.
NET 2021
The pick up and drop off points will not be parking spots, but like bus stops they will be marked and restricted for that purpose only.20190101 Although a few such services have been announced every one of them operates with human safety drivers on board. And some operate on a fixed route and so do not count as a "taxi" service--they are shuttle buses. And those that are "taxi" services only let a very small number of carefully pre-approved people use them. We'll have more to argue about when any of these services do truly go driverless. That means no human driver in the vehicle, or even operating it remotely.
20200101
During 2019 Waymo started operating a 'taxi service' in Chandler, Arizona, with no human driver in the vehicles. While this is a big step forward see comments below for why this is not yet a driverless taxi service.
20210101 It wasn't true last year, despite the headlines, and it is still not true. No, not, no.
20220101
It still didn't happen in any meaningful way, even in Chandler. So I can call this prediction as correct, though I now think it will turn out to have been wildly optimistic on my part.
20230101
There was movement here, in that Cruise started a service in San Francisco, for a few hours per night, with about 32 vehicles. I rode it before it was charging money, but it now is doing so. Cruise is still losing $5M per day, however. See the main text for details.
Such "taxi" services where the cars are also used with drivers at other times and with extended geography, in 10 major US citiesNET 2025A key predictor here is when the sensors get cheap enough that using the car with a driver and not using those sensors still makes economic sense.
Such "taxi" service as above in 50 of the 100 biggest US cities.NET 2028It will be a very slow start and roll out. The designated pick up and drop off points may be used by multiple vendors, with communication between them in order to schedule cars in and out.
Dedicated driverless package delivery vehicles in very restricted geographies of a major US city.
NET 2023
The geographies will have to be where the roads are wide enough for other drivers to get around stopped vehicles.
20220101
There are no vehicles delivering packages anywhere. There are some food robots on campuses, but nothing close to delivering packages on city streets. I'm not seeing any signs that this will happen in 2022.
20230101
It didn't happen in 2022, so I can call it. It looks unlikely to happen any time soon as some major players working in this area, e.g., Amazon, abandoned their attempts to develop this capability (see the main text). It is much harder than people thought, and large scale good faith attempts to do it have shown that.
A (profitable) parking garage where certain brands of cars can be left and picked up at the entrance and they will go park themselves in a human free environment.
NET 2023
The economic incentive is much higher parking density, and it will require communication between the cars and the garage infrastructure.
20220101
There has not been any visible progress towards this that I can see, so I think my prediction is pretty safe. Again I was perhaps projecting my own thoughts on how to get to anything profitable in the AV space in a reasonable amount of time.
20230101
It didn't happen in 2022, so I can call this prediction correct. I think Tesla's driving system is probably good enough for their cars to do this, but no one seems to be going after this small-change application.
A driverless "taxi" service in a major US city with arbitrary pick and drop off locations, even in a restricted geographical area.
NET 2032This is what Uber, Lyft, and conventional taxi services can do today.
Driverless taxi services operating on all streets in Cambridgeport, MA, and Greenwich Village, NY.
NET 2035
Unless parking and human drivers are banned from those areas before then.
A major city bans parking and cars with drivers from a non-trivial portion of a city so that driverless cars have free rein in that area.
NET 2027
BY 2031
This will be the starting point for a turning of the tide towards driverless cars.
The majority of US cities have the majority of their downtown under such rules.
NET 2045
Electric cars hit 30% of US car sales.
NET 2027
20230101
I made this prediction five years ago today. If it is accurate we'll know five years from today. When I made the prediction many people on social media said I was way too pessimistic. Looking at the current numbers (see the main text) I think this level of sales could plausibly be reached in one of 2026, 2027, or 2028. Any small perturbation will knock 2026 out of contention. In the worst (for me) case perhaps I was pessimistic in predicting ten years rather than nine for this to happen.
Electric car sales in the US make up essentially 100% of the sales.
NET 2038
Individually owned cars can go onto a pallet and be whisked underground to another location in a city at more than 100mph.
NIML
There might be some small demonstration projects, but they will be just that, not real, viable mass market services.
First time that a car equipped with some version of a solution for the trolley problem is involved in an accident where it is practically invoked.
NIML
Recall that a variation of this was a key plot aspect in the movie "I, Robot", where a robot had rescued the Will Smith character after a car accident at the expense of letting a young girl die.

Other Car Predictions — Electric Cars

In my 2018 predictions I said that electric cars would not make up 30% of US automobile sales until 2027 (and note that the 2027 sales level can only be known  sometime in 2028, ten years after my prediction).

This prediction was at odds with the froth about Tesla, and indeed its stock price over the last few years, until Q4 2022, when things changed dramatically.  I had wondered whether I had been too pessimistic.

However, there was a big uptick in electric car sales in the US in Q3 2022, as can be seen by comparing the Blue Book numbers on sales of all vehicles with those of electric vehicles. In Q3 electric vehicle sales were 205,682 (a year on year increase of 68%) compared to a total of 3,418,718 for all vehicle sales (a mere 0.1% year on year increase), making electric sales 6% of all US sales. Clearly something is going on.

If electric sales continue to grow at 68% per year then 2025 would hit 28% of new cars in the US being electric. Even at 50% year over year growth we would get to 30% in 2026, a year earlier than I had predicted.  But those are both big ifs, and one quarter of big growth does not a trend make. We need to see this sustained for a while.
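
As a sanity check on those two growth paths, here is the compounding arithmetic as a back-of-envelope sketch. The 6% 2022 share comes from the Blue Book numbers above; the simplifying assumption (mine, for illustration) is that total vehicle sales stay flat, so the EV share just compounds at the EV growth rate:

    # Back-of-envelope projection of the US electric vehicle sales share.
    # Assumes total US vehicle sales stay flat, so the share simply
    # compounds at the EV sales growth rate from a 6% share in 2022.

    def project(share, growth_rate, start_year=2022, years=4):
        rows = []
        for year in range(start_year + 1, start_year + 1 + years):
            share *= 1 + growth_rate
            rows.append((year, share))
        return rows

    for rate in (0.68, 0.50):
        print(f"{rate:.0%} year-over-year growth:")
        for year, share in project(0.06, rate):
            print(f"  {year}: {share:5.1%}")

At 68% growth this prints roughly 10%, 17%, and 28% for 2023 through 2025; at 50% growth the share crosses 30% in 2026, matching the figures above.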

Meanwhile there are forces ready to slow down electric car adoption, including dropping gas prices, or automobile battery plants being delayed for any of a hundred reasons. If everything goes right I think there is a chance of reaching 30% electric vehicles in the US sometime in the 2026 to 2028 period.

Not everyone agrees, however, including the leader of the biggest car company in the world. Just this last week Akio Toyoda, CEO of Toyota, was quoted trying to temper expectations: “Just like the fully autonomous cars that we were all supposed to be driving by now, I think BEVs are just going to take longer to become mainstream than the media would like us to believe.” (Note that he casually mentions that self driving cars are much harder than everyone thought. He also calls them Battery Electric Vehicles, since Toyota produces a fuel cell based electric vehicle that consumes hydrogen as well.)

So…electric car adoption is rising, but the jury is still out on whether we will get to 30% US market penetration by 2027.

Other Car Predictions — Flying Cars

Way back when I made my predictions in 2018 “flying cars” meant cars that could drive on regular roads and then take to the air and fly somewhere. Over the last few years the meaning of “flying cars” has drifted to things that are not cars at all, but rather very light electric vertical take off and landing (eVTOL) flying machines that will be flying taxis whisking people over the crowded freeways to their destinations. A utopian transport structure.

Needless to say there has been froth galore on just how big this industry is going to be and just how soon it is going to happen, and just how big a revolution it will be.

Yeah.

Recently I did an analysis, published on this blog, of a McKinsey report saying all these things.  I allowed that perhaps they were 100 times too optimistic (what is a factor of 100 too big in a market analysis between friends, after all), and that if we allowed for growth by a factor of four per year to reach that one-hundredth-sized market, then today there must be seven flights per day of eVTOLs, following a commercial profile and with people onboard. So… there should be videos of such flights, somewhere, right?

No, not anywhere.  There are some uncrewed flights, and there are some just a few tens of meters high over water, but none over populated areas and none at the hundreds of meters of altitude that will need to be maintained. And the videos on the websites of companies that have taken hundreds of millions of dollars in VC funding show only two-meter-high crewed hops.  Perhaps more tellingly, late in 2022 Larry Page pulled the plug on Kitty Hawk, a company he had been funding for 12 years to develop such capabilities.

One way or another, at scale eVTOL taxis are not happening soon.

Robotics, AI, and Machine Learning

British science fiction and science writer Arthur C. Clarke formulated three adages that have come to be known as Clarke’s three laws. The third of these is:

Any sufficiently advanced technology is indistinguishable from magic.

As I said in my post on the seven deadly sins of predicting the future of AI:

This is a problem we all have with imagined future technology. If it is far enough away from the technology we have and understand today, then we do not know its limitations. It becomes indistinguishable from magic.

When a technology passes that magic line anything one says about it is no longer falsifiable, because it is magic.

When technology is sufficiently different from what we have experienced to date, we humans can’t make good guesses about its limitations. And so we start imbuing it with magical powers, believing that it is more powerful than anything we have previously imagined, so it becomes at once both incredibly powerful and incredibly dangerous.

A more detailed problem is what I call the performance/competence confusion, also from that seven deadly sins blog post.

We humans have a good mapping for people in understanding what a particular performance by a person in some arena implies about the general competence of that person in that arena. For instance, if a person is able to carry on a conversation about radiological images that they are looking at, it is a safe bet that that person would be able to look out of the window and tell you what the weather is like outside. And even more, tell you the sorts of weather that one typically sees over the course of a year in that same location (e.g., “no snow here, but lots of rain in February and March”).

The same cannot be said for our AI systems. Performance on some particular task is no indicator of any human-like general competence in related areas.

In recent years we saw this with Deep Learning (and Reinforcement Learning for that matter, though the details of the Alpha game players were much more complex than most people understood). DL was supposed to make both radiologists and truck drivers redundant in just a few years. Instead we have a shortage of people in each of these occupations. On the other hand DL has given both those professions valuable tools.

This past year has seen Dall-E 2, the latest image-from-natural-language system, and ChatGPT, the latest large language model. When people see cherry picked examples from these they think that general machine intelligence, with both promise and danger, is just around the corner.

There is a veritable cottage industry on social media with two sides; one gushes over virtuoso performances of these systems, perhaps cherry picked, and the other shows how incompetent they are at very simple things, again cherry picked.

The problem is that as a user you don’t know in advance what you are going to get. So humans still need their hands on the information space steering wheel at all times in order not to end up with a total wreck of an outcome in any applications built on these systems.

Vice Admiral Joe Dyer, former chief test pilot of the US Navy, once reminded me that nothing is ever as good as it first seems, nor as bad. That is an incredibly helpful adage.

Calm down, people. We neither have super powerful AI around the corner, nor the end of the world caused by AI about to come down upon us.

And that pretty much sums up where AI and Machine Learning have gone this year. Lots of froth and not much actual deployed new hard core reliable technology.

Prediction [AI and ML] | Date | 2018 Comments | Updates
Academic rumblings about the limits of Deep Learning
BY 2017
Oh, this is already happening... the pace will pick up.
20190101
There were plenty of papers published on limits of Deep Learning. I've provided links to some right below this table.
20200101
Go back to last year's update to see them.
The technical press starts reporting about limits of Deep Learning, and limits of reinforcement learning of game play.
BY 2018
20190101 Likewise some technical press stories are linked below. 20200101
Go back to last year's update to see them.
The popular press starts having stories that the era of Deep Learning is over.
BY 2020
20200101 We are seeing more and more opinion pieces by non-reporters saying this, but still not quite at the tipping point where reporters come out and say it. Axios and WIRED are getting close.
20210101 While hype remains the major topic of AI stories in the popular press, some outlets, such as The Economist (see after the table) have come to terms with DL having been oversold. So we are there.
VCs figure out that for an investment to pay off there needs to be something more than "X + Deep Learning".
NET 2021
I am being a little cynical here, and of course there will be no way to know when things change exactly.
20210101
This is the first place where I am admitting that I was too pessimistic. I wrote this prediction when I was frustrated with VCs and let that frustration get the better of me. That was stupid of me. Many VCs figured out the hype and are focusing on fundamentals. That is good for the field, and the world!
Emergence of the generally agreed upon "next big thing" in AI beyond deep learning.
NET 2023
BY 2027
Whatever this turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out.
20210101
So far I don't see any real candidates for this, but that is OK. It may take a while. What we are seeing is new understanding of capabilities missing from the current most popular parts of AI. They include "common sense" and "attention". Progress on these will probably come from new techniques, and perhaps one of those techniques will turn out to be the new "big thing" in AI.
20220101
There are two or three candidates bubbling up, but all coming out of the now well established tradition of deep learning. Still no completely new "next big thing".
20230101
Lots of people seem to be converging on "neuro-symbolic", as it addresses things missing in large language models.
The press, and researchers, generally mature beyond the so-called "Turing Test" and Asimov's three laws as valid measures of progress in AI and ML.
NET 2022
I wish, I really wish.
20220101
I think we are right on the cusp of this happening. The serious tech press has run stories in 2021 about the need to update, but both the Turing Test and Asimov's Laws still show up in the popular press. 2022 will be the switchover year. [Am I guilty of confirmation bias in my analysis of whether it is just about to happen?]
20230101
The Turing Test was missing from all the breathless press coverage of ChatGPT and friends in 2022. Their performance, though not consistent, pushes way past the old comparisons.
Dexterous robot hands generally available.
NET 2030
BY 2040 (I hope!)
Despite some impressive lab demonstrations we have not actually seen any improvement in widely deployed robotic hands or end effectors in the last 40 years.
A robot that can navigate around just about any US home, with its steps, its clutter, its narrow pathways between furniture, etc.
Lab demo: NET 2026
Expensive product: NET 2030
Affordable product: NET 2035
What is easy for humans is still very, very hard for robots.
20220101
There was some impressive progress in this direction this year with Amazon's release of Astro. A necessary step towards these much harder goals. See the main text.
20230101
Astro does not seem to have taken off, and there are no new demos of it for public consumption. We may have to wait for someone else to pick up the mantle.
A robot that can provide physical assistance to the elderly over multiple tasks (e.g., getting into and out of bed, washing, using the toilet, etc.) rather than just a point solution.
NET 2028
There may be point solution robots before that. But soon the houses of the elderly will be cluttered with too many robots.
A robot that can carry out the last 10 yards of delivery, getting from a vehicle into a house and putting the package inside the front door.
Lab demo: NET 2025
Deployed systems: NET 2028
A conversational agent that both carries long term context, and does not easily fall into recognizable and repeated patterns.
Lab demo: NET 2023
Deployed systems: 2025
Deployment platforms already exist (e.g., Google Home and Amazon Echo) so it will be a fast track from lab demo to widespread deployment.
20230101
Despite the performance of ChatGPT and other natural language transformer models, no one seems to have connected them to spoken language systems, nor have their text versions demonstrated any understanding of interactional context over periods of time with individuals. Perhaps people have tried but the shortcomings of the systems dominate in this mode.
An AI system with an ongoing existence (no day is the repeat of another day as it currently is for all AI systems) at the level of a mouse.
NET 2030
I will need a whole new blog post to explain this...
A robot that seems as intelligent, as attentive, and as faithful, as a dog.
NET 2048
This is so much harder than most people imagine it to be--many think we are already there; I say we are not at all there.
A robot that has any real idea about its own existence, or the existence of humans in the way that a six year old understands humans.
NIML

The Next Big Thing in AI

Both Dall-E 2 and ChatGPT have a lot more mechanism in them than just plain back propagation deep neural networks. To make robotics or AI systems do stuff we always need active mechanisms besides any sort of learning algorithm.

In the old days of AI we built active mechanisms with symbols, using techniques known as Symbolic Artificial Intelligence. Symbols are atomic computational objects which stand in place of something out in the physical world.
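
For readers who have never seen it, here is a toy flavor of what that looks like. The symbols and the single rule are my own made-up example, not drawn from any particular system:

    # A toy flavor of symbolic AI: atomic symbols plus an explicit rule.
    # The symbols ("cup17", "table3", "kitchen") are invented example
    # names; each stands in for something out in the physical world.

    facts = {("on", "cup17", "table3"), ("in-room", "table3", "kitchen")}

    def apply_rule(facts):
        # Rule: if X is on Y, and Y is in room Z, then X is in room Z.
        derived = set(facts)
        for (r1, x, y) in facts:
            for (r2, y2, z) in facts:
                if r1 == "on" and r2 == "in-room" and y == y2:
                    derived.add(("in-room", x, z))
        return derived

    print(apply_rule(facts))  # now also ("in-room", "cup17", "kitchen")

The point is that the inference chains over explicit, inspectable tokens, rather than over learned numerical weights.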

Over the last year there has been a new term making the rounds, “neuro-symbolic” or “neurosymbolic”, as a way of merging the deep learning revolution with symbolic artificial intelligence. Although everyone is very protective of their unique angle on this merger it seems that lots of people from all over the field of Artificial Intelligence believe there may be something there. One might even classify the two media darlings above as instances of neurosymbolic AI.

In Q3 of 2022 a new journal Neurosymbolic Artificial Intelligence showed up. It has an editorial board with representation from first rate institutions across the globe.

Note that the wikipedia page on symbolic AI, referenced above, was edited in August 2022 to include the term “neuro-symbolic” for the first time. Whether neuro-symbolic becomes a real dominant approach is still up in the air. But it has a whiff of possibility about it.

[[I don’t happen to think that neuro-symbolic will past the test of time, but it will lead to more short term progress. I’ll write a longer post on why I think it still misses out on being a solid foundation for artificially intelligent machines over the next few centuries.]]

Space

The biggest story of the last year is the lack of any further flights of SpaceX's Starship. Late in 2021 the CEO was talking about getting to a flight every two weeks by the end of 2022.  But none, nada, zippo. Normally that might be considered an unbelievable difference between what was promised and what was delivered. But these days CEOs of car companies, social media companies, and yes space transportation companies, have been known to repeatedly make outrageous promises that have nothing to do with reality.

In the case of SpaceX this may be problematic, as the US return to the Moon is premised upon SpaceX delivering. If the government can't believe the CEO's promises, that is going to make policy decisions and the letting of contracts very difficult.

NASA has had some success (un-crewed Artemis) towards lunar goals, and Blue Origin has finally delivered engines for a large booster built by others.

SpaceX itself has had a banner year, with its Falcon 9 booster making 60 successful launches (plus one Falcon Heavy launch); probably the last time we had a launch rate that large was the V2 in the final year of the Second World War (though the V2 rate was about 50 times higher and its success rate much lower than that of SpaceX).

Space tourism had a great 2021 but not such a big year in 2022.  It is still not a large scale business and likely won’t be for a while, if ever.

Summary: conventional stuff is going great, new stuff is hard and only limping along.

Space Tourism Sub-Orbital

In July 2021 Virgin Galactic had their first suborbital flight which took non-flight crew along, including founder Richard Branson. It was expected that this would mark the start of their commercial operations, but there have been zero further flights of any sort since then, with many reasons given for the delays.

In 2022 Blue Origin had three flights with people aboard, for a total of 18 people. At most 14 of these paid for their tickets, while the others were employees of Blue Origin or had been selected by other organizations, so essentially won their seats.

All six crewed Blue Origin flights to date have shared the same hardware including New Shepard 4 as the booster. In September 2022, a month after the sixth crewed flight, an older booster, New Shepard 3, failed during ascent and the flight was aborted and the un-crewed capsule landed safely. There were no further crewed flights for the rest of the year.

In summary crewed sub-orbital flights were down from 2021, and there has been no measurable uptick in paying passengers. These flights are not really taking off at scale.

Space Tourism Orbital

There had been eight non-governmental paid orbital flights up through 2009, all on Russia's Soyuz. Then none until the last four months of 2021, when there were three: two on Soyuz to the International Space Station, and one in a Crew Dragon on a SpaceX Falcon 9 that flew a self-contained mission with no docking in space. A total of eight non-professionals flew to space on those three missions.

This sudden burst of activity made it look like perhaps things might really be taking off, so to speak.  But in 2022 there was only one paid orbital flight, a SpaceX Dragon to the International Space Station, carrying three non-professionals and one former NASA Astronaut working for the company that organized the flight and purchased it from SpaceX.

The space tourism business is still at the sputtering initial steps phase.

SpaceX Falcon 9

SpaceX had a spectacular year with its Falcon 9 having 60 successful launches, with no launch failures and all boosters that were intended to return to Earth successfully doing so. It also had its first Falcon Heavy launch since 2019. As with all four Falcon Heavy launches this was successful and the two side boosters landed back at the launch site as expected.

Prediction [Space] | Date | 2018 Comments | Updates
Next launch of people (test pilots/engineers) on a sub-orbital flight by a private company.
BY 2018
20190101 Virgin Galactic did this on December 13, 2018.
20200101 On February 22, 2019, Virgin Galactic had the second flight to space of their current vehicle, this time with three humans on board. As far as I can tell that is the only sub-orbital flight of humans in 2019. Blue Origin's New Shepard flew three times in 2019, but with no people aboard, as on all its flights so far.
20210101 There were no manned suborbital flights in 2020.
A few handfuls of customers, paying for those flights.
NET 2020
20210101 Things will have to speed up if this is going to happen even in 2021. I may have been too optimistic.
20220101
It looks like six people paid in 2021 so still not a few handfuls. Plausible that it happens in 2022.
20230101
There were three such flights in 2022, with perhaps 14 of the 18 passengers paying. Not quite there yet.
A regular sub weekly cadence of such flights.
NET 2022
BY 2026
20220101
Given that 2021 only saw four such flights, it is unlikely that this will be achieved in 2022.
20230101
Only three flights total in 2022. A long way to go to get to sub weekly flights.
Regular paying customer orbital flights.
NET 2027
Russia offered paid flights to the ISS, but there were only 8 such flights (7 different tourists). They are now suspended indefinitely.
20220101
We went from zero paid orbital flights since 2009 to three in the last four months of 2021, so definitely an uptick in activity.
20230101
Not so quick! Only one paid flight in 2022.
Next launch of people into orbit on a US booster.
NET 2019
BY 2021
BY 2022 (2 different companies)

Current schedule says 2018.
20190101
It didn't happen in 2018. Now both SpaceX and Boeing say they will do it in 2019.
20200101
Both Boeing and SpaceX had major failures with their systems during 2019, though no humans were aboard in either case. So this goal was not achieved in 2019. Both companies are optimistic of getting it done in 2020, as they were for 2019. I'm sure it will happen eventually for both companies.
20200530
SpaceX did it in 2020, so the first company got there within my window, but two years later than they predicted. There is a real risk that Boeing will not make it in 2021, but I think there is still a strong chance that they will by 2022.
20220101
Boeing had another big failure in 2021 and now 2022 is looking unlikely.
20230101
Boeing made up some ground in 2022, but my 2022 prediction was too optimistic.
Two paying customers go on a loop around the Moon, launch on Falcon Heavy.
NET 2020
The most recent prediction has been 4th quarter 2018. That is not going to happen.
20190101
I'm calling this one now, as SpaceX has revised their plans from a Falcon Heavy to their still-developing BFR (or whatever it gets called), and predicts 2023. I.e., it has slipped 5 years in the last year.
20220101
With Starship not yet having launched a first stage, 2023 is starting to look unlikely, as one would expect the paying customer (Yusaku Maezawa, who just went to the ISS on a Soyuz last month) would want to see a successful re-entry from a Moon return before going himself. That is a lot of test program to get there from here in under two years.
20230101
With zero 2022 launch activity of Starship this means it couldn't possibly happen until 2024.
Land cargo on Mars for humans to use at a later date.
NET 2026
SpaceX has said by 2022. I think 2026 is optimistic but it might be pushed to happen as a statement that it can be done, rather than for a pressing practical reason.
20230101
The CEO of SpaceX has a pattern of over-optimistic time frame predictions. This did not happen in 2022 as he predicted. I'm now thinking that my 2026 prediction is way too optimistic, as the only current plan is to use Starship to achieve this.
Humans on Mars make use of cargo previously landed there.
NET 2032
Sorry, it is just going to take longer than everyone expects.
First "permanent" human colony on Mars.
NET 2036
It will be magical for the human race if this happens by then. It will truly inspire us all.
Point to point transport on Earth in an hour or so (using a BF rocket).
NIML
This will not happen without some major new breakthrough of which we currently have no inkling.
Regular service of Hyperloop between two cities.
NIML
I can't help but be reminded of when Chuck Yeager described the Mercury program as "Spam in a can".

Boeing’s Woes

The Boeing Starliner is a capsule to be used by NASA on a commercial basis similar to that under which NASA uses Crew Dragon from SpaceX. Originally the development of the two systems was neck and neck, but both were delayed beyond original expectations and Boeing's Starliner fell behind Crew Dragon.

The first un-crewed flight was in December 2019 and there were many serious problems revealed. Boeing decided that it needed to re-fly that mission before putting NASA astronauts aboard, and there were repeated delays, with the second flight finally happening in May of 2022, when it autonomously docked with the International Space Station and then returned to a safe landing back on Earth.

The first crewed flight is currently scheduled for April 2023, many years later than the original expectation.

Artemis

NASA successfully flew its Artemis 1 mission, on NASA's new Space Launch System with the Orion Capsule, in November and December of 2022.  It flew to the Moon, went into a very elliptical orbit, and finally returned to a safe splashdown on Earth after 25 days.

The Artemis 2 mission will take astronauts around the Moon in May 2024, to be followed by Artemis 3 landing two astronauts on the Moon some time in 2025. There are a total of ten crewed Artemis missions currently planned, out to 2034, with missions 3 through 11 all landing astronauts on the Moon, sometimes for many months at a time. All astronauts will leave Earth on NASA's Space Launch System. Lunar landings for missions 3 and 4 are currently scheduled to use SpaceX's Starship; see below.  SpaceX Falcon Heavy (triple Falcon 9 boosters) will take hardware to lunar orbit for Gateway, a small space station.

Starship

Starship is SpaceX’s intended workhorse to replace both Falcon 9 and Falcon Heavy.

It is two stages, both of which are intended to return to Earth. The second stage had many test flights up through early 2021, and after a number of prototypes were destroyed in explosions, one successfully landed back at its launch site in Texas. None of these flights left the atmosphere, so the heat shields have not yet been tested.

The first stage is massive, with 33 Raptor engines that are a generation later than the Merlins used on Falcon 9. So far it has not flown, nor has it successfully lit as many as 30 of its engines in a ground test.

Over US Thanksgiving in 2021 the CEO of SpaceX urged his workers to abandon their families and come in to work to boost the production rate of these engines. In his email he said:

What it comes down to is that we face genuine risk of bankruptcy if we cannot achieve a Starship flight rate of at least once every two weeks next year.

“Next year” would be 2022.  There have been zero test flights of either the second stage or the first stage and it is the end of 2022. The first stage has never flown at all. Not even powered up all its engines. The first stage is much more complex than any previous booster ever built. It is not flying once every two weeks. Does that mean SpaceX is in danger of bankruptcy? Or was the SpaceX CEO using hyperbole to extract work from his employees?

Testing of the Raptor engines continues at the McGregor test site, and sometimes they blow up (most recently on Dec 21st, 2022); they may well be being tested to destruction when that happens.

I am concerned that Starship will not be ready for the proposed 2025 Lunar Landing of astronauts, and that will really delay humankind's return to the Moon.

Blue Origin’s Orbital Class Rockets

Another competitor for landing astronauts on the Moon has been Blue Origin. While they have demonstrated suborbital flights, with vertical landings, they have not yet demonstrated heavy lift capability. Their New Glenn booster has had many delays and now seems to be scheduled for a first launch late in 2023.

Blue Origin has developed completely new engines, including the BE-4 for the first stage of New Glenn, which will be powered by seven of them.  In October of 2022 Blue Origin delivered two of these engines to a customer, ULA, for the new generation Vulcan Centaur rocket, which is expected to launch with them early in 2023. That rocket has solid boosters as well as BE-4 engines, so it uses fewer BE-4s than New Glenn.

In any case, the delivery of these engines to a paying customer seems like a positive development for the eventual flight of New Glenn.

Generic Predictions (Jan 01, 2023)

Here are some new predictions, not really as detailed as the ones from 2018, but they are answers to much of the hype that is out there. I have not given timelines as these are largely "not going to happen the way the zeitgeist suggests".

  1. The metaverse ain’t going anywhere, despite the tens of billions of dollars poured in. If anything like the metaverse succeeds it will from a new small player, a small team, that is not yoked down by an existing behemoth.
  2. Crypto, as in all the currencies out there now, are going to fade away and lose their remaining value. Crypto may rise again but it needs a new set of algorithms and capability for scaling. The most likely path is that existing national currencies will morph into crypto currency as contactless payment become common in more and more countries. It may lead to one of the existing national currencies becoming much more accessible world wide.
  3. No car company is going to produce a humanoid robot that will change manufacturing at all. Dexterity is a long way off, and innovations in manufacturing will take very different functional and process forms, perhaps hardly seeming at all like a robot from popular imagination.
  4. Large language models may find a niche, but they are not the foundation for generally intelligent systems.  Their novelty will wear off as people try to build real scalable systems with them and find it very difficult to deliver on the hype.
  5. There will be human drivers on our roads for decades to come.
And One Last BITE

I know, people are disappointed that I didn’t say anything in this post about ChatGPT more substantial than “calm down”.  But really I don’t think there is a lot worth saying.  People are making the same mistake that they have made again and again and again, completely misjudging some new AI demo as the sign that everything in the world has changed.

It hasn’t.

In essence ChatGPT is picking the most likely word to follow all those that precede it, whether it is the words in a human input prompt, or what it has already generated itself. The best analogy that I see for it is that ChatGPT is a dream generator, the dreams that you have at night when you are asleep.

Sometimes from your dreams you can pick out an insight about yourself or the world around you. Or sometimes your dreams make no sense at all, and sometimes they can be scary nightmares. Basing your life, unquestioned,  on what you dreamed last night, and every night, is not a winning strategy.
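
To make the "picking the most likely next word" mechanism concrete, here is a minimal sketch of the generation loop. The toy word-count table and the function are my own invention; a real LLM scores candidate tokens with a large neural network conditioned on all of the preceding text, not a lookup of just the last word, but the outer loop has this shape:

    import random

    # Toy autoregressive generation: repeatedly pick a likely next word
    # given the words so far. The counts table is entirely made up.
    counts = {
        "the": {"cat": 3, "dog": 1},
        "cat": {"sat": 2, "ran": 1},
        "sat": {"down": 1},
    }

    def generate(prompt, max_words=10):
        words = prompt.split()
        while len(words) < max_words:
            options = counts.get(words[-1])
            if not options:
                break
            choices, weights = zip(*options.items())
            words.append(random.choices(choices, weights=weights)[0])
        return " ".join(words)

    print(generate("the"))  # e.g. "the cat sat down"

Note that nothing in the loop checks whether the output is true; it only checks whether each word is likely to follow the ones before it. That is the sense in which it is a dream generator.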

 

Where are the crewed eVTOL videos?

rodneybrooks.com/where-are-the-crewed-evtol-videos/

I see lots of hype about eVTOLs being on the verge of being big, but I can't find any videos of prototypes doing what the hype says will be commonplace by 2025.

Here is a story from November 1st, 2022, about a $352M series E financing for air taxi company Volocopter. More about Volocopter below. The story claims that there will be $1.5B (yes, billion) in revenue for the global air taxi market in 2025, rising to over $150B by 2035.

By ‘air taxis’ everyone seems to mean autonomous small flying machines, most likely electric powered vertical take off and landing flying machines, that can be used for short trips and commuting.  These go by the acronym eVTOL. Here, from a year ago, McKinsey experts predicted the market to be tens of billions of dollars by 2030; they also describe how someone will use such air taxis and how inclusive it will be.

If there is going to be $1.5B in revenue in 2025 from such vehicles then how many flights annually does that translate to?  I don't think many people would pay more than $100 to commute to work one way, as that adds up to $50,000 to commute in both directions every day for a work year.  [And remember that just $1,000 will get you a flight between any two points in the continental US on an ordinary plane.] So we would need 30,000 commuters paying that price for $1.5B in revenue, or 15 million individual flights of eVTOLs by 2025. And for the McKinsey experts' number, about 200 million flights per year.

Suppose they are over-optimistic by a factor of 100 (a little rounding error for technology prognosticators).  That still means 150,000 flights per year, or around 450 per day, just three years from now.  How many eVTOL flights per day are there now? Even if we go really high growth and assume the market increases by a factor of four per year (which has pretty much never happened at that rate for any technology) we should be seeing 7 flights per day right now. So worldwide there should be over 2,000 flights per year, with a person in the vehicle, and approximating the flight profiles needed for commercial deployment.
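
Here is that back-of-envelope arithmetic in one place. All the inputs are the assumptions already stated above: $100 per one-way flight, a 100x over-optimism haircut, and an aggressive 4x per year market growth:

    # Back-of-envelope check of the air taxi revenue claims.

    revenue_2025 = 1.5e9                    # claimed 2025 revenue, dollars
    price_per_flight = 100                  # assumed ticket price
    flights_2025 = revenue_2025 / price_per_flight  # 15 million flights

    discounted = flights_2025 / 100         # grant 100x over-optimism
    per_day_2025 = discounted / 365         # ~410 flights per day in 2025

    growth = 4                              # 4x market growth per year
    per_day_now = per_day_2025 / growth**3  # back out 3 years to 2022

    print(f"{flights_2025:,.0f} flights implied in 2025")
    print(f"{per_day_2025:,.0f} flights per day after the haircut")
    print(f"{per_day_now:.0f} flights per day needed today")   # ~6-7
    print(f"{per_day_now * 365:,.0f} such flights this year")  # ~2,300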

That’s a lot. So shouldn’t there be some videos of them somewhere?  My best estimate is that there are actually zero such flights (at the commercial deployment stage) per day at the moment.

A week ago I asked my 27,000 twitter followers to send me links to videos of crewed eVTOL flights.

People helped me out and sent me things they had seen on YouTube.

The closest video to what I asked for has since disappeared from YouTube in the last five days.  It was from the German company Lilium, which can be read about at this wikipedia page. There are no reports there of a crewed flight, and there have been failures and redesigns. You can find a 'B-roll' video which is downloadable as I write this on Lilium's news page. The direct link to that video is this, which has 'July_2022' in the file name. There is footage in there of the test flight shown in the now disappeared YouTube video. From this B-roll it is clearly remotely piloted, flown from the ground. Although there are pictures on the web site of what seems to be a seven seat airplane, the one flying has a different door structure, and seems most likely to be a one or two seater. In the disappeared video the altitude and air speed were shown: less than 100km/hr, and only a couple of hundred meters up. BTW, if you watch the graphics on the home page they use $200 as their example price for a flight. Lilium has been operating since 2015 and has 800 employees, but has not yet moved a person around in an eVTOL flight.

Another twitter follower sent me a link to this five foot high test hop by Joby Aviation on September 14th, 2022. It appears that the seats are empty.

I had previously seen a tethered and crewed five foot high test hop by the company Vertical Aerospace but that YouTube video has also disappeared. That test hop happened on September 26th, 2022, according to the company’s press release.

In the twelve days between those two hops it was reported that Larry Page's 12-year-old 'flying car' company Kitty Hawk was shutting down. A few days later the Washington Post had a story essentially saying that if Larry Page's pockets were not deep enough to get this technology developed then whose would be? There are a handful of videos of Kitty Hawk's Heaviside vehicle flying (here, here, and here; they seem to share some footage), and all seem to be uncrewed, though the vehicle looks crew-capable. These tests seem further along than the ones referenced above. But remember, this company has announced that it is shutting down.

And what about Volocopter and their massive series E? They had their first crewed flight back in March 2016, in a rather fragile and ungainly vehicle. Not exactly ready for prime time commuters, but that is OK--all technology takes many steps to get to deployment. In June of this year there was a report that Volocopter had completed three test flights of a much more practical craft, ranging over 60 miles at up to 155mph. That is much closer to commercial viability, but even Volocopter is saying that its first commercial flights will not be until 2026. AND, I cannot tell whether these test flights were crewed or not.

What is my conclusion?

All radically new products, and air taxi eVTOLs certainly count as radically new, require the convergence of many different capabilities and technologies. Often one or more technology will be lagging and it may take decades for them all to come together in the right place at the right time.

It took about 125 years for steam engines to get from running water pumps to revolutionizing transportation worldwide with railways, and a further fifty years to change sea travel. We have only had electric quadcopters for about 20 years, starting at a small scale. It may take a few more decades for them to hit it big for human transportation. Not for lack of dedicated people trying to make it happen, but rather because it just may be that there are still quite a few components and improvements that yet need to be invented.

 

My IEEE Spectrum Columns and Articles

rodneybrooks.com/my-ieee-spectrum-columns-and-articles/

I’ve pulled together the links for the online versions of everything that I have written for IEEE Spectrum since 2008.

My favorite is a sonnet, very much based on Shakespeare’s most famous one, but mine starts out Shall I compare thee to creatures of God? And I have to love that picture of me the editors dug up from a photo shoot in my lab in 1995.

From the middle of 2021 to the middle of 2022 I wrote 12 monthly columns and here they all are, with my quick summaries for each of them.

  1. When humans terraform planets they will look like climate change Earth.
  2. There is a lot more battery tech development coming.
  3. How should we measure electric vehicle efficiency?
  4. Everyone is misjudging how far along AI is.
  5. Is computation the right way to think about cognition?
  6. We completely underestimate how much more computation we have than in the 1960s.
  7. Claude Shannon was a revolutionary innovator.
  8. Digital changes the shape of built architecture.
  9. The decades long developments that make reusable rockets practical.
  10. How the human world can change quickly (i.e., within a single lifetime) through autocatalysis.
  11. Will vast human crowds come together to cheer on the next Moon missions? (I really don’t like the headline chosen by the head line writer for the article.)
  12. How a small band of true believers labored in obscurity to make deep learning work.

And here are some longer pieces with the years they were published. These are behind a paywall.

2008 We are all machines, and in particular we are all robots.

2017 Self driving cars will need to interact with people, and people may not be cooperative.

2018 Building electric cars and reusable rockets is fairly easy. Building a nuclear fusion reactor, flying cars, self-driving cars, or a Hyperloop system is very hard. What makes the difference?

 

No front seat occupants; adventures in autonomous ride services

rodneybrooks.com/no-front-seat-occupants-adventures-in-autonomous-ride-services/

Just over ten years ago, on April 5th, 2012 to be precise, I took my first ride in a self-driving car, from Google X, in a research unit that has now become Waymo. There was someone sitting in the driver seat, but he had his feet and hands off the controls. We left Google and went for a drive on a couple of freeways around Mountain View in California. It was daytime, sunny, and easy.

The general zeitgeist at the time was that self driving cars, with no steering wheels installed and with no human safety driver, were just around the corner. [BTW, that was 25 years after the first autonomous vehicles drove on a public freeway, in traffic with ordinary cars with unsuspecting human drivers, just outside Munich.]

Back then, and for the next few years, most car companies were saying that they would have fully driverless software on regular consumer cars well before 2020. You may have noticed that that did not happen.

Along with that optimism was the expectation that by around now ride services companies (e.g., Uber, Lyft, and new entrants) would no longer have to rely on human drivers, but instead the ride share vehicles would drive themselves. See this report from 2019: “Uber thought it would have 75,000 autonomous vehicles on the roads this year and be operating driverless taxi services in 13 cities by 2022.” You may have noticed that that did not happen either, not for Uber, or any other company.

The CEO of Tesla predicted in 2019 that there would be a million self-driving Tesla taxis on the road by the end of 2020. You may have noticed that the actual number was zero. And still is.

I was always skeptical about the timelines and wrote about that skepticism multiple times on this blog, here, here, and here.  I was less skeptical about driverless cars eventually arriving, though I have always been of the opinion that changing our infrastructure just a little would make the transition happen much faster.

I’ve lived in San Francisco for the last four years, and the most notable thing I can say about that experience compared to any other city which I have ever visited, apart from the hills and the magnificent vistas, is that the roads, everywhere I go, but especially at night, have been thick with sensor rich vehicles from Cruise, Waymo, and Zoox. Thick. Overwhelmingly thick.  Always with someone in the driver seat, and often someone additional in  the front passenger seat. Here is a picture I tweeted out on Friday September 27th, 2019, of Cruise vehicles that I came across at my local Safeway taking a break.  The drivers of the driverless cars were having a meetup.

Now finally there is some visible movement towards the driverless goals, in a densely packed city, for the first time.

Recently, just after 10:30pm on Thursday May 26th, I took my first rides, three of them over a 90 minute period, in Cruise vehicles, with no front seat occupants, summoning the cars on an Uber-like app, and going over a fairly wide ranging area of San Francisco.  The vehicles that I rode in looked just like those above.

My conclusion from the experience is that Cruise has put together an MVP, a “Minimal Viable Product”, the lynchpin of successful tech, and is letting the public (me!) sign up to test the product for market fit. We would expect nothing different for any new product category.

BUT, please don’t make the mistake of thinking that an MVP means that mass adoption is just around the corner.  We have a ways to go yet, and mass adoption might not be in the form of one-for-one replacement of human driving that has driven this dream for the last decade or more.

A Note About Disclosures in This Post

I signed up to be allowed to download a phone app where I can call a Cruise vehicle and go to a destination that I choose, just as one does with Uber or Lyft.

I signed some waivers for Cruise as I most likely did when I signed up for Uber and Lyft (I frankly don’t remember, but since every single app for anything makes you sign a waiver I’m pretty sure that I must have done so for Uber and Lyft).

I agreed not to reveal any of their technology or trade secrets.  I don’t believe I know any of their technology or trade secrets.  All I know is the experience that I had and I will describe that.  I did not take any photos or make any recordings during the rides. The photos I show below were taken before getting into the car; they are photos that anyone on the street could have taken.

I will state correlations that I noticed.  Remember, correlations are not the same as causations.

But First, My Conclusions

I do not know the economics of Cruise being able to profitably deliver a driverless ride hailing service with their current implementation. But from a customer experience viewpoint it seems to me that their current MVP is good enough to be deployed in certain circumstances. And certainly if existing nationwide voice communication systems like OnStar (see below) were used for any anomaly handling that would make the economics more attractive for localized deployments.

It seems to me that in a fairly big circumscribed community the current MVP could provide more than adequate driverless transportation services within the community. (I recall the membership retirement community in southern Florida where my in-laws lived, spanning two golf courses, with other recreational facilities and members-only restaurants, as being such a place; there were no gates or entrance controls but it was clear where the boundaries were.) It would be able to operate 24 hours a day, and would not have to restrict itself to low traffic areas, as all areas would be low traffic. The mechanics of pick up and drop off are quite smooth for anyone who has used a ride hailing app before. If temporary changes due to construction or maintenance activities (see ride 3 below) could be added to the maps that the system uses, that would increase reliability. I think the current system would be safe and easy to use.

From my experience in one of my three rides, Cruise driving is not always as smooth as one would expect from a human driver in heavier traffic situations.

I am positive about my experience, but I do not expect to use it in lieu of Uber, here in San Francisco, despite the zero cost. The locations that the service works in avoid many of the restaurant, museum, and hotel areas of the city. Most of my trips (and I take Uber a few times per week) either start or end at one of these sorts of locations.  On top of that the hours of operation are very sparse. But mostly, the zero cost is not worth a trip potentially taking twice as long with this service. As I write this at 6pm on a Sunday, a fairly peak usage time, Uber promises a travel time of half what the first ride took on Thursday evening at 10:30pm.

The Details: A Long Read

Here I go through the details of my experience which led to the conclusions above.  This is excruciatingly detailed, and I don't expect many people will read it all.

The setup

The service operates from 10:30pm to 5:00am according to the app. That correlates with less busy roads in San Francisco in my experience. And some nights the service is not available due to weather. I have noticed that when I can hear San Francisco’s fog horns sounding multiple times per minute the service is not available.

The rides on my app must originate and terminate in San Francisco. San Francisco covers roughly 49 square miles. Some areas of the map are greyed out in the app, and one can neither pick up nor drop off in those areas.  But it looks like well over half of the city is served, a few tens of square miles. The correlation that I notice is that the busy areas of the city with lots of pedestrian traffic are not covered, e.g., China Town, The Tenderloin, anywhere south of Market, and some other patches.

The rides are free.

The experience

On Thursday night, May 26th, my friend and Robust.AI co-founder Mohamed Amer and I took three rides, one after the other. We started at my house in west Cow Hollow and went to the corner of Castro Street and Market. Then from there we took a second ride to the edge of Russian Hill at Union and Polk.  We walked south on Polk to between Broadway and Pacific, and from there went back to my house. I hailed the first and third rides, and Mohamed hailed the second.

Ride 1: 23.5 minutes, 4.80 miles. Chorizo.

My house is on a block with a 23% grade, the street is not particularly wide, and one side has 90 degree parking and the other side has parallel parking.  Very rarely are cars going opposite directions able to pass each other, and one has to pull over into a gap in parking to let the other direction pass.  Worse, because of the steepness, turning onto or off of my street at the intersection above my house puts the driver in a situation where they are blind, unable to see either the road they are going onto or the intersection they are entering. There are a bunch of these “scary” intersections, with a steep road and 90 degree parking in my neighborhood. Despite seeing Waymo and Cruise vehicles all the time, I have never seen one drive on one of these “scary” blocks.

Uber picks me up right outside my house but I did not expect a driverless car to be able to do so, and indeed the app told me to meet my car around the corner on another street.

The Cruise vehicle named Chorizo showed up and this is me about to get into it.

There was no human in the car. I used the app to unlock the car and got in, followed by Mohamed. The car talked to us, saying exactly the same things on all three rides: telling us to buckle up, keep our arms inside whenever someone cracked a window open, and to gather up our things and get out on the passenger side at the end.

From the pickup it was two blocks to Divisadero which goes due south and turns into Castro; our destination was Castro and Market, 2.5 miles away by the most direct route.   The routing that Chorizo followed was 4.8 miles and took us a block past Divisadero to Scott, a much less busy street, then down to Pine, west to Masonic, and south on that. That was the only part of the route where we encountered any traffic, and there were two lanes of traffic until we crossed the panhandle. Soon after that we went further west on Waller, then south on Cole through Cole Valley, eventually getting onto Roosevelt Way going back east, crossing the end of Masonic (which we had gotten off of to go a half dozen blocks further west), then between Buena Vista Park and Corona Heights Park, on Roosevelt Way to Castro Street.  We had gone an extra block east, then 11 blocks west and come 10 blocks back east, some of it on winding roads.

I noticed that there was almost no traffic anywhere on our very elongated route except a little on Pine and more on Masonic.  If you wanted to avoid traffic at the expense of a much longer ride the route was a good choice.

On Masonic when there were two lanes of traffic I was reminded of my experiences of two decades ago when I taught my four kids to drive. There was quite a bit of heavy braking, and at one point a car behind us also had to brake heavily to avoid hitting us. We were in the left lane of two lanes of traffic. The correlation I noticed was that when a car or bus passed us on the right the brakes came on strongly, even though there was plenty of space ahead of us in our lane.

Chorizo pulled over to the curb with cars parked a little in front and a little behind to let us out.

While we were riding we tried to see if there was any person listening to what we said in the car, but there was no response. Mohamed had taken a ride by himself three days before, and the car had gotten stuck at an intersection, and people had to come to rescue it. At that time Mohamed was contacted by voice within the car, but it was over the regular GM OnStar (recently sunset) system. That approach seems like a good step to me, integrating with existing voice communication systems that many cars already have.

Ride 2: 26.7 minutes, 4.57 miles. Chorizo.

We stood around for a bit then called another car.  There only seemed to be a handful of cars available, but it was Chorizo who came back for us, on a route that involved a lot of right turns to get to us.

Our destination was firmly north and east, but Chorizo took us west again, and then we headed north on Roosevelt Way, covering quite a length of it in the same direction as we had traveled it on the previous ride. When we got to Masonic we headed north, then east on Hayes, north on Steiner, and east on Vallejo to Polk. There was almost no traffic at all on any of these streets. If the second northbound segment had been two blocks to the west the car would have encountered a "scary" block on Vallejo, between Scott and Pierce.

At our destination the street was lined with cars and Chorizo let us out next to parked cars, just as a taxi or Uber would.

I noticed that when we stopped at a traffic light at a four way crossing the car was quite fast off the mark when the light turned green, except when there were pedestrians around.  Then it would wait a bit before taking off.  Note, all I have here is correlation. I do not know if there was any causation.  But if I were to design the behaviors (and I have been designing behaviors of mobile robots for forty years now) then most likely my system would evidence a similar correlation.
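
For what it's worth, here is the kind of trivially simple rule that would produce exactly the correlation I observed. This is purely my speculation about a plausible design, not anything I know about Cruise's system, and the pause constant is invented:

    # A speculative sketch of a traffic light behavior producing the
    # observed correlation: quick starts on green, and a deliberate
    # pause when pedestrians have been detected near the intersection.

    PEDESTRIAN_PAUSE_S = 2.0   # invented tuning constant

    def delay_after_green(pedestrians_nearby: bool) -> float:
        """Seconds to wait before pulling away once the light is green."""
        return PEDESTRIAN_PAUSE_S if pedestrians_nearby else 0.0

    print(delay_after_green(False))  # 0.0 -> fast off the mark
    print(delay_after_green(True))   # 2.0 -> wait a bit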

Ride 3: 9.2 minutes, 1.65 miles. Brownie.

Our last ride was short, and mostly on Vallejo until Brownie headed north right when it reached the "scary" block at Pierce.

However the pick up revealed a real deficit, with some safety implications, compared to what one would experience with an Uber or Lyft.

When I called for a car, and this time we got Brownie, we were on the sidewalk on the west side of Polk. Brownie headed south on Polk towards us but turned left (we could see it) at Broadway, then took three rights at Larkin, Pacific, and Polk, and so ended up on the east side of the street.

This was consistent with where the app told us it was going to pick us up. The problem was that there was construction happening right there. Instead of a sidewalk there was a temporary pedestrian walkway constructed from scaffolding and wooden structures in the parking lane, and no way to get out to the road from there. Brownie stopped in the traffic lane waiting for us. We had been in an empty parking spot about 30 feet north of there, and waved to Brownie as one would with an Uber to get it to come pick you up. Brownie did not respond, so we had to walk along the active traffic lane, admittedly with Brownie blocking it, to get to Brownie. This was a mistake that even the most taciturn taxi driver would not make.

The drop off was uneventful, again not at my house, but around a corner in a different location than the original pick up.

On the first two rides we had noticed as we walked away from the car that it said something inside, before pulling away.  Strange, as no one was there inside the car to listen to it. This time we were on a much quieter street and we tarried next to the car to listen.  “Data uploaded.”