Blog

Predictions Scorecard, 2025 January 01

rodneybrooks.com/predictions-scorecard-2025-january-01/

[You can follow me on social media: @rodneyabrooks.bsky.social and see my publications etc., at https://people.csail.mit.edu/brooks]

This is my seventh annual update on how my dated predictions from January 1st, 2018 concerning (1) self driving cars, (2) robotics, AI, and machine learning, and (3) human space travel, have held up. I promised then to review them at the start of the year every year until 2050 (right after my 95th birthday), thirty-two years in total. The idea is to hold myself accountable for those predictions. How right or wrong was I?

I have decided to change my rules for myself a little bit after this year, in response to the many many people who have said how much they enjoy seeing my updates.

My predictions were mostly for the first few years, and by next year the density of due dates will be very low. So, on the eighth anniversary of my first set of predictions, i.e., a year from today, I will be making a new set of predictions centered on the period January 1st 2026 to January 1st 2036. That will restore a density of predictions with real meat for judging how accurately they turn out.

What I Want to Achieve and a Changing Hype-driven Landscape

The level of hype about AI, Machine Learning and Robotics completely distorts people’s understanding of reality. It distorts where VC money goes, always to something that promises impossibly large payoffs–it seems it is better to have an untested idea that would have an enormous payoff than a tested idea which can get to a sustainable business, but does not change the world forever. It distorts what young researchers work on, as they do not want to be seen as old fashioned even when the current hyped topic is sort of dumb–soon the dumbness is forgotten and the heat of the chase becomes all. It distorts what people think they need to get a degree in at college in order to have good career prospects.

I want people to use rational thought processes when they hear about hyped ideas and be able to assess what is really going on, and what is just plain (to use the technical term) bullshit.

My Color Scheme and Past Analysis

The acronyms I used for predictions in my original post were as follows.

NET year means it will not happen before that year (No Earlier Than)
BY year means I predict that it will happen by that year.
NIML, Not In My Lifetime, i.e., not before 2050.

As the mentioned years pass I color them as accurate, too pessimistic, or too optimistic.

This year I have added hemming and hawing. This is for when something that I said would take a lot longer appears, on the surface, to have happened, but the underlying achievement is not what everyone expected and not what was actually delivered. This is mostly for things that were talked about as being likely to happen with no human intervention, and that now appear to happen that way, but in reality there are humans in the loop that the companies never disclose. So the technology that was promised has not actually been delivered, but everyone thinks it has.

I have not changed any of the text of the first three columns of the prediction tables since their publication on the first day of 2018. I only change the text in the fourth column to say what actually happened. This meant that by two years ago that fourth column was getting very long and skinny, so I removed those old comments and started fresh last year. I have kept last year’s comments and added new ones, with yellow backgrounds, for this year. If you want to see the previous five years of comments you can go back to the 2023 scorecard.

Overview of changes this year

There has been a lot of activity in both self driving cars (the demise of Cruise, a big push by Waymo to scale human assisted deployments, and lots of smoke and mirrors from an electric car company) and in AI, where robotics has been pulled into the ultra hype-o-sphere while in generative AI the end of scaling and the introduction of inference mechanisms (!!) have been hotly announced and disputed. The human spaceflight endeavor, as it did last year, has crawled along and again has stretched out dates that were probably too optimistic in the first place.

But First.

<rant>

We all know about FOMO, Fear Of Missing Out. In late 2023, for a talk on generative AI that I gave at MIT, I coined another acronym,  FOBAWTPALSL, Fear Of Being A Wimpy Techno-Pessimist And Looking Stupid Later. Perhaps that one is a little bit too much of a mouthful to catch on. These two human insecurities lead people to herd-like behavior in establishing and propagating the zeitgeist on almost any topic.

They lead to people piling on the hype fiestas, rushing to invest (money, effort, or hope) in marginal ideas once they have become a little bit popular, or believing our airspace is being invaded by foreign drones.

“Mounting evidence, and lack thereof, suggests that perhaps the whole craze has been a sort of communal fever dream fueled by crowd mentality, confirmation bias and a general distrust in all things official.”

That quote is from the drone story linked to above, but it could just as well have been about the hype that we are moving towards AGI (Artificial General Intelligence).

I want to be clear: as has been the case for almost seventy years now, there has been significant progress in Artificial Intelligence over the last decade. There are new tools and they are being applied widely in science and technology, and are changing the way we think about ourselves, and how to make further progress.

That being said, we are not on the verge of replacing and eliminating humans in either white collar jobs or blue collar jobs. Their tasks may shift in both styles of jobs, but the jobs are not going away. We are not on the verge of a revolution in medicine and the role of human doctors. We are not on the verge of the elimination of coding as a job. We are not on the verge of replacing humans with humanoid robots to do jobs that involve physical interactions in the world. We are not on the verge of replacing human automobile and truck drivers world wide. We are not on the verge of replacing scientists with AI programs.

Breathless predictions such as these have happened for seven decades in a row, and each time people have thought the end is in sight and that it is all over for humans, that we have figured out the secrets of intelligence and it will all just scale.  The only difference this time is that these expectations have leaked out into the world at large. I’ll analyze why this continues to happen below in the section on AI and ML.

Here is a list of some of those hype cycles that I, personally, have perceived and lived through, as taken from my presentation at MIT in late 2023 that I referenced above re FOBAWTPALSL.

Really, was there really hype about all these things?  Yes, there was, within the circles that cared. Those circles have gotten wider and wider, and when reigning world chess champion Garry Kasparov was beaten by I.B.M.’s Deep Blue computer under tournament conditions in 1997 it was widely reported in the popular press. And it was declared that it was all over for humans.

Back in February 2011 a computer program named Watson played on the television game show Jeopardy against all-time human champions. John Markoff, legendary technology reporter at the New York Times, wrote stories about this the day before the competition, and the day after, when Watson had indeed beaten the humans, with the same questions (fed as text to it at the same time as the humans heard the questions) all running on a cluster of machines not connected to an outside network. Here are three successive paragraphs from the second of those stories.

For I.B.M., the future will happen very quickly, company executives said. On Thursday it plans to announce that it will collaborate with Columbia University and the University of Maryland to create a physician’s assistant service that will allow doctors to query a cybernetic assistant. The company also plans to work with Nuance Communications Inc. to add voice recognition to the physician’s assistant, possibly making the service available in as little as 18 months.

“I have been in medical education for 40 years and we’re still a very memory-based curriculum,” said Dr. Herbert Chase, a professor of clinical medicine at Columbia University who is working with I.B.M. on the physician’s assistant. “The power of Watson-like tools will cause us to reconsider what it is we want students to do.”

I.B.M. executives also said they are in discussions with a major consumer electronics retailer to develop a version of Watson, named after I.B.M.’s founder, Thomas J. Watson, that would be able to interact with consumers on a variety of subjects like buying decisions and technical support.

My personal experience at that time was that people I did not know, but who had heard about my role at MIT (as director of the MIT AI Lab, and then founding director of MIT CSAIL, the Computer Science and Artificial Intelligence Lab), would come up to me and ask about the future of medicine. The people were variously doctors or health industry executives. I reassured them that medicine as we knew it then would stay much the same and was not about to be rendered obsolete.

And then in 2016 Geoff Hinton, one of the key architects of Deep Learning (which has had undeniable impact on the world), said:

“People should stop training radiologists now. It is just completely obvious that within five years deep learning is going to be better than radiologists.”

More people asked me whether this was true. It wasn’t in five years and it isn’t now. We need more radiologists than ever. And yes they do use deep learning tools to help them see some things they wouldn’t otherwise see. But they also understand anomalies using causal reasoning, and we would be in a sorry state if all radiology was done by programs today.

Now look at those plum-colored paragraphs above again as you take yourself way back in time to a year or so ago when ChatGPT was just a baby AGI. You can find stories just like this one if you substitute “ChatGPT” for “Watson” and “Microsoft” for “I.B.M.”

The things confidently predicted in 2011 (and in 1979, and in 2016) about the end of doctors didn’t happen then and they are not happening now. Nor are all the other jobs ending.

Today I get asked about humanoid robots taking away people’s jobs. In March 2023 I was at a cocktail party and there was a humanoid robot behind the bar making jokes with people and shakily (in a bad way) mixing drinks. A waiter was standing about 20 feet away silently staring at the robot with mouth hanging open. I went over and told her it was tele-operated. “Thank God” she said. (And I didn’t need to explain what “tele-operated” meant). Humanoids are not going to be taking away jobs anytime soon (and by that I mean not for decades).

You, you people!, are all making fundamental errors in understanding the technologies and where their boundaries lie. Many of them will be useful technologies but their imagined capabilities are just not going to come about in the time frames the majority of the technology and prognosticator class, deeply driven by FOBAWTPALSL, think.

But this time it is different you say. This time it is really going to happen. You just don’t understand how powerful AI is now, you say. All the early predictions were clearly wrong and premature as the AI programs were clearly not as good as now and we had much less computation back then. This time it is all different and it is for sure now.

Yeah, well, I’ve got a Second Coming to sell you…

</rant>

Self Driving Cars

As with flying cars, the definition, or common understanding, of what self driving cars really means has changed since my post on predictions seven years ago.  At that time self driving cars meant that the cars would drive themselves to wherever they were told to go with no further human control inputs.

Now self driving cars means that there is no one in the driver’s seat, but there may well be, and in all cases so far deployed there are, humans monitoring those cars from a remote location, and occasionally sending control inputs to the cars. The companies do not advertise this feature out loud too much, but they do acknowledge it, and the reports are that it happens somewhere between every one and two miles traveled. These inputs are not direct control of the normal human mechanisms of control: the steering wheel, the brakes, and the accelerator.  Rather they are advice that overrides some of the algorithms.  For instance, “steer out into the next lane and go around this truck” as the human realizes that the truck is just not going to move (see an anecdote below on the first night I took the new Waymo taxis in San Francisco (I had previously last ridden a Waymo in 2012 in Mountain View)).

Why is this difference important?  One of the motivations for self driving cars was that the economics of taxis, cars that people hire at any time for a short ride of a few miles from where they are to somewhere else of their choosing, would be radically different as there would be no driver. Systems which require remote operations assistance to get full reliability cut into that economic advantage. They put a higher burden on the ROI calculations needed to make a business case for their adoption, and therefore stretch out their time horizon for scaling across geographies.

But wait, you might say, isn’t that electric car company that used to be based in California and is now based in Texas going to roll this out imminently and have a fully digital taxi service? They demoed it on a Hollywood movie studio lot just this year, and the cars were painted gold. Hmm. The location of the demo and the fact that the cars, even down to the tires, were painted gold tells you everything you need to know. Both the cars and the humanoid robots at that event were presented as autonomous but in reality they were all tele-operated directly by people (see below in the humanoid section for more details). And that same electric car company is actively hiring people into paying jobs as remote operators.

There was a reasonably balanced appraisal from Reuters just after the event, though it does not go into details of the demos. Here is a direct quote from the story:

“We do expect to start fully autonomous unsupervised FSD in Texas and California next year,” Musk said.

The astute reader will note that this is the 11th year in a row that the CEO of Tesla has made this prediction of the same milestone happening the next year. We can admire the consistency.

Actual self-driving is now generally accepted to be much harder than everyone believed.

The reason that this bait and switch is important to understand is that the promise of inevitable fully self driving technology upended a historical way that new transportation systems have been adopted.

In the past whenever we have introduced new transportation mechanisms there have been large investments in infrastructure, and that infrastructure is shared and used by everyone. The Romans built roads so soldiers and trade goods could travel long distances–in Europe those networks are still the basis of today’s road networks. When steam engine driven trains were the new transportation technology, vast networks of rails were built, allowing goods to move long distances in mere hours or days. When Ford started mass production of automobiles he built roads, and the local governments followed, and the Federal government followed, and those roads are what we use today.

Actual fully self driving cars promised that no infrastructure changes would be needed to revolutionize how vehicles would be controlled. Each individual vehicle would do what was needed all by itself. Because of this promise, as sensors and networks got better there seemed to be no need for expensive new infrastructure.

The promise was false. If government and private partnerships in building smart roads, which were a hot topic in the 1990s, had continued, every one of us would now have smarter, safer cars, but still with onboard human drivers taking over in many situations. But we would have had smart freeways where, once you were on one, your car would be self driving. The road would have had lots of sensors effectively shared across all cars, as that data would have been transmitted to all passing cars. It would have been a fraction of the cost per car compared to the sensing on today’s almost but not really self driving cars like those of Waymo. And we would have had much more accurate congestion data, where the root causes of local congestion would have been sensed with semantic understanding rather than just inferred from the aggregate collection of location data from phones, individual cars, and historical data from roadside sensors.

Instead we now have individual corporate actors using a mixture of partial self driving and remote human supervision. The big question is whether the economics of this works at scale, and whether the fake promises will drive out the human drivers in cheaper services and we’ll all end up paying more. Will the level of hype we saw push our decentralized transportation system into the hands of a few wealthy companies, and in effect make it a centralized system where everybody has to pay private companies to be part of it?

As a reminder of how strong the hype was and the certainty of promises that it was just around the corner here is a snapshot of a whole bunch of predictions by major executives from 2017.

I have shown this many times before but there is one new annotation here for 2024. The years in parentheses are when the predictions were made. The years in blue are the predicted years of achievement. When a blue year is shaded pink it means that it did not come to pass by then. The predictions with orange arrows are those that I had noticed had later been retracted.

The prediction that Jaguar and Land-Rover made that they would have fully autonomous cars by 2024 did not come to pass, so I have shaded it pink.

Note that every single blue year up until now is shaded pink, and that every one that is shaded pink has still not come to pass. None of the predictions that were out there in 2017 for the next few years have happened.  None. There are three more for 2025, and I am sure that a year from now they will all be shaded pink also.

One of the big selling points of self driving cars was that they would be safer than cars driven by humans. So far that is not holding up with real data. One electric car maker with self driving software had it disengage when it sensed there would be an accident, supposedly so that the human could take over in a split second. And then the company did not report the incident as the fault of the software as it was no longer controlling the car when the impact occurred. It was reported, and I had this experience myself in my last ride in a Cruise in 2023, that Cruise vehicles would freeze when an accident looked likely, and then not report it as their software’s fault as the car was stationary and was hit by another car. In many reported cases, and in my case, simply continuing to move forward would avert any likely accident (fortunately for me the human driver of the other car slammed on the brakes and did not hit my robot vehicle).

In this story from the Washington Post about Federal investigations into the safety incidents with self driving cars, they report that the companies involved claim they have vast amounts of driving on our roads under their belt. Not so.

An industry association says autonomous vehicles have logged a total of 70 million miles, a figure that it compares to 293 trips to the moon and back. But it’s a tiny fraction of the almost 9 billion miles that Americans drive every day. The relatively small number of miles the vehicles have driven makes it difficult to draw broad conclusions about their safety.

To put that into perspective, the total number of miles driven by all autonomous (sort of) vehicles over the last decade is less than 1% of the miles driven by humans every day in the United States. It is a tiny, tiny portion.

Take a look at this embedded video from the Wall Street Journal about investigations of crashes (many of which have been fatal) involving autonomous driving systems.

From the audio: “The kinds of things that tend to go wrong with these systems are things like it was not trained on, pictures of an overturned double trailer. It just didn’t know what it was. There were some lights there, but the lights were in unusual positions. A person would have clearly said something big is in the middle of the road. But the way machine learning works is it trains it on a bunch of examples and if it encounters something it doesn’t have a bunch of examples for it may have no idea what’s going on.”

[[My own take is that the fetish of end to end learning leads people to leave out well known algorithms that might solve many of these problems (e.g., the incredibly simple time to collision algorithms based on looming). Yes, end to end learning made speech understanding systems better, but that does not mean it is the appropriate fetish to apply everywhere.]]
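
As an illustration of the kind of well known, non-learned algorithm I mean, here is a minimal sketch (my own, not from any deployed system) of the classic looming-based time-to-collision estimate: if an approaching object subtends angle theta in the image, the time to contact is roughly theta divided by the rate at which theta is growing. No training data required.

```python
import math

# Illustrative sketch only: the classic "looming" estimate of time to collision (tau).
# If an object subtends angle theta in the image and is approaching at roughly
# constant speed, then tau ~= theta / (d theta / dt).

def time_to_collision(theta_prev: float, theta_curr: float, dt: float) -> float:
    """Estimate seconds until collision from two angular-size measurements.

    theta_prev, theta_curr: angular size (radians) of the object in successive frames
    dt: time between the two frames, in seconds
    """
    d_theta = (theta_curr - theta_prev) / dt
    if d_theta <= 0:
        return float("inf")  # object is not growing in the image: no looming, no threat
    return theta_curr / d_theta

# Example: an obstacle grows from 2.0 to 2.1 degrees over a 0.1 second frame interval.
tau = time_to_collision(math.radians(2.0), math.radians(2.1), 0.1)
print(f"estimated time to collision: {tau:.1f} s")  # about 2.1 s: time to brake
```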

Pro tip: Think about this history of industry prognostications about fully autonomous driving being just around the corner when you read today’s prognostications about LLMs taking jobs, en masse, in the next couple of years, or humanoid robots being dirt cheap and being able to learn how to do any human manual task real real soon now. You know you have seen this movie before…

My own experiences with Waymo in 2024

I have two sorts of experiences with Waymo vehicles. First, as a driver of my own vehicle and sharing road space with them every single time that I drive. And second, as a user of their ride service.

The streets of San Francisco have been thick with Waymo vehicles with no driver in them, especially in the second half of 2024. As I drive across the city every morning to head down to my robotics/AI startup halfway down the peninsula I see them everywhere until I get on to 101.  I see them in front of me and behind me and in adjacent lanes as I drive on multilane one way streets. Sometimes I see four of them in a single block. Twice I have seen four of them in a line in my block while I could see four more in a line in the block ahead of me.  When I am at four way intersections with no traffic lights I see them participating in the social ritual of taking your turn to drive through the intersection in the order you stopped, except when a pedestrian is crossing in front of you. They do that pretty well. They do less well when they accidentally get into a line of parents’ cars snaking around a corner for school drop off or pickup.

Over the last few months I have noticed that in general they are getting more aggressive about stretching the rules, just like people do. Otherwise human drivers (including me) take advantage of their politeness. That aggression is not always welcomed. One morning I saw a workman with a crew doing some digging on a road, holding a sign with SLOW on one side and STOP on the other, have to jump in front of a Waymo to get it to do what he was trying to tell it to do with the sign: STOP. It wasn’t stopping for no stinking sign!

The only time I have seen a Waymo go into reverse, ever, was when I was illegally driving the wrong way down a single lane street and we were heading straight at each other.

As a rider I feel they are not quite aggressive enough with human drivers sometimes, so a ride in a Waymo takes longer than with an Uber or Lyft.

It is hit and miss where they drop me off. Sometimes they pick a place to pull over half a block from my house, even when it is raining. There is no way to adjust what they happen to decide that day, even though I know that they will always be able to pull in right in front of my house.

The first time I took a Waymo this year, on the way home it picked me up at a restaurant and then was about to make a right turn. But at that corner there was an 18 wheeler with its lights flashing and surrounded by green cones. It pulled right in behind that truck and waited a long time before it drove forward. I am guessing a remote operator intervened and told it to go around, because eventually it pulled around the truck in the lane just to the left. Based on seeing Waymos interact with orange cones I suspect it would have done better if the cones had been orange rather than green.  This neatly illustrates that the learning that this robot does, and indeed any robot does, is nothing like the learning that people do (see my rant about the seven deadly sins and mistaking performance for competence in the section below on advances in AI and ML).

I mostly feel safe when I am a passenger in a Waymo. Sometimes I do not feel that way about the driver of an Uber I am taking; some rides with Uber are not as safe as I would prefer.

Self Driving Taxi Services

There have been three self driving taxi services in the US in various stages of play over the last handful of years, though it turns out, as pointed out above, that all of them have remote operators. They are Waymo, Cruise, and Zoox.

Waymo and Cruise are similar in that they use conventional cars adorned with lots of sensors. Zoox has purpose built vehicles that have no steering wheel or pedals for brake or accelerator.

Waymo and Cruise went for deployments in large parts of two or more cities and have had ride services callable by apps, just as one can do with Uber or Lyft. Zoox is smaller scale, much more restricted in geography, and really not comparable.

At this time last year Cruise was in trouble as it had suspended all of its San Francisco operations under pressure from regulators after some bad accidents that happened in a way that never would happen for human driven cars.  Briefly, their cars were getting hit at night by emergency vehicles with lights flashing as the Cruise cars crossed intersections. Human drivers see the reflections of lights from such vehicles flashing even if they don’t see the vehicles themselves. The Cruise vehicles were only reacting to flashing lights that they could perceive directly. But the accident that tipped the scales was when a pedestrian crossing in front of a human driven vehicle was hit and went flying in the air, landing right in front of a Cruise. The Cruise hit the person (who had now disappeared from sight) as a human driver would most likely have done. But then it proceeded to drive 20 feet, with the human underneath the vehicle being dragged along, as it went into a mode where it was supposed to get off the road. A human driver would not have reacted that way to having been in a collision, even if it was not their fault.

The hammer finally fell in December of 2024. General Motors shut down Cruise. The leading paragraphs from this linked story from the Wall Street Journal are:

General Motors has scrapped its Cruise robotaxi program after nearly a decade and $10 billion in development, citing the time and costs needed to scale the business and rising competition.

GM on Tuesday said it plans to realign its autonomous driving strategy and give priority to development of advanced driver assistance systems, which take over steering and other functions in certain situations and are common on new vehicles today.

The automaker said it would continue to develop fully autonomous technology for personal vehicles, and build on the progress of its Super Cruise system, a hands-off, eyes-on driving feature that the company introduced several years ago.

GM said it owns about 90% of Cruise and intends to buy out the remaining investors. It plans to combine the technical teams from Cruise and GM into a single effort to advance autonomous and assisted driving.

“We want to leverage what already has been done as we go forward in this,” Chief Executive Mary Barra told analysts on a call Tuesday.

The Detroit automaker said it expects the restructuring to reduce spending by more than $1 billion annually after the proposed plan is completed, which is expected in the first half of next year.

While there are 40 companies that have permits to test autonomous driving in California alone, the demise of Cruise leaves just one company, Waymo, trying to make an actual go of a digital taxi service in the United States. They have an enormous lead over anyone else who wants to get into this business and have spent billions of dollars (probably very much north of $10 billion) on this endeavor over the last 15 years. In an email they sent me a couple of weeks ago as a user of their services they reported that they provided 4 million customer rides in 2024. That is approximately 4 million more than any other company in the United States.

Waymo

Despite being so far out in front it has not been all smooth sailing for Waymo.

Early in the year the operations center for Waymo somehow neglected to realize it was Chinese New Year in Chinatown in San Francisco. So Waymo vehicles were routed through that area on the biggest night of celebration. Any human driver would have realized that the streets, i.e., the street surfaces where cars usually drive, were completely packed with humans, no doubt some of whom were intoxicated as well as just being out having a good time. Not so the Waymo vehicles. They tried pushing through the very very dense crowds, no doubt annoying many people. And what do people have at Chinese New Year?  Fireworks. So some revelers decided to push back on this robot car invading their space. Here are a couple of pictures of the results.

Not pretty.  And an example of how taking away people’s agency is never a good idea for robots (see my second law of robotics).

Throughout 2024 Waymo has been investigated for various accidents such as those described in this Wall Street Journal article. “Reports included collisions with stationary or semistationary objects, such as gates, chains or parked vehicles, according to the regulator.”

In the middle of the summer Waymo added a feature where they would honk their horns at cars in their way. But this backfired when hundreds of Waymos were coming back to their parking lot in the very early hours of the morning, and they started honking at each other and waking up human neighbors. Eventually that got fixed.

In late September a motorcade for Kamala Harris in San Francisco was brought to a halt by a Waymo that stopped in the middle of California Street doing a U-turn in front of it. I’m sure this incident was of great concern to the Secret Service. Eventually a San Francisco police officer got into the car and drove it out of the way–this is shown in a video included with the story above. I do not know how the officer got access to the vehicle and whether Waymo remote operations were cooperating.

More disturbingly, humans outside the Waymos started harassing humans inside them. The most concerning cases come from the realization that if a woman is in a Waymo at night she will be dropped off, outside, on a public road at the end of her journey, with no option but to get out of the car where it has stopped. So groups of men have followed Waymos with women in them and then harassed the women when they got out. If she was driving her own car she might be heading to an off road parking space, or she might choose not to stop if she knows she is being followed. There are no such options in a Waymo, so taking a Waymo at night is less safe than other means of transportation–just follow it and eventually the preyed upon woman will have to get out. Here is a very recent disturbing story about this practice.

Meanwhile Waymo managed to raise $5.6B to expand to new cities in 2025. It already operates in parts of San Francisco, Los Angeles, and Phoenix. The new money will let it expand to Austin and Atlanta in the United States and to start operating in parts of Tokyo in Japan. That is expensive expansion.

Here is the question for the future of watered down remote monitored “autonomous” driving systems (let’s call it “watered down autonomy”), and it is up to Waymo now. Can Waymo expand fast enough in these new markets in 2025 and take enough business from what is left of traditional taxi operators, along with those operating under the Uber and Lyft models, and do it in a way which is in sight of profitability, so that it has a case to raise the stupendous amounts of money needed to operate in all large cities in the US in the next 10 to 20 years?

If Waymo can not succeed at this in the next two years I think the idea of large scale use of watered down autonomy will be dead for at least a decade or two. Right now full autonomy everywhere is already dead.

Prediction [Self Driving Cars] | Date | 2018 Comments | Updates

Prediction: A flying car can be purchased by any US resident if they have enough money.
Date: NET 2036
2018 comments: There is a real possibility that this will not happen at all by 2050.

Prediction: Flying cars reach 0.01% of US total cars.
Date: NET 2042
2018 comments: That would be about 26,000 flying cars given today's total.

Prediction: Flying cars reach 0.1% of US total cars.
Date: NIML

Prediction: First dedicated lane where only cars in truly driverless mode are allowed on a public freeway.
Date: NET 2021
2018 comments: This is a bit like current day HOV lanes. My bet is the left most lane on 101 between SF and Silicon Valley (currently largely the domain of speeding Teslas in any case). People will have to have their hands on the wheel until the car is in the dedicated lane.

Prediction: Such a dedicated lane where the cars communicate and drive with reduced spacing at higher speed than people are allowed to drive
Date: NET 2024
Update 20240101: This didn't happen in 2023 so I can call it now. But there are no plans anywhere for infrastructure to communicate with cars, though some startups are finally starting to look at this idea--it was investigated and prototyped by academia 20 years ago.

Prediction: First driverless "taxi" service in a major US city, with dedicated pick up and drop off points, and restrictions on weather and time of day.
Date: NET 2021
2018 comments: The pick up and drop off points will not be parking spots, but like bus stops they will be marked and restricted for that purpose only.
Update 20240101: People may think this happened in San Francisco in 2023, but it didn't. Cruise has now admitted that there were humans in the loop intervening a few percent of the time. THIS IS NOT DRIVERLESS. Without a clear statement from Waymo to the contrary, one must assume the same for them. Smoke and mirrors.

Prediction: Such "taxi" services where the cars are also used with drivers at other times and with extended geography, in 10 major US cities
Date: NET 2025
2018 comments: A key predictor here is when the sensors get cheap enough that using the car with a driver and not using those sensors still makes economic sense.
Update 20250101: Imminent dual use of personal cars was the carrot that got lots of people to pay cash when buying a Tesla for the software subscription that would allow their car to operate in this way. Shockingly the CEO of Tesla announced, in the smoke and mirrors roll out of Cyber Cab in 2024, that the service would use specially built vehicles to be produced at some indeterminate late date. I got suckered by his hype. This is unlikely to happen in the first half of this century.

Prediction: Such "taxi" service as above in 50 of the 100 biggest US cities.
Date: NET 2028
2018 comments: It will be a very slow start and roll out. The designated pick up and drop off points may be used by multiple vendors, with communication between them in order to schedule cars in and out.
Update 20250101: Even the watered down version of this with remote operators is not going to happen in 50 cities by 2028. Waymo has it in 3 cities and is currently planning on 2 more in the US in 2025.

Prediction: Dedicated driverless package delivery vehicles in very restricted geographies of a major US city.
Date: NET 2023
2018 comments: The geographies will have to be where the roads are wide enough for other drivers to get around stopped vehicles.

Prediction: A (profitable) parking garage where certain brands of cars can be left and picked up at the entrance and they will go park themselves in a human free environment.
Date: NET 2023
2018 comments: The economic incentive is much higher parking density, and it will require communication between the cars and the garage infrastructure.

Prediction: A driverless "taxi" service in a major US city with arbitrary pick and drop off locations, even in a restricted geographical area.
Date: NET 2032
2018 comments: This is what Uber, Lyft, and conventional taxi services can do today.
Update 20240101: Looked like it was getting close until the dirty laundry came out.
Update 20250101: Waymo now has a service that looks and feels like this in San Francisco, 8 years earlier than I predicted. But it is not what everyone was expecting. There are humans in the loop. And for those of us who use it regularly we know it is not as general in its drop off and pick up locations as it is with human drivers.

Prediction: Driverless taxi services operating on all streets in Cambridgeport, MA, and Greenwich Village, NY.
Date: NET 2035
2018 comments: Unless parking and human drivers are banned from those areas before then.

Prediction: A major city bans parking and cars with drivers from a non-trivial portion of a city so that driverless cars have free reign in that area.
Date: NET 2027, BY 2031
2018 comments: This will be the starting point for a turning of the tide towards driverless cars.

Prediction: The majority of US cities have the majority of their downtown under such rules.
Date: NET 2045

Prediction: Electric cars hit 30% of US car sales.
Date: NET 2027
Update 20240101: This one looked pessimistic last year, but now looks at risk. There was a considerable slow down in the second derivative of adoption this year in the US.
Update 20250101: Q3 2024 had the rate at 8.9% so there is no way it can reach 30% in 2027. I was way too optimistic at a time when EV enthusiasts thought I was horribly pessimistic.

Prediction: Electric car sales in the US make up essentially 100% of the sales.
Date: NET 2038

Prediction: Individually owned cars can go underground onto a pallet and be whisked underground to another location in a city at more than 100mph.
Date: NIML
2018 comments: There might be some small demonstration projects, but they will be just that, not real, viable mass market services.

Prediction: First time that a car equipped with some version of a solution for the trolley problem is involved in an accident where it is practically invoked.
Date: NIML
2018 comments: Recall that a variation of this was a key plot aspect in the movie "I, Robot", where a robot had rescued the Will Smith character after a car accident at the expense of letting a young girl die.

Electric Cars

Last year US manufacturers pulled back on their planned production of EVs. In data from this report we can see that sales dropped at the start of 2024 but have now picked up again.

Share of US new car sales that were battery electric vehicles, by quarter:

2022: Q1 5.3%, Q2 5.6%, Q3 6.1%, Q4 6.5%
2023: Q1 7.3%, Q2 7.2%, Q3 7.9%, Q4 8.1%
2024: Q1 7.3%, Q2 8.0%, Q3 8.9%

There is steady growth in sales but my prediction of 30% of US car sales being electric by 2027 now seems wildly optimistic. We need two doublings to get there in three years and the doubling rate seems more like one doubling in four to five years.
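
Here is a back-of-the-envelope check on that claim (my own illustrative arithmetic, using only the quarterly numbers in the table above; the exact doubling time depends on which quarters you compare):

```python
# Back-of-the-envelope check of the doubling argument, using the BEV share
# numbers in the table above. Purely illustrative arithmetic.
import math

q3_2022, q3_2024 = 0.061, 0.089       # BEV share of US sales, Q3 2022 and Q3 2024
target = 0.30                         # my 2018 prediction: 30% of sales

# Doublings still needed to get from 8.9% to 30%.
doublings_needed = math.log2(target / q3_2024)                 # about 1.75

# Doubling time implied by the growth over the last two years.
growth_per_year = (q3_2024 / q3_2022) ** 0.5                   # about 1.21x per year
years_per_doubling = math.log(2) / math.log(growth_per_year)   # about 3.7 years

print(f"doublings needed: {doublings_needed:.2f}")
print(f"observed doubling time: {years_per_doubling:.1f} years")
print(f"years to reach 30% at that rate: {doublings_needed * years_per_doubling:.1f}")
# Roughly six more years at the recent rate, i.e. around 2031, well past 2027.
```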

Note that some sources include hybrids and hydrogen powered cars in electric vehicles but I am using the battery electric vehicle (BEV) numbers.

To see how the trends are across brands you can see a breakout for Q2 of 2024 here.

There appear to be two main headwinds for BEV adoption. Firstly, if one doesn’t have on-property residential parking it is hard work in the US to find a place to recharge, and it takes hours for the charging to finish. This will stop many city dwellers from adopting. Secondly, the increased tire wear adds up to real money. The maintenance requirements for BEVs are much less than for cars with an internal combustion engine. On the other hand tires do not last as long (I have had to buy four new tires in less than two years of owning my first BEV), apparently due to the increased weight of the car.

Flying Cars

Flying cars are another category where the definitions have changed. Back when I made my predictions it meant a vehicle that could both drive on roads and fly through the air.  Now it has come to mean an electric multi-rotor helicopter that can operate like a taxi between various fixed landing locations. Often touted are versions that have no human pilot. These are known as eVTOLs, for “electric vertical take off & landing”.

Large valuations have been given to startups that make nice videos of their electric air taxis flying about. But on inspection one sees that they don’t have people in them. Often, you might notice, even those flights are completely over water rather than land. I wrote about the lack of videos of viable prototypes back in November 2022.

Nevertheless there have been wild predictions.  I ended a longer version of this component in last year’s annual review with:

Also note the size of this vehicle. There are many fossil fuel powered helicopters that are much smaller. This is not going to be a personally owned vehicle for the masses.

Don’t hold your breath. They are not here. They are not coming soon.

Nothing has changed. Billions of dollars have been spent on this fantasy of personal flying cars.  It is just that, a fantasy, largely fueled by spending by billionaires.

Robotics, AI, and Machine Learning

So what happened in Robotics, AI, and Machine Learning this year?

Many, many, many people got just a little bit over excited. That’s what happened.

There have been a lot of party tricks and it is the researchers who often play the tricks on themselves without realizing it. This is not new, none of it is new. But there are orders of magnitude more people watching it now, and more people are out to make a buck by being hypesters, promising riches to those who will invest in their irrationally overpriced companies.

How could this be?

We are seeing mass sinning, lots and lots of people committing some of the seven deadly sins of predicting the future of AI  which I wrote about back in 2017 here (or here you can see a professionally edited version of that blog post of mine).

Four of those seven sins seem most relevant to today’s hyped up atmosphere around robotics, AI, and machine learning.

   

Here now are short descriptions of these particular four sins, edited down from my earlier much more detailed descriptions. Then I will weave them together to explain how it is still pretty much business as usual, and I mean that in a good way, with steady progress on both the science and engineering of AI.

Performance versus Competence

One of the social skills that we all develop is an ability to estimate the capabilities of individual people with whom we interact. We use cues from how a person performs any particular task to estimate how well they might perform some different task. We are able to generalize from observing performance at one task to a guess at competence over a much bigger set of tasks.

These estimators that we have all inherited or learned do not generalize well to other creatures or machines. We are not good at guessing which smart things other species might be able to do, and we are not good at guessing what an AI system can do when we have seen it do a few tasks in a limited domain. We get it wrong all the time.

Indistinguishable from Magic

When people cannot explain how something works they cannot know its limits as they do not have any sort of model (nor have they seen enough examples of it before). Arthur C. Clarke said that any sufficiently advanced technology is indistinguishable from magic.

In our minds UFOs can do all sorts of amazing things as we have no way of knowing their limits–they may as well be magic. And that is what they become in speculation about them.

Isaac Newton spent half his working life on alchemy as he did not know that the nuclei of atoms are not subject to mere chemistry. He would have been just as ignorant of the limitations of an iPhone screen (different sort of apple…), despite his own ground breaking work in optics. Remember, he was a really really smart dude. But even he was not able to develop all the theories needed to understand the world around him, despite his successes with calculus and gravity and the makeup of white light. He attributed properties to chemistry that were way beyond its limits.

Exponentialism

We have just lived through sixty years of the most phenomenal growth of a technology in the history of humankind. It is the story of silicon-based computation. Everyone has some idea about Moore’s Law, at least enough to sort of know that computers get better and better on a clockwork-like schedule.

This reality has trained people to think that probably a lot of other things in tech will change exponentially, especially when that thing has a strong computational component. The sin of exponentialism is to argue that some other process is going to follow a Moore’s-like law when it is unwarranted to so argue.

Moore’s law worked for so long because in the starting technology of the 1960s the currents used to represent digital information were many many orders of magnitude beyond the minimal physical limit needed to determine whether they  were present or not, and hence distinguish a 1 from a 0. Those currents could be halved many times without breaking physics limits.

Speed of Deployment

New technologies get deployed much more slowly than people imagine. Even software technologies.

The old internet protocol, IPv4, can only address about four billion, or 4×10⁹, devices, which is less than the number of people on our planet. A new protocol, IPv6, which can address more than 3×10³⁸ devices, was meant to replace it over a two-year period of dual use by about 2003. But in 2024 IPv4 was still there and carrying over half the world’s internet traffic despite its inadequacies.

Most functioning businesses that operate in the physical world are very averse to taking up new technology as it dramatically increases existential risk to their business. They must foresee an immediate and incredibly high return on investment (ROI) to be tempted to move to new technologies.

Even the military is slow to adopt new technologies. The US Air Force still flies the B-52H variant of the B-52 bomber. This version was introduced in 1961, making it 63 years old. The last one was built in 1963, a mere 61 years ago. Currently these planes are expected to keep flying until at least 2040, and perhaps longer–there is talk of extending their life out to 100 years.

What does this all mean?

Right now there is incredible hype for both Large Language Models (LLMs), and all their variations, and for humanoid robots, especially humanoid robots that are going to learn how to do things.

The hype is driven by the four sins above.

LLMs

LLMs have proved amazingly facile with language. They have been trained on pretty much all the text that is available on the Web and all the digitized historical books that exist. Miraculously, LLMs seem to be able to infer a representation of some sort that is somewhat independent of the particular human language that they read. So they are able to translate between human languages, and when you ask them just about anything they produce text in the language that you asked in, and that text often seems entirely reasonable and informative.

I used the word “miraculously” as we do not really understand why they are able to do what they do. We, of course, know that the architecture for them is built around noticing correlations in vast amounts of text that connect some tens of thousands of tokens, which are the components of words in each language that is digested. It is a surprise that they work as well as they do, and produce coherent sounding language on just about any topic.

Here is the original architectural diagram from the 2017 Attention Is All You Need paper:

Each column from bottom to top is a pure feed forward network, with no search, no iteration, no conventional algorithm at all. There are inputs at the bottom and then layer upon layer of linear neurons that have numbers or weights stored in them that multiply and add their inputs and threshold that sum to provide an output. The detail in the architectural diagram is how the connections between layers are organized.

On the left is an input or question, in a linear string of words, from a user. That gets injected halfway up the network on the right and remains constant while an iterative process runs. The stack on the right outputs a word (or token) and that gets fed back to the bottom of that stack, and a new token pops out the top. All the output tokens that have so far been produced remain in the right bottom input buffer as ordered input.

What the network has been trained to do, is given the user input on the left, and what the network has output so far, choose a very likely next word, given the billions of examples it has seen in training. Some randomness is used to choose among a small number of very likely next words at each stage.

There are hundreds of billions of weights that get learned and stored in the layers of the network to act as multipliers for each individual input to each layer.
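
To make the loop described in the last few paragraphs concrete, here is a minimal sketch in Python. It is entirely illustrative: next_token_distribution is a made-up stand-in for the trained feed-forward stack, not any real model's API.

```python
import random

def next_token_distribution(prompt_tokens, output_tokens):
    """Stand-in for the feed-forward stack: returns (token, probability) pairs.

    In a real model this is hundreds of billions of learned weights; here it is a stub."""
    return [("the", 0.20), ("a", 0.15), ("robots", 0.10), ("<end>", 0.05)]

def generate(prompt_tokens, max_tokens=50, top_k=3):
    output_tokens = []                      # grows by one token per iteration
    for _ in range(max_tokens):
        # The prompt stays fixed; everything generated so far is fed back in.
        candidates = next_token_distribution(prompt_tokens, output_tokens)
        # "Some randomness is used to choose among a small number of very
        # likely next words": sample from the top_k most probable tokens.
        top = sorted(candidates, key=lambda tp: tp[1], reverse=True)[:top_k]
        tokens, probs = zip(*top)
        token = random.choices(tokens, weights=probs, k=1)[0]
        if token == "<end>":
            break
        output_tokens.append(token)
    return output_tokens

print(generate(["why", "do", "humanoid", "robots", "struggle", "?"]))
```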

So now us humans are faced with looking at this system running and our human nature just makes us commit the first two sins from above.  It is in our nature and we cannot help ourselves.

First, we see really impressive examples of responses to input questions, and if a human was giving those answers we would estimate that person to be quite clever and able to reason. Often though, because they have so many billions of examples on which they were trained, LLMs are essentially looking up the question in the weights. The weights were gained from all of human knowledge that is out there on the network in language form. Invisibly the network is perhaps (but not in any intentional way) merging some similar questions, and then merging the answers which were already in the vast data that it has seen.

But us dumb humans just think the damn thing is really really smart.

Then, since we don’t have a real explanation in our heads for what it is doing we start thinking it is magic, and that there is no real limit to what it is extracting from all that data (that it used a significant portion of the energy budget for many different countries to compute) and how general its capabilities will be. It becomes magic. And then researchers try to show that it can reason, that it has inferred a spatial understanding of the world, that language can be used to do all sorts of things that Moravec’s paradox tells us it can’t. There is a lot of magical thinking that humans do about LLMs.

Of course it can diagnose diseases like a doctor talking about them. Of course it can teach a student as well as a human teacher. Of course it can program as well as a human computer programmer. It is magic after all.

But in reality the fact that it is just picking likely next words means that in fact we can’t trust its output. Some outputs are great. Some are pure confabulations (most people use the word “hallucinations” for this, but I prefer “confabulations”). And we do not know which we will get ahead of time, or more perniciously how much of each we will get, trustworthy pieces of output and confabulated pieces of output all jumbled together.

Not to worry, say the proponents, more learning will fix it. Fire up a nuclear power plant (I am not making this up–the tech companies are getting more nuclear power built or activated so that their LLMs can learn what a human learns using just 20 watts powering their brain; I am not confabulating this!!), and we’ll feed it more data and it will become more trustworthy.  It is magic after all. But the magic is not going as well as the proponents imagined and promised, as this Wall Street Journal story explains. Their imaginations were definitely encouraged by exponentialism, but in fact all they knew was that when they went from smallish to largish networks following the architectural diagram above, the performance got much better. So the inherent reasoning was that if more made things better then more more would make things more better. Alas for them it appears that this is probably not the case. But rabid exponentialists have not yet given up. Expect a bunch of VCs to adversely affect the growth of pension funds around the world, as pension funds are a prime source of the capital that VCs spend.

More serious academics are working on boxing in the LLMs with more external mechanisms beyond just feeding the output tokens back in as a linear string of input. Many of these mechanisms look a lot like more conventional AI mechanisms, and we will see where these additions prove to be useful, how much of the wheel will be reinvented, and how long (months? years? decades?) it will take to get there.

And the answers to those last questions will tell us how much sinning has been done by companies in predicting fast deployments. Back in the rant at the beginning of this post I gave the example of I.B.M. and Watson and their completely optimistic predictions that any problems of applying Watson (which seemed extremely competent based on its performance on live TV) to the real world would be solvable. The areas where it was predicted to be applicable came from magical thinking.

Surely no one today could be as dumb as that big company was back in 2011. Surely not. No, not us smart inhabitants of 2025. It’s us. We are nowhere near as dumb as them!!

Humanoid Robots

The other thing that has gotten over hyped in 2024 is humanoid robots.  The rationale for humanoid robots being a thing is a product of the four sins above, and I think way less rooted in reality than the hype about LLMs. In fact I think it is pretty dumb. [[I suspect many people will reason that I cannot have a valid opinion about this precisely because I happen to have built more humanoid robots than anyone else on the planet. So read ahead with caution.]]

My first law of robotics states:

The visual appearance of a robot makes a promise about what it can do and how smart it is. It needs to deliver or slightly over deliver on that promise or it will not be accepted.

The first sentence describes, I think, what is sucking people into believing that humanoid robots have a big future. It looks like a human, so its performance will be like a human, so it will be competent like a human.  It’s the performance/competence sin without even waiting for the performance part!

The second sentence describes how the humanoid fever will break, and how the hundreds of millions of dollars put into many of these companies (billions of dollars overall) will disappear. The puppets will not perform at acceptable levels. It is easy to see this as you hear all the things that investors and CEOs of humanoid robot companies say the robots will be able to do. They have hardly even gotten to the lab demonstration phase.  My third law of robotics is:

Technologies for robots need 10+ years of steady improvement beyond lab demos of the target tasks to mature to low cost and to have their limitations characterized well enough that they can deliver 99.9% of the time. Every 10 more years gets another 9 in reliability.

For real work, robots need to operate with four, five, or six nines. We are a long way from that. The zeitgeist is that we will simply teach the robots to do stuff and then they will be able to do it.
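
To see why each extra nine matters, here is a bit of illustrative arithmetic. The fleet size and task rate are made-up assumptions, chosen only to show the scale.

```python
# Illustrative arithmetic only: why each extra "9" matters once robots are
# deployed at scale. The fleet size and task rate below are assumptions.
fleet_size = 1_000                # robots deployed (assumed)
tasks_per_robot_per_day = 500     # task attempts per robot per day (assumed)

attempts_per_day = fleet_size * tasks_per_robot_per_day
for nines, reliability in [(2, 0.99), (3, 0.999), (4, 0.9999), (5, 0.99999), (6, 0.999999)]:
    failures = attempts_per_day * (1 - reliability)
    print(f"{nines} nines ({reliability:.6f}): about {failures:,.0f} failed tasks per day")

# At 3 nines that is about 500 failures a day across the fleet, each one
# needing a person to step in. At 6 nines it is roughly one failure every two days.
```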

BUT, we do not know yet whether that is going to work. In order for it to work you have to both collect the right sort of data and then learn the right things from that data. It is not at all clear to me that we know the answers to make either of those things true. I think it will be an active place for lots of good research for many years to come.

There is an excellent survey paper of current research state of the art called Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes. Unfortunately I think the title of the paper is going to confuse many people. “Real-World Successes” to someone like me, who these days deploys robots that people pay for and that provide real ROI, sounds like it is about systems that have been deployed. But on reading the paper it turns out that they mean that it is learning and demonstrations done in a lab setting on physical hardware rather than just in simulations and simulators.  And, to me the lab demonstrations are shakier (literally) than I imagined in my third law above.

I think we are a long way off from being able to for-real deploy humanoid robots that have even the minimal performance needed to be usable, and even further off from ones that have enough ROI for people to want to use them for anything beyond marketing the forward thinking outlook of the buyer.

Despite this, many people have predicted that the cost of humanoid robots will drop exponentially as their numbers grow, and so they will get dirt cheap. I have seen people refer to the cost of integrated circuits having dropped so much over the last few decades as proof. Not so.

They are committing the sin of exponentialism in an obviously dumb way. As I explained above, the first integrated circuits were far from working at the physical limits of representing information. But today's robots use mechanical components and motors that are not far at all from physics based limits on mass, force, and energy. You can't just halve the size of a motor and have a robot lift the same sized payload. Perhaps you can halve it once to get rid of inefficiencies in current designs. Perhaps. But you certainly can't do it twice. Physical robots are not ripe for exponential cost reduction by squeezing waste out of current designs. And it won't happen just because we start (perhaps) mass producing humanoid robots (oh, by the way, I already did this a decade ago–see my parting shot below). We know that from a century of mass producing automobiles. They did not get exponentially cheaper, except in their computing systems. Engines still have mass and still need the same amount of energy to accelerate good old fashioned mass.

This Year’s Prediction Update

There is only one new comment in my robotics, AI and ML predictions table this year. There are a bunch of well funded new companies in the home robot space, and perhaps they will come up with new mobility solutions, which in my experience is the big blocker for home robots.

Prediction [AI and ML] | Date | 2018 Comments | Updates
Academic rumblings about the limits of Deep Learning
BY 2017
Oh, this is already happening... the pace will pick up.
The technical press starts reporting about limits of Deep Learning, and limits of reinforcement learning of game play.
BY 2018
The popular press starts having stories that the era of Deep Learning is over.
BY 2020
VCs figure out that for an investment to pay off there needs to be something more than "X + Deep Learning".
NET 2021
I am being a little cynical here, and of course there will be no way to know when things change exactly.
Emergence of the generally agreed upon "next big thing" in AI beyond deep learning.
NET 2023
BY 2027
Whatever this turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out.
20240101
It definitely showed up in 2023. It was in the public mind in December 2022, but was not yet the big thing that it became during 2023. A year ago I thought it would perhaps be neuro-symbolic AI, but clearly it is LLMs, and ChatGPT and its cousins. And, as I predicted in 2018 it was something already being worked on as the "attention is all you need" paper, the key set of ideas, was published in 2017.
The press, and researchers, generally mature beyond the so-called "Turing Test" and Asimov's three laws as valid measures of progress in AI and ML.
NET 2022
I wish, I really wish.
20230101
The Turing Test was missing from all the breathless press coverage of ChatGPT and friends in 2022. Their performance, though not consistent, pushes way past the old comparisons.
20240101
The Turing Test was largely missing from the press in 2024 also, and there was a story in Nature commenting on that. So yes, this has now happened.
Dexterous robot hands generally available.
NET 2030
BY 2040 (I hope!)
Despite some impressive lab demonstrations we have not actually seen any improvement in widely deployed robotic hands or end effectors in the last 40 years.
A robot that can navigate around just about any US home, with its steps, its clutter, its narrow pathways between furniture, etc.
Lab demo: NET 2026
Expensive product: NET 2030
Affordable product: NET 2035
What is easy for humans is still very, very hard for robots.
20250101
A bunch of startups in the home robot space got significant funding in 2024. Two of them are run by ex-CEOs of large companies: iRobot and Cruise (and the latter was also an intern at iRobot after we were already a public company). So this one may be in play for a lab demo in the next few years if they have this as one of their goals.
A robot that can provide physical assistance to the elderly over multiple tasks (e.g., getting into and out of bed, washing, using the toilet, etc.) rather than just a point solution.
NET 2028
There may be point solution robots before that. But soon the houses of the elderly will be cluttered with too many robots.
A robot that can carry out the last 10 yards of delivery, getting from a vehicle into a house and putting the package inside the front door.
Lab demo: NET 2025
Deployed systems: NET 2028
A conversational agent that both carries long term context, and does not easily fall into recognizable and repeated patterns.
Lab demo: NET 2023
Deployed systems: 2025
Deployment platforms already exist (e.g., Google Home and Amazon Echo) so it will be a fast track from lab demo to wide spread deployment.
20240101
One half of this happened this year. ChatGPT has been connected to microphones and speakers so you can now talk to it, and it does not fall into recognizable patterns. BUT the other half is the half it does not have; it has no updatable memory apart from its token buffer of what it has just said. Long term context may be long term in coming.
An AI system with an ongoing existence (no day is the repeat of another day as it currently is for all AI systems) at the level of a mouse.
NET 2030
I will need a whole new blog post to explain this...
A robot that seems as intelligent, as attentive, and as faithful, as a dog.
NET 2048
This is so much harder than most people imagine it to be--many think we are already there; I say we are not at all there.
A robot that has any real idea about its own existence, or the existence of humans in the way that a six year old understands humans.
NIML

A Parting Shot

I recently read a research paper on humanoid robots working in built for human environments. It was based on the argument that the best form for a robot that is to operate in human environments is something tallish and skinny-ish, and probably dynamically balancing, with arms that can reach down to table tops etc., and with a sensor system that can look down from above, as that is what our human environments are optimized for. Here is the first paragraph of the paper:

The past decade has seen an explosion of research in humanoid robotics. The stated motivations for this work have varied widely. Many teams have concentrated on bipedal locomotion, some have been interested in human level social interactions, understanding human intelligence, modeling human learning capabilities and others have been more interested in entertainment. Some humanoid robots have had manipulation capabilities on static humanoid platforms and some of that work is aimed at dexterity, plus there has been simple two armed grasping on mobile humanoid platforms. Overall there has been very little work combining dexterous manipulation with humanoid robots, static or mobile–much of that which has appeared, has been concerned with dynamic tasks like pole balancing and juggling rather than manipulation, or has used teleoperated manipulation.

Apart from the weird references to pole balancing and juggling this all sounds pretty reasonable and consistent with what is happening today, and with recent history.  In fact this is the very first paragraph of the very first paper in the very first issue of the very first volume of the International Journal of Humanoid Robotics.

And it was published in 2004, with me as first author.  Let me spell that out in case you thought there was a typo in the year. This is from a paper that I and my students and post-docs wrote in the year two thousand and four. Here is the beginning of the contents page for that first issue.

You can download the text of that paper here. The journal is now in its 21st year of operation, and on its 21st volume of issues and papers.

By the time this paper was written my research group at MIT had been working on and building humanoid robots for twelve years. This paper, about a robot named Cardea, was probably our sixth or seventh humanoid robot. [[In 2008 I started a company that built and shipped thousands of humanoid robots. The picture at the top of this post was taken in China with a line up of humanoids that we had built in Massachusetts and New Hampshire and sold to people in China (before a US initiated trade war with China put an end to it in 2018…irony can be personally hard to take at times…).]]

The robot Cardea (Cardea was an ancient Roman goddess of door hinges and handles; these are still a challenge for modern robots…) was a two wheeled dynamically balancing robot  that lived in a built-for-humans office environment. Cardea was able to open doors using existing door handles and then make its way through doors it had opened.

Pro tip: Just because you heard about a new idea this last year or two doesn’t mean that people haven’t been working on that very same idea for decades. So temper your expectations that it must be about to transform the world. Ideas that transform the world take decades, or centuries of development, and plenty of people long before you have been just as excited about the idea and had thought it was on the verge of taking off. And none of us, including you and me, are likely to be special enough or lucky enough to come along at just the right time to see it all happen.

Like all modern humanoid robots, Cardea did not balance or move in a way that used passive dynamics to store energy, basically modulating the behavior of a passive mechanism with only low energy input, which is how all animals walk. So, like all modern mobile humanoid robots (and legged robots in general), when things were going awry its control algorithms tried to recover by pumping in large amounts of energy very quickly, and sometimes that didn't quite work and the energy needed to go somewhere.

Cardea could be a little dangerous in those circumstances, if it fell on you having just increased its kinetic energy. Even the spring based deployment system for its stick-like legs that were engaged when it realized it was going to fall could be dangerous.

This is still a problem with all modern humanoid robots. That is why the tele-operated humanoids that were in the Tesla movie lot theater show a couple of months ago operated in two modes. When they all walked out the human guests were kept away from them. Once they stopped walking and were operating in a very different mode people were allowed to approach them, and then get fooled into thinking they were talking to an AI powered robot when they were really talking to a remote human operator. But the robot was no longer moving its feet, and no longer a source of physical danger as a result.

Another pro tip: Don’t stand anywhere near a walking or balancing wheeled humanoid when they are moving or doing any task. I have had some near misses for myself with my own humanoids twenty years ago and more recently with some of the humanoids from new start ups. And more generally never be below any sort of walking robot, no matter how many legs it has, when it is walking up stairs.

Human Spaceflight

The number of flights in 2024 was not much different from 2023 (I neglected to include the flights by China last year).  It does not feel like a golden age of human spaceflight, though there were other highlights from SpaceX.

Orbital Crewed Flights

Three countries put 28 people into orbit in 2024: the United States launched 16 people on five flights, and Russia and China each launched 6 people on two flights. So there were nine crewed orbital flights in total. Two were private and seven were government flights.

The United States: There were four US flights to the International Space Station, starting with the private Axiom-3 mission with a crew of four on January 18th. The launch vehicle for this was a SpaceX Falcon 9, and the crew vehicle was a SpaceX Dragon. The remaining US flights to the ISS were paid for by NASA. Two of them were SpaceX flights, with four people on March 4th, the Crew-8 mission, and two people on board Crew-9 on October 25th. The remaining US flight to the ISS was the inaugural crewed flight of Boeing's Starliner, launched on June 5th atop an Atlas V rocket with two people aboard. They are still stuck in space and will be for a few more months–see the section on Boeing below.

The other US mission was also a SpaceX launch and vehicle flight, this time known as Polaris Dawn. It was the second mission paid for by billionaire Jared Isaacman, with him as commander. There was a former US Air Force fighter pilot as mission pilot and two SpaceX employees as mission specialists, giving a total crew size of four. They stayed aloft for five days, launching on September 10th. This mission flew higher above Earth than any mission since Apollo 17, the last lunar landing mission, in 1972. Two of the crew “spacewalked” with their feet inside the Dragon capsule but with their bodies outside. This was the first private spacewalk ever. Now Isaacman has been tapped by the incoming US President to be the administrator of NASA.

Russia: There were two Soyuz launches, each with three people, up and down, but different people coming back. The launch dates were March 23rd and September 11th. The six people that launched on Soyuz in 2024 were 3 Russian Cosmonauts, 2 NASA Astronauts, and one Belarusian commercial airline flight attendant who won a national competition with 3,000 applications. She was the only one not set for a long duration mission and was off the ground for slightly less than 14 days. So there were no space tourists per se, but the Belarusian flyer was most likely included as part of Russia's efforts to keep in good favor with Belarus, which has aided it in its war in Ukraine, and was certainly not part of the regular scientific program of the ISS.

China: There were two crewed flights of Shenzhou (a larger more modern version of Soyuz) in 2024.  Both flights were to the Tiangong Space Station and both took along three Taikonauts, first on April 25th and then on October 9th.  Both crews were assigned long duration missions, and incoming crews now overlap with the previous crews at Tiangong, so it is being continuously occupied. The first handover this year took about five days and the second about three and a half weeks.  Both times there were six Taikonauts onboard Tiangong at the same time.

Suborbital Crewed Flights

There have been two companies providing space tourism on suborbital flights. Blue Origin launches a capsule on top of a reusable rocket, New Shepard, and the capsule lands using a parachute and a brief rocket blast right before hitting the ground (similar to how Soyuz lands). Virgin Galactic has a winged craft which is carried aloft by a bigger, jet engined airplane; it separates at high altitude within the atmosphere and rockets into space. It flies back and lands on a runway.

Both companies are run by billionaires who made their money in other businesses.  Both billionaires have flown to space on their own craft.

Both companies have aimed to have regular launches with lots of tourists, but neither has gotten to that scale and so far only a very small number of the many people who have paid a substantial deposit have been able to fly.

Blue Origin had a failure with an uncrewed version of the vehicle in 2022 and only flew one flight in 2023, which was also uncrewed. This year they flew three crewed flights on May 19th, August 29th, and November 22nd, each with six passengers (the system is automated and requires no pilots). In 2021 and 2022 they also had three crewed flights each year, so there have now been nine crewed flights in total. The first two took four passengers and the remaining seven have had six passengers, so altogether they have flown 50 people above the Karman line, 100 kilometers above Earth.  This is not yet a regular cadence, nor a large scale tourist business.

In 2024 Virgin Galactic had two flights, each with two crew from the company and four passengers. These flights were on January 26th and June 8th. Virgin Galactic flights are now on hiatus, awaiting a new bigger and better vehicle in about two years.  Virgin Galactic has had a total of twelve flights since December 13th in 2018.  Three have had two people on board and nine have had six people on board, for a total of sixty filled seats that have crossed the Karman line. The total number of different people is smaller as the two pilot seats on each flight have been occupied by a small number of people who have flown multiple times.

So, in 2024 thirty people went on suborbital flights, and altogether there have been 110 people on these commercial suborbital flights. Space tourism on suborbital flights has yet to take off in a regular or scaled way.

Prediction [Space] | Date | 2018 Comments | Updates
Next launch of people (test pilots/engineers) on a sub-orbital flight by a private company.
BY 2018
A few handfuls of customers, paying for those flights.
NET 2020
A regular sub weekly cadence of such flights.
NET 2022
BY 2026
20240101
There were four flights in 2021, three in 2022, and seven, five with customers on board, in 2023--all of them by Virgin Galactic. Blue Origin did not fly in 2023. At this point 2026 is looking doubtful for regular flights every week.
20250101
Now 2026 is looking impossible given the data from 2023 and 2024, and one of the two companies being on hiatus for all of 2025, and well into 2026.
Regular paying customer orbital flights.
NET 2027
Russia offered paid flights to the ISS, but there were only 8 such flights (7 different tourists). They are now suspended indefinitely.
20240101
There were three paid flights in 2021, and one each in 2022, and 2023, with the latter being the Axiom 2 mission using SpaceX hardware. So not regular yet, and certainly not common.
20250101
There were two paid flights in 2024.
Next launch of people into orbit on a US booster.
NET 2019
BY 2021
BY 2022 (2 different companies)

Current schedule says 2018.
20240101
Both SpaceX and Boeing were scheduled to have crewed flights in 2018. SpaceX pulled it off in 2020, Boeing's Starliner did not fly at all in 2023, but is scheduled to launch with people onboard for the first time in April 2024.
20250101
The second company did finally launch humans into orbit in June 2024, so it has happened three years later than I predicted and six years later than what had been promised when my prediction was made. Of course, everyone implicitly assumed that along with getting humans into space the companies would also be able to bring them back. Not so for Boeing.
Two paying customers go on a loop around the Moon, launch on Falcon Heavy.
NET 2020
The most recent prediction has been 4th quarter 2018. That is not going to happen.
20240101
Starship launched twice in 2023 but didn't get to orbit either time. This is going to be well over six years later than the original prediction by the CEO of SpaceX.
20250101
The billionaire who signed up for this and paid a hefty deposit in 2017 gave up waiting and cancelled the contract in 2024. This fantasy is over, for now at least.
Land cargo on Mars for humans to use at a later date
NET 2026
SpaceX has said by 2022. I think 2026 is optimistic but it might be pushed to happen as a statement that it can be done, rather than for a pressing practical reason.
20240101
I was way too optimistic, and bought into the overoptimistic hype of the CEO of SpaceX even though I added four years, doubling his estimated time frame.
20250101
I can now call this, as orbital mechanics and Hohmann transfer windows dictate that the cargo would need to have been launched a few months ago for it to get to Mars in 2025. It has not been launched.
Humans on Mars make use of cargo previously landed there.
NET 2032
Sorry, it is just going to take longer than everyone expects.
First "permanent" human colony on Mars.NET 2036It will be magical for the human race if this happens by then. It will truly inspire us all.
Point to point transport on Earth in an hour or so (using a BF rocket).
NIML
This will not happen without some major new breakthrough of which we currently have no inkling.
Regular service of Hyperloop between two cities.
NIML
I can't help but be reminded of when Chuck Yeager described the Mercury program as "Spam in a can".
20240101
Calling this one 26 years early. As of today no-one is still working on this in an operating company.

Boeing’s Starliner

First announced in 2010, Boeing's Starliner was originally scheduled to fly a human crew in 2018. It carried out its second uncrewed flight in May 2022, and finally did make its first crewed flight on June 5th. The crew of two docked with the ISS, but there were problems with multiple gas thrusters for fine motion during the docking. The original plan was that the crew would stay on the ISS for about a week and then return to Earth for a touchdown onto hard soil (as all Russian and Chinese crewed missions end, along with all Blue Origin sub-orbital flights).

The option of that return was considered, but the thrusters were on a section of the vehicle which is discarded along the way before the landing, so there was no possibility of getting a look at the hardware back on Earth.  So a program of tests while docked to the ISS was started, delaying the crew return.

Eventually it was decided that it was too risky for the crew to return on the craft and so it returned empty on  September 7th, landing in New Mexico. As it happened, although there were more anomalies with the thrusters the crew would have landed safely had they been on board.

Now the crew was stranded in space with no designated ride home. It was decided to remove two crew from the Crew-9 launch and have the Starliner astronauts, Barry Wilmore and Sunita Williams, fly back on that SpaceX Dragon with the other two, which after additional delays is now scheduled to happen some time in March 2025. Their one week visit to the ISS will have stretched out to nine months by then.

Boeing has committed to fixing the problems with Starliner. The boosters that it uses are no longer being built, but there are five existing ones reserved for the five additional contracted flights that Boeing has with NASA. They are supposed to happen once per year.

We do not know at this point, but I think it would not be a huge surprise if Starliner never flies again.

SpaceX Falcon 9 

Once again the Falcon 9 launch system has broken all sorts of records for number of launches and reuse.

During 2024 there were 132 single booster launches.  For two of those flights no attempt was made to recover the first stage (there is a performance penalty for the primary payload in order to recover the first stage). One attempted recovery failed when the booster (on its 23rd flight) caught fire as it landed on the recovery barge. Another booster has since flown a total of 24 times.

In terms of mission success all but one of these flights succeeded; one failed when the second stage failed during re-ignition for adjusting the orbit.

There were also two Falcon Heavy, the three booster version, launches, both of which succeeded. One of them had successful landings for the two side boosters, but there was no attempt to recover the central booster on that flight, and no attempt to recover any of the three boosters on the other Heavy flight.

This brings the total number of launches of the single booster version to 417 along with 11 launches of the three booster Heavy version.  These numbers are way beyond the number of launches for any other orbital booster.  Additionally it is the only flying orbital system that is reusable at the moment, though  Blue Origin and Rocket Lab both plan on joining the club soon.

It is worth, once again, looking at how long it has taken to get to a total (across both single booster and Heavy triple booster versions) of 428 launches, with only three failures to deliver the payload to where it was intended to go.

The first launch occurred in June 2010, and there were a total of 4 launches in the first three years.  The first successful booster recovery happened on the 20th flight, in December 2015, five and a half years in. The first reuse of a booster occurred in 2017, in the 8th year of the program.

Since 2021 there has been a steady increase in the number of launches per year:

Year | # of launches
2010 | 2
2011 | 0
2012 | 2
2013 | 3
2014 | 6
2015 | 7
2016 | 8
2017 | 18
2018 | 21
2019 | 13
2020 | 26
2021 | 31
2022 | 61
2023 | 96
2024 | 134

SpaceX had previously gotten satellites to orbit with its first rocket, the Falcon 1.  Falcon 9 has been a spectacular success.  But it was not instantaneous.  It took time to build up the cadence of launches; it was about 10 years before the hockey stick curve showed up.  Deployment is never sudden but comes after a long build.
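
To make the build-up concrete, here is a small sketch that just runs a cumulative total over the per-year counts copied from the table above. Summing the column reproduces the 428 total launches mentioned earlier, and shows that more than half of them happened in just the last two years.

    # Falcon launches per year, copied from the table above (single booster
    # and Heavy combined). Running totals show how slowly the cadence built.
    launches = {
        2010: 2, 2011: 0, 2012: 2, 2013: 3, 2014: 6, 2015: 7, 2016: 8,
        2017: 18, 2018: 21, 2019: 13, 2020: 26, 2021: 31, 2022: 61,
        2023: 96, 2024: 134,
    }

    total = 0
    for year in sorted(launches):
        total += launches[year]
        print(f"{year}: {launches[year]:3d} launches, {total:3d} cumulative")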

SpaceX Starship

Starship is SpaceX’s superheavy two stage rocket, designed to put 150 tons of payload into orbit, but also be able to go to the Moon or Mars. There is the booster which is designed only to work in Earth atmosphere with 33 Raptor engines both to get the second stage high enough and fast enough and to let the first stage have a controlled return to the launch site. The second stage, called Starship, is both a booster and the payload.  It has three Raptor engines and three Raptor vacuum engines. The Raptor engines are designed to get the Starship into orbit after the first stage drops away, and to guide the Starship as it returns to its Earth launch site. The Raptor vacuum engines are meant for breaking out of Earth orbit and going to the Moon or Mars, and to do soft landings on those two bodies where there is no or almost no atmosphere.

In 2024 SpaceX made steady progress with four launches of the two stages coupled together.  The first two launches led to both stages blowing up.

The third and fourth launches were a big improvement.  As with earlier flights they launched from the coast of Texas. In both cases the second stage did a reentry burn on its first orbit and then did a soft landing in a target zone in the Indian Ocean.  In the third flight the main booster returned to the launch site and hovered next to the launch tower between two giant arms which then captured it, and the engines shut down successfully. It was sufficiently damaged during flight, however, that it was not reusable. In the fourth flight there were health anomalies so the first stage was ditched in the Gulf of Mexico.

On the fourth flight there was both less heat shielding and much less damage from heat during reentry. This is definite forward progress. But it is still quite a long way from both being operational and both stages being reusable. And it is even further away from being human rated.

This is the vehicle that the CEO of SpaceX recently said would be launched to Mars and attempt a soft landing there.  He also said that, if successful, humans would fly to Mars on it in 2030. These are enormously ambitious goals just from a maturity of technology standpoint. The real show stopper however may be human physiology, as evidence accumulates that humans would not survive three years (the minimum duration of a Mars mission, due to orbital mechanics) in space with current shielding practices and with no artificial gravity in current on-board designs. Those two challenges may take decades, or even centuries to overcome (recall that Leonardo Da Vinci had designs for flying machines that took centuries to be developed…).
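
For those who want to check the three year figure, the back-of-the-envelope orbital mechanics is straightforward (a sketch with round numbers; circular, coplanar orbits assumed). A minimum energy Hohmann transfer to Mars has a semi-major axis of

\[
a = \tfrac{1}{2}(r_E + r_M) \approx \tfrac{1}{2}(1.0 + 1.52)\,\mathrm{AU} \approx 1.26\,\mathrm{AU},
\]

so by Kepler's third law the one-way trip takes

\[
t = \tfrac{1}{2}\,a^{3/2}\ \mathrm{years} \approx \tfrac{1}{2}(1.26)^{3/2} \approx 0.71\ \mathrm{years} \approx 260\ \mathrm{days}.
\]

The planets only line up for a minimum energy return roughly once per synodic period,

\[
T_{syn} = \left(\frac{1}{T_E} - \frac{1}{T_M}\right)^{-1} = \left(1 - \frac{1}{1.88}\right)^{-1} \approx 2.1\ \mathrm{years},
\]

which forces a stay at Mars of very roughly 450 to 500 days before the return leg opens, and so a round trip on the order of 260 + 470 + 260 days, close to three years.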

The President of SpaceX may be taking a leaf out of the CEO’s always overly optimistic predictions. In November she said “I would not be surprised if we fly 400 Starship launches in the next four years”. Looking at the success of Falcon 9 it is certainly plausible that I may live to see 400 Starship launches in a four year period, but I am quite confident that it will not happen in the next four years (2025 through 2028).

One more thing. Back when I first made the predictions there had been an announcement by the CEO of SpaceX that the company was under contract to send a very rich paying customer on a trip around the Moon in 2018, launched on a Falcon Heavy. I was completely skeptical. Over the years the date got pushed back and pushed back, and the proposed flight vehicle was changed to be Starship. As we all know the flight of the Japanese billionaire around the Moon still hasn't happened. In 2024 Yusaku Maezawa finally gave up waiting and cancelled the contract.

NASA Artemis

NASA’s plan is that the second Artemis mission, using the Orion Capsule, Artemis II, will fly to the Moon with four people aboard, the first crewed Artemis flight. An uncrewed flight of Orion around the Moon flew in 2022.  The crewed flight was scheduled to launch in May 2024, but it was first delayed by six months and then a little more and in the last year it has slipped another full year. It is now scheduled to fly in April 2026.

Artemis III was scheduled to launch in 2025 with a return to the surface of the Moon. However that relied on using a Starship (itself refueled in LEO by 14 (yes, fourteen!!) other Starship launches) to land there.  No one any longer believes that schedule, and it will likely slip a few years, given where Starship is in its development and current capability.  The official schedule says mid 2027, but that seems unlikely.

You can find the architecture of the Artemis III mission at this website.

Blue Origin Orbital BE-4 Engines and New Glenn

The suborbital tourist flights that Blue Origin operates are not its main business. It has ambitions to compete head to head with SpaceX. Another billionaire vs billionaire competition.

It has developed the BE-4 engine, designed to fly 100 times, and to power the first stage of its massive New Glenn rocket (see below).  But in the meantime it has started selling the BE-4 to ULA (United Launch Alliance) to power their Vulcan Centaur heavy launch vehicle. Its first stage uses two BE-4 engines, along with a variable number of solid fuel strap ons.

Vulcan Centaur flew two times in 2024 and the BE-4 engines worked perfectly both times, on January 8th and again on October 4th. This is a solid validation of the engine’s capabilities.

Blue Origin’s own first orbital class rocket, New Glenn, is massive, and comparable to the Falcon Heavy (three boosters) rather than the Falcon 9 in capability.   It has been in development for a long time, but saw its first visits to a launch pad, fully stacked, in 2024. The first stage uses seven BE-4 engines, and is intended to land on a barge and be fully reusable. The second stage uses two BE-3U engines, a variant of the single engine used on their New Shepard sub-orbital space tourism vehicle. There is a project underway to make a fully reusable version of the second stage.

Launch seems imminent.  Here it is at the launch pad in November 2024.

On Friday December 27th, 2024, it was fully fueled in both stages and went through a countdown and fired its seven BE-4 engines for 24 seconds. Now it will leave the pad to have its payload installed. The launch could be as early as January 6th.  The very first launch will be an all up affair, attempting to get something to orbit and land the booster on its first flight. This is a very different development approach to that used by SpaceX.

Let’s Continue a Noble Tradition!

The billionaire founders of both Virgin Galactic and Blue Origin had faith in the systems they had created. They both personally flew on the first operational flights of their sub-orbital launch systems. They went way beyond simply talking about how great their technology was, they believed in it, and flew in it.

Let’s hope this tradition continues. Let’s hope the billionaire founder/CEO of SpaceX will be onboard the first crewed flight of Starship to Mars, and that it happens sooner than I expect. We can all cheer for that.

Tips For Building and Deploying Robots

rodneybrooks.com/tips-for-building-and-deploying-robots/

This post is not about research or developing software for robots. Instead it is some tips on how to go about building robots for mass deployment and how to leverage those deployed robots for improving your product.

The four tips are straightforward but I explain them more below.

  1. Use other people’s supply chain scale wherever possible.
  2. Changing infrastructure up front eases robot success but kills ROI for customers and their ease of buying decisions.
  3. New infrastructure already developed for people is often good infrastructure for robots.
  4. Deployed robots can collect real world data.

1. Other People’s Supply Chain Scale

When you start deploying a robot it is unlikely to be initially in large numbers unless it is at a very low price.  When we started selling Roombas at iRobot our first manufacturing batch was 70,000 robots, which is enormous by most robot deployment standards. We got to millions of robots per year. But these numbers are unusual for robots. Even so, at the millions we got supply chain economies of scale that we did not have at the tens of thousands.

How are you going to get economies of scale when you are shipping just a couple of robots per month, or even tens per week? And really how big an influence can you have over suppliers even when the volume is enormous, for you, at a small number of hundreds of robots per week?

My advice is to find someone else’s big juicy supply chain and grab a hold of that for yourself. Juicy in this context means three things. (1) Supply chains with significant downward price pressures. (2) Supply chains with enormous volumes. (3) Supply chains with many different suppliers, supplying standard parts, and all competing, and operating, in multiple geographies.

For example, you can use exactly the same motors that someone else is using in the millions per year, preferably a standard, and preferably ones that are built by multiple suppliers. Then buy those same motors from one or more of those standard suppliers, though first make sure they have adequate quality control and quality guarantees. Ten years ago it made sense to get custom windings for motors in Eastern Europe. Now you should try to use existing motors from large scale consumer products.

Do the same for battery packs, or at least battery cells. Fifteen years ago it made sense to have unique battery chemistries both for robot performance and for the regulatory requirements at the geographical point of deployment. Now there are so many at scale different battery cells and chemistries available for at scale consumer products (including electric vehicles) it makes sense to pick the closest fit and use those in your robot.

Do the same for every other part of your robot that you can.

Doesn’t this make designing the production robot harder, getting standard parts rather than ones optimized for your robot?  Yes, and no.  It makes design a slightly different process, but one that rewards you handsomely in terms of lower BOM (Bill Of Materials) cost, and in stability of supply. Those of us (which is just about everyone) who went through the turbulence of COVID-era supply chains can testify to how that turbulence can kill the stability of your careful designs.

Oh, and by the way, you are already, to at least a limited extent, practicing this type of design. For your fasteners no manufacturing engineer is going to let you get away with deciding on an M5.3 bolt [an M5 bolt is 5mm, and an M6 bolt is 6mm, and they are standard sizes, with no standard in between], with a custom head, and a custom pitch, and a custom length of 10.27mm. No, they are going to insist that you use an M5 or an M6 bolt, with one of the many standard heads and pitches and one of the many standard lengths.  And the reasons for that are precisely the same as the reasons I gave above for motors and battery cells, and processor boards, connectors, etc. There is an enormous supply chain for standard sized bolts, so they are way cheaper than custom sized bolts, by two or three orders of magnitude.

2. Changing Infrastructure

I have a rule when designing robots that are going to be deployed to real customers. No one in the company is allowed to say “the customer/end-user can just …”. If we are asking a customer or end-user to do something they wouldn’t naturally do already we are making it harder for them to use our product, compared to what they are already doing, or making it more expensive for them to install our product. This is true of both user interfaces and infrastructure.

As a business you need to decide whether you are selling infrastructure, or selling something that goes into an existing environment. A robot that can move about is a powerful thing to add to an existing environment because it provides mobility and mobile sensors without, in general, having to change the infrastructure.

The Roomba was an easy sale, not just for its price, but because it did not demand that people had to install some infrastructure in their home environment before they could have a Roomba clean their floors. All that a Roomba needed was a standard electrical outlet into which a human could plug its charger. Nothing else.

Since we have been working at Robust AI on a new class of autonomous mobile robots for warehouses and factories, many people (mostly outside of the company, and not potential customers) have suggested all sorts of infrastructure changes that would make it easier to deploy our robots. These have included rubber floors, floors with a wireless recharging system underneath them so the robots can stay charged at all times, radio or laser beacons for localization of the robot, QR codes on every pillar, also for localization, structured lighting systems for the whole interior of the building, separate floors for human driven forklifts and the robots, etc., etc.

All these things would be additional expense for deploying the robots. They would have three bad impacts. (1) There would be a pre-phase for deployment where the infrastructure would have to be built into the existing facility, causing delay and in some cases downtime for the existing facility. (2) Lower ROI (Return On Investment) for the customer as one way or another they would end up paying for the change in infrastructure. (3) A much harder buying decision for the customer, both because of a longer deployment delay and because of additional cost.

Your robot business will be much simpler if you don’t need changes in infrastructure.

3. But There Is Good News On Infrastructure

As technology advances we humans build new infrastructure for us. Often that same infrastructure turns out to be useful for robots.

When people first had autonomous vehicles driving on freeways (back in the 1980’s!! — see the work of Ernst Dickmanns) there was not much external infrastructure, besides white lines painted on the roads, to help the vehicles. But since then a lot has changed, with new infrastructure that has been installed to help human drivers. We now have GPS, and digital maps, so that our human driven cars know exactly where they are on those maps, and can display it to us, and search the maps for recommended routes. In addition there is congestion information that gets sent to our human driven cars and our phones, which let our route planners suggest faster routes. And the systems in some cars know the speed limits everywhere without having to read the speed limit signs. All these capabilities make it much easier for robot cars, autonomous vehicles, to be able to drive around. That human assist infrastructure has been a real boon towards getting to autonomous vehicles.

In the early days of robots for hospitals and for hospitality (i.e., hotels) and for managed care facilities for the elderly, it was first necessary to install Wi-Fi in the buildings as it was needed for the robots to know what to do next. That was a big barrier to entry for those robots.  Now, however, the humans in those environments expect to have Wi-Fi available everywhere. It started to be deployed in hospitals to support the handheld devices supplied to doctors and nurses so they could record notes and data, but now patients and their visitors expect it too. The same with hotel guests. So requiring Wi-Fi for robots is no longer that much of a barrier to adoption.

In warehouses and factories the hand held bar code scanners have forced the introduction of Wi-Fi there. And in order to make it easy for humans to push around carts, as they do in both those environments, the floors are now generally trash free, and the slopes and lips in the floors are all quite small–infrastructure introduced for the benefit of humans. But this infrastructure, along with plenty of lighting for humans so they can see, make it much easier to introduce robots than it would have been a few decades ago, even if we had had robots back then.

Look for human driven infrastructure changes in whatever domain you want to have your robots work, and see if the robots can make use of that infrastructure. It will likely make them better.

4. Deployed Robots Live In A Sea Of Real World Data

The last few years have seen a clear rise in the utility of machine learning. (Just for reference the term “machine learning” first appeared in 1959 in a paper on checkers written by Arthur Samuel; I wrote a really quite bad master’s thesis on machine learning back in 1977.) The web has provided massive amounts of text for machine learning which has given rise to LLMs. It has also supplied images and movies but they, and other images and movies explicitly collected for the purpose, have required human labeling. Recently people have had sensors attached to them and have been asked to do physical tasks to provide data for research projects (they are “research” projects even if done in well funded companies, as this data has not yet led to deployed robots) aiming to use machine learned control systems for robots (it is not clear to me that the type of data currently collected, nor the assumptions about the form of those control systems, will lead to useful results).

If you have deployed robots then data collection with labels may be available to you pretty much for free.

If your robots have cameras on them, then once they are deployed they are able to observe lots of real world things happening around them. If they are doing real work, then those video feeds are synchronized to what is happening in the world, and to external data interactions that the robot has with the work management system installed in the workplace. Thus the visual data is pre-labelled in some way, and can be used to feed learning algorithms with vast quantities of data over time.
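
As a sketch of the idea only (this is generic illustrative code, not how any particular company, including Robust AI, actually does it; every name and number in it is made up): if camera frames and work management system events both carry timestamps, joining the two streams on time yields weakly labelled frames with no human labelling effort.

    from bisect import bisect_left

    # Made-up example data: timestamps (seconds) of camera frames from a robot,
    # and timestamped events reported by the site's work management system.
    frame_times = [12.0, 12.5, 13.0, 13.5, 14.0]
    wms_events = [(12.4, "pick_started"), (13.6, "pick_completed")]

    def label_frames(frame_times, wms_events, window=0.5):
        """Attach the nearest WMS event within `window` seconds to each frame."""
        event_times = [t for t, _ in wms_events]
        labelled = []
        for ft in frame_times:
            i = bisect_left(event_times, ft)
            # Consider the events just before and just after this frame.
            candidates = [j for j in (i - 1, i) if 0 <= j < len(wms_events)]
            best = min(candidates, key=lambda j: abs(event_times[j] - ft), default=None)
            if best is not None and abs(event_times[best] - ft) <= window:
                labelled.append((ft, wms_events[best][1]))
            else:
                labelled.append((ft, None))  # no event close enough to this frame
        return labelled

    print(label_frames(frame_times, wms_events))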

Here is an example of some collected data that is used to feed learning algorithms at Robust AI.

Get your robots deployed then have them learn from real world data to improve and extend their performance.

Rodney Brooks’ Three Laws of Artificial Intelligence

rodneybrooks.com/rodney-brooks-three-laws-of-artificial-intelligence/

I have recently blogged about my three laws of robotics. Here I talk about my three laws of Artificial Intelligence, about how people perceive AI systems, about how they operate in the world and how difficult it is to make them general purpose in any sense.

  1. When an AI system performs a task, human observers immediately estimate its general competence in areas that seem related. Usually that estimate is wildly overinflated.
  2. Most successful AI deployments have a human somewhere in the loop (perhaps the person they are helping) and their intelligence smooths the edges.
  3. Without carefully boxing in how an AI system is deployed there is always a long tail of special cases that take decades to discover and fix. Paradoxically all those fixes are AI-complete themselves.

I very briefly elaborate on these three laws.

Over Estimating Competence

I talked about  the tendency people have to extrapolate general competence in some area from an observation of a more specific performance in my blog post titled The Seven Deadly Sins of Predicting the Future of AI:

People hear that some robot or some AI system has performed some task. They then take the generalization from that performance to a general competence that a person performing that same task could be expected to have. And they apply that generalization to the robot or AI system.

The competence of AI systems tends to be very narrow and we humans don’t have the right model for estimating that competence. The overhyping of LLMs in the last two years is a case in point.  LLMs cannot reason at all, but otherwise smart people are desperate to claim that they can reason.  No, you are wrong. LLMs do not reason, by any reasonable definition of reason. They are not doing what humans are doing when we say they are reasoning, and applying that word to LLMs saying it is “reasoning but different” simply leads to gross failures in predicting how well the technology will work.

Person In The Loop

People get further confused by the fact that there is usually a person in the loop with deployed AI systems. This can happen in two different ways:

  1. The person who is using the system bears some of the responsibility for what the system does and consciously, or unconsciously, corrects for it. In the case of search engines, for instance, the AI system offers a number of search results and the person down selects and does the final filtering on the results. The search engine does not have to be as intelligent as if it was going to apply the results of its search directly to the real world. In the case of Roombas there is a handle on the robot vacuum cleaner and the owner steps in and gets the robot out of trouble by picking it up and moving it.
  2. The company that is deploying the AI system is operating on a “fake it until you make it” strategy, and there is a person involved somewhere but that fact is deliberately obfuscated.  It turns out that both of the large scale deployments of autonomous vehicles in San Francisco had and have remote people ready to invisibly help the cars get out of trouble, and they are doing so every couple of minutes. A major online seller recently shut down its scanless supermarkets where a customer could walk in, pick up whatever they wanted and walk out and have their credit card charged the correct amount. Every customer was consuming an hour or more of remote humans (in India as it happened) watching and rewatching videos of the customer to determine what it was they had put in their shopping baskets. Likewise for all the campus delivery robot systems–there the companies are selling Universities the appearance of being at the forefront of technology adoption, and not actually providing a robot service at all.

The Long Tail

Claude Shannon first outlined how a computer might be programmed to play chess in an article in Scientific American back in February 1950. He even suggested some mechanisms that might be used to make it learn to play better. In 1959 Arthur Samuel described an implemented program that worked that way and used those learning methods to play the game of Checkers. It was the first time the phrase “machine learning” was used in print. His Checkers program went on to become quite good and to play at an expert human level, even with the tiny computational power of the early 1960s.

But Chess turned out to be much harder. Decades of work went into both improving the learning capabilities of chess and making the look ahead search of possible moves push out further and further. Moore’s Law was the winner and by the 1990s a program had beaten Garry Kasparov, the world champion. There was a lot of controversy over whether the program, Deep Blue from IBM, was a general purpose chess program or a dedicated Kasparov style play beater. By the early 2000s the doubts were gone. Chess programs had gotten good enough to beat any human player.

Chess programs got rid of having to deal individually with special cases through brute force. But Chess is a perfect information game, with no uncertainties and no surprises on the possible pieces that can be on a chess board. It is a closed world.

The places where we really want AI to work well are in more general open worlds.  Our roads where cars drive are an open world. There can be all sorts of things that happen infrequently but need to be handled when they do happen, even if the circumstances are different from everything experienced by AI driving systems before. There are tornados, blizzards, hurricanes, wind borne tumbleweeds, and plastic bags, and sheets of styrofoam, and a million other things we could enumerate. Humans use general capabilities, non-driving capabilities, to figure out what to do in unexpected circumstances when driving. Unless an AI system has such general capabilities, every one of the million “long tail events” needs to be trained for.

The things we are asking, or believing the hype about, our AI systems to do have lots and lots of special cases that are not subject to the simple brute force of increasing compute power (despite the hucksters claiming precisely that — just give me $10 billion more of cloud services and our system will learn it all).

No, instead we need to put boxes around our AI systems and products and control where they are applied. The alternative is to have brittle systems that will end up causing economic and perhaps safety upheavals.

Rodney Brooks’ Three Laws of Robotics

rodneybrooks.com/rodney-brooks-three-laws-of-robotics/

Here are some of the things I’ve learned about robotics after working in the field for almost five decades.  In honor of Isaac Asimov and Arthur C. Clarke, my two boyhood go-to science fiction writers, I’m calling them my three laws of robotics.

  1. The visual appearance of a robot makes a promise about what it can do and how smart it is. It needs to deliver or slightly over deliver on that promise or it will not be accepted.
  2. When robots and people coexist in the same spaces, the robots must not take away from people’s agency, particularly when the robots are failing, as inevitably they will at times.
  3. Technologies for robots need 10+ years of steady improvement beyond lab demos of the target tasks to mature to low cost and to have their limitations characterized well enough that they can deliver 99.9% of the time. Every 10 more years gets another 9 in reliability.

Below I explain each of these laws in more detail.  But in a related post here are my three laws of Artificial Intelligence.

Note that these laws are written from the point of view of making robots work in the real world, where people pay for them, and where people want return on their investment. This is very different from demonstrating robots or robot technologies in the laboratory.

In the lab there is a phalanx of graduate students eager to demonstrate their latest idea, on which they have worked very hard, to show its plausibility. Their interest is in showing that a technique or technology that they have developed is plausible and promising. They will do everything in their power to nurse the robot through the demonstration to make that point, and they will eagerly explain everything about what they have developed and what could come next.

In the real world there is just the customer, or the employee or relative of the customer. The robot has to work with no external intervention from the people who designed and built it. It needs to be a good experience for the people around it or there will not be more sales to those, and perhaps other, customers.

So these laws are not about what might, or could, be done. They are about real robots deployed in the real world. The laws are not about research demonstrations. They are about robots in everyday life.

The Promise Given By Appearance

My various companies have produced all sorts of robots and sold them at scale. A lot of thought goes into the visual appearance of the robot when it is designed as that tells the buyer or user what to expect from it.

The Roomba, from iRobot, looks like a flat disk. It cleans floors. The disk shape was so that it could turn in place without hitting anything it wasn’t already hitting. The low profile of the disk was so that it could get under the toe kicks in kitchens and clean the floor that is overhung just a little by kitchen cabinets.  It does not look like it can go up and down stairs or even a single step up or step down in a house, and it cannot. It has a handle, which makes it look like it can be picked up by a person, and it can be. Unlike fictional Rosey the Robot it does not look like it could clean windows, and it cannot. It cleans floors, and that is it.

The Packbot, the remotely operable military robot, also from iRobot, looked very different indeed. It has tracked wheels, like a miniature tank, and that appearance  promises anyone who looks at it that it can go over rough terrain, and is not going to be stopped by steps or rocks or drops in terrain. When the Fukushima disaster happened, in 2011, Packbots were able to operate in the reactor buildings that had been smashed and wrecked by the tsunami, open door handles under remote control, drive up rubble covered staircases and get their cameras pointed at analog pressure and temperature gauges so that workers trying to safely secure the nuclear plant had some data about what was happening in highly radioactive areas of the plant.

The point of this first law of robotics is to warn against making a robot appear more than it actually is.  Perhaps that will get funding for your company leading investors to believe that in time the robot will be able to do all the things its physical appearance suggests it might be able to do. But it is going to disappoint customers when it cannot do the sorts of things that something with that physical appearance looks like it can do.  Glamming up a robot risks overpromising what the robot as a product can actually do.  That risks disappointing customers. And disappointed customers are not going to be an advocate for your product/robot, nor be repeat buyers.

Preserving People’s Agency

The worst thing a robot can do for its acceptance by people in the workplace is to make their jobs or lives harder, by not letting them do what they need to do.

Robots that work in hospitals taking dirty sheets or dishes from a patient floor to where they are to be cleaned are meant to make the lives of the nurses easier. But often they do exactly the opposite. If the robots are not aware of what is happening and do not get out of the way when there is an emergency they will probably end up blocking some life saving work by the nurses–e.g., pushing a gurney with a critically ill patient on it to where they need to be for immediate treatment. That does not endear such a robot to the hospital staff. It has interfered with their main job function, a function of which the staff is proud, and what motivates them to do such work.

A lesser, but still unacceptable, behavior of robots in hospitals is to have them wait centrally in front of elevator doors, blocking people. It makes it harder for people to do some things they need to do all the time in that environment–enter and exit elevators.

Those of us who live in San Francisco or Austin, Texas, have had first hand views of robots annoying people daily for the last few years. The robots in question have been autonomous vehicles, driving around the city with no human occupant. I see these robots every single time I leave my house, whether on foot or by car.

Some of the vehicles were notorious for blocking intersections, and there was absolutely nothing that other drivers, pedestrians, or police could do. We just had to wait until some remote operator hidden deep inside the company that deployed them decided to pay attention to the stuck vehicle and get it out of people’s way. Worse, they would wander into the scene of a fire where there were fire trucks and fire fighters and actual buildings on fire, get confused and just stop, sometimes on top of the fire hoses.

There was no way for the fire fighters to move the vehicles, nor communicate with them. This is in contrast to an automobile driven by a human driver. Fire fighters can use their normal social interactions to communicate with a  driver, and use their privileged position in society as front line responders to apply social pressure on a human driver to cooperate with them. Not so with the autonomous vehicles.

The autonomous vehicles took agency from people going about their regular business on the streets, but worse took away agency from firefighters whose role is to protect other  humans. Deployed robots that do not respect people and what they need to do will not get respect from people and the robots will end up undeployed.

Robust Robots That Work Every Time

Making robots that work reliably in the real world is hard. In fact, making anything that works physically in the real world, and is reliable, is very hard.

For a customer to be happy with a robot it must appear to work every time it tries a task, otherwise it will frustrate the user to the point that they will question whether it makes their life better or not.

But what does appear mean here? It means that the user can assume that it is going to work, as their default understanding of what will happen in the world.

The tricky part is that robots interact with the real physical world.

Software programs interact with a well understood abstracted machine, so they tend not to fail in a manner where the instructions in them do not get executed in a consistent way by the hardware on which they are running. Those same programs may also interact with the physical world, be it a human being, a network connection, or an input device like a mouse. It is then that the programs might fail, as the instructions in them are based on assumptions about the real world that are not met.

Robots are subject to forces in the real world, subject to the exact position of objects relative to them, and subject to interacting with humans who are very variable in their behavior. There are no teams of graduate students or junior engineers eager to make the robot succeed on the 8,354th attempt to do the same thing that has worked so many times before.  Getting software that adequately adapts to the uncertain changes in the world in that particular instance and that particular instant of time is where the real challenge arises in robotics.

Great looking videos are just not the same thing as working for a customer every time. Most of what we see in the news about robots is lab demonstrations. There is no data on how general the solution is, nor how many takes it took to get the video that is shown. Even worse, sometimes the videos are tele-operated or sped up many times over.

I have rarely seen a new technology that is less than ten years out from a lab demo make it into a deployed robot. It takes time to see how well the method works, and to characterize it well enough that it is unlikely to fail in a deployed robot that is working by itself in the real world. Even then there will be failures, and it takes many more years of shaking out the problem areas and building the fixes into the robot product in a defensive way so that the failures do not happen again.

Most robots require kill buttons or e-stops on them so that a human can shut them down. If a customer ever feels the need to hit that button then the people who have built and sold the robot have failed. They have not made it operate well enough that the robot never gets into a state where things are going that badly wrong.

 

Predictions Scorecard, 2024 January 01

rodneybrooks.com/predictions-scorecard-2024-january-01/

[You can follow me on social media: @rodneyabrooks.bsky.social]

This is my sixth annual update on how my dated predictions from January 1st, 2018 concerning (1) self driving cars, (2) robotics, AI , and machine learning, and (3) human space travel, have held up. I promised then to review them at the start of the year every year until 2050 (right after my 95th birthday), thirty two years in total. The idea is to hold myself accountable for those predictions. How right or wrong was I?

The acronyms I used for predictions in my original post were as follows.

NET year means it will not happen before that year (No Earlier Than)
BY year means I predict that it will happen by that year.
NIML, Not In My Lifetime, i.e., not before 2050.

As time passes the mentioned years, I color them as accurate, too pessimistic, or too optimistic.

I only change the text in the fourth column of the prediction tables, to say what actually happened.  This year I have removed most of the old comments from the prediction tables to make them shorter; you can go back to last year’s update to see previous comments.  And I highlight any new dates, as in 20240103 for January 3rd, 2024.

Overview of changes this year

First, remember that a lot of good things happened in the world this year, and here are 66 of them. The world is getting better in terms of global health, clean energy, economic and social justice, and conservation. There is much more to life than LLMs.

There has been a lot of activity in both self driving cars (Cruise and Waymo human assisted deployments) and in AI (the arrival of the indisputable next big thing, ChatGPT and friends).  The human spaceflight endeavor has crawled along and largely stretched out dates that were probably too optimistic in the first place.

Self Driving Cars

There are no self driving cars deployed (despite what companies have tried to project to make it seem it has happened), and arguably the prospects for self driving taxi services being deployed at scale took a beating.

First a reminder of why I made predictions in this field.

Back in 2017 the hubris about the imminent arrival of self driving cars was at a similar level to the hubris in 2023 about ChatGPT being a step towards AGI (Artificial General Intelligence) that is just around the corner. Here is the same version of a slide that I showed last year:

This was a snapshot of predictions for when level 4 or level 5 self driving cars would be available, from various car manufacturers (e.g., Tesla or Ford), automobile subsystem providers (e.g., Continental or NVIDIA), and ride service companies (e.g., Uber).

The dates in parentheses are when the prediction on that line was made. The dates in blue are the year that was predicted for delivery. I have highlighted the dates that have already passed in pink. None of them were delivered then or now. 2023 did not appear in anyone's prediction. Next up, 2024 for Jaguar and Land-Rover (it won't happen…). The orange arrows are for companies that I noticed retracted their statements or pushed them out further sometime after this snapshot. But in my original predictions on January 1st, 2018, I was reacting to these predictions, not one of which I thought would come to pass by the predicted dates. I'm batting 17 out of 17, with only six predictions left.

(Not Really) Deployed Autonomous Ride Services, 2023

In the last year both Waymo and Cruise launched "driverless" ride services in San Francisco. They had both previously had empty vehicles cruising the streets, and had limited availability for certain people to ride in them, for free, as though they were a ride service. Then during 2023 both companies made them available to people (on a waiting list) who signed up for an app which let you pay for rides, 24 hours per day. I took almost forty rides in Cruise vehicles under these programs. In a series of short blog posts I describe, in reverse order of writing, those experiences, what it was like in my last ride where I thought for a moment I was going to be in a terrible accident, and a little history of self driving technology.

I was by no means the only one watching their safety, and it took some good reporters to uncover some of the problems. The CEO of Cruise pushed back, saying that the criticism of his company really wasn't fair, and was just "anti-robot bias". I have spent my whole professional life developing robots and my companies have built more of them than anyone else, but I can assure you that as a driver in San Francisco during the day I was getting pretty frustrated with driverless Cruise and Waymo vehicles doing stupid things that I saw and experienced every day. On my second to last commute home from my robot company office in 2023, on December 20th, an empty Waymo with lights not flashing blocked an entrance to an intersection, and I had to temporarily move into oncoming traffic to continue on my way.

But things were worse. There were a number of accidents with people inside Cruise vehicles. It seemed like when a Cruise was in an intersection and a car was heading right for it, the vehicle would stop dead. I, and others, speculated that this was based on the idea that if the self driving car was not moving when there was an accident then it could not be blamed for causing the accident.

On my last ride in a Cruise this almost happened to me, and I did for a moment fear for my life.  See the description of this October 19th 2023 event in this short blog post. And see a video here of the same sort of thing happening in August, where a bad collision did occur and the Cruise occupant ended up in hospital.

An event in early October finally caught up with Cruise. A pedestrian had been hit by another vehicle and was thrown into the path of a Cruise vehicle, which was unable to stop in time before driving over the person. What Cruise did not reveal at the time was that the Cruise vehicle then decided to pull over and drove twenty feet with the person trapped underneath the car and dragged the person along. At last report the person was still in hospital.

There were conflicting reports on whether Cruise initially withheld from investigators the part of a video recording that shows the dragging of the pedestrian. But by late October the California Department of Motor Vehicles had suspended Cruise from all driverless operations.

GM had bought Cruise for a reported $1 billion in 2016. By November 14th this year, there were rumblings that GM was stepping in, pushing changes in safety, and would reduce its support for Cruise, having given it an additional $8.2 billion since 2017, with $1.9 billion just in 2023.  It also bought out Softbank’s shares in Cruise for $3.4 billion in 2022. On November 16th, Cruise suspended a share buy-back program for Cruise employees, which let them cash out on their stock options. The company said it needed to revalue the shares. That was an ominous sign. By November 19th the CEO (and co-founder) of Cruise, Kyle Vogt, stepped down.

GM did indeed pull back on November 29th.

G.M.’s chief financial officer, Paul Jacobson, said spending at Cruise would fall by “hundreds of millions of dollars” in 2024, and would probably fall further as the company reviewed the division’s operations.

GM also stopped work on a new custom vehicle, without conventional controls, intended for Cruise to use in large scale taxi services.

After earlier layoffs of temporary workers who maintained their fleet, in mid-December Cruise had large scale layoffs. First, “nine key leaders” were fired as a byproduct of ongoing safety reviews. And then 900 of their 3800 employees were laid off.

As of the end of the year 2023, none of Cruise’s 950 “autonomous” vehicles, even when driven by humans, venture anywhere on roads in the United States.

Now let’s backtrack about three weeks. The kicker is that although Cruise had made it sound like their vehicles were completely self driving, there had been people at the ready to steer them through difficult situations remotely. They were not operating in the way they presented themselves. The CEO had argued that they were safer than human drivers. But they had human drivers to handle situations their "robots" could not.

In a NYTimes story about the whole mess on November 3rd, I noticed a detail that I had not previously seen.

Half of Cruise’s 400 cars were in San Francisco when the driverless operations were stopped. Those vehicles were supported by a vast operations staff, with 1.5 workers per vehicle. The workers intervened to assist the company’s vehicles every 2.5 to five miles, according to two people familiar with its operations. In other words, they frequently had to do something to remotely control a car after receiving a cellular signal that it was having problems.

Whoa!  Driverless means that there is no human involved in the actual driving. Here the story says that there is an army of people, 1.5 persons per car, who intercede remotely every 2.5 to 5 miles of travel.  I thought I had been taking Cruise vehicles that were driving themselves.

In fact, everyone I talked to in San Francisco thought that the Cruise and Waymo vehicles were fully autonomous as they were so bad in certain ways. I would routinely see vehicles stopped and blocking traffic for 30 minutes. Or three vehicles together blocking an intersection with no visible progress to untangling themselves. And the SF Fire Department was very frustrated with Cruise vehicles wandering into active fire areas, then stopping with their wheels on a fire hose, refusing to move on.

On November 4th then CEO Kyle Vogt posted a statement:

Cruise CEO here. Some relevant context follows.

Cruise AVs are being remotely assisted (RA) 2-4% of the time on average, in complex urban environments. This is low enough already that there isn’t a huge cost benefit to optimizing much further, especially given how useful it is to have humans review things in certain situations.

The stat quoted by nyt is how frequently the AVs initiate an RA session. Of those, many are resolved by the AV itself before the human even looks at things, since we often have the AV initiate proactively and before it is certain it will need help. Many sessions are quick confirmation requests (it is ok to proceed?) that are resolved in seconds. There are some that take longer and involve guiding the AV through tricky situations. Again, in aggregate this is 2-4% of time in driverless mode.

In terms of staffing, we are intentionally over staffed given our small fleet size in order to handle localized bursts of RA demand. With a larger fleet we expect to handle bursts with a smaller ratio of RA operators to AVs. Lastly, I believe the staffing numbers quoted by nyt include several other functions involved in operating fleets of AVs beyond remote assistance (people who clean, charge, maintain, etc.) which are also something that improve significantly with scale and over time.

Cruise was not doing autonomous driving after all. They were routinely relying on remote human interventions.  But they were doing even that badly judging by all the failures I and others routinely saw.
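
As a rough reconciliation of the two ways the intervention rate was stated (the NYTimes' "every 2.5 to 5 miles" versus Vogt's "2-4% of time"), here is a back-of-the-envelope sketch. The average city speed and the assumed session length are my guesses, not figures from Cruise or the NYTimes; the point is only that the two framings land in the same ballpark.

```python
# Rough reconciliation of "an intervention every 2.5-5 miles" with
# "remote assistance 2-4% of driving time". The average speed and the
# assumed session length are my assumptions, not numbers from either source.

avg_speed_mph = 12          # assumed average speed in dense San Francisco traffic
miles_between = (2.5, 5.0)  # NYT: one intervention every 2.5 to 5 miles

for miles in miles_between:
    minutes_between = miles / avg_speed_mph * 60  # minutes of driving per session
    for session_min in (0.5, 1.0):                # assumed length of one RA session
        pct_of_time = session_min / minutes_between * 100
        print(f"every {miles} mi -> one session per {minutes_between:.1f} min; "
              f"{session_min} min sessions -> {pct_of_time:.1f}% of time")
```

Under those assumptions the sessions work out to roughly 2% to 8% of driving time, which is consistent with the 2-4% figure, so the two statements are describing the same underlying reliance on remote humans.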

Why is fully autonomous driving important?

In one word: economics.  The whole point of driverless cars has, for over a decade, been to produce a taxi service where human drivers are not needed.  The business model, the model all the car companies have been going after, is that the cars can drive themselves so no human needs to be paid as part of a taxi service.

We were told by the companies that their vehicles were safer than human driven vehicles, but in fact they routinely needed humans to monitor and to control them.  At one level I’m shocked, shocked, I tell you.  At another level I am embarrassed that they fooled me.  I had thought they were driving with no person monitoring them.

The reason I posted my predictions and talk about them every year was to try to overcome the hype that fools people about how far along technology is. The hucksters beat me in this case.

There is one other company in the US providing so-called autonomous taxi rides. I don’t know whether or not to believe them. I just don’t know. Fool me once…

And here is an interesting tweet:

 

And about that Fully Self Driving for consumer cars

In December there was a harsh but fair story in Rolling Stone about Tesla’s non-stop hype about its self-driving cars, hype that is, to put it plainly, a complete lie, and has been every year since 2014, when the CEO of Tesla first announced that full self driving would be here “this year”. We now have ten years of the same promise, and ten years of it not being true.  [For many people that is “fool me ten times”.]

There is a reference in that story to a research report from earlier in the year by Noah Goodall at the Virginia Transportation Research Council. He disentangles some of Tesla’s misleading statistics:

Although Level 2 vehicles were claimed to have a 43% lower crash rate than Level 1 vehicles, their improvement was only 10% after controlling for different rates of freeway driving. Direct comparison with general public driving was impossible due to unclear crash severity thresholds in the manufacturer’s reports, but analysis showed that controlling for driver age would increase reported crash rates by 11%.

Prediction [Self Driving Cars] | Date | 2018 Comments | Updates
A flying car can be purchased by any US resident if they have enough money. | NET 2036 | There is a real possibility that this will not happen at all by 2050. |
Flying cars reach 0.01% of US total cars. | NET 2042 | That would be about 26,000 flying cars given today's total. |
Flying cars reach 0.1% of US total cars. | NIML | |
First dedicated lane where only cars in truly driverless mode are allowed on a public freeway. | NET 2021 | This is a bit like current day HOV lanes. My bet is the left most lane on 101 between SF and Silicon Valley (currently largely the domain of speeding Teslas in any case). People will have to have their hands on the wheel until the car is in the dedicated lane. |
Such a dedicated lane where the cars communicate and drive with reduced spacing at higher speed than people are allowed to drive | NET 2024 | | 20240101 This didn't happen in 2023 so I can call it now. But there are no plans anywhere for infrastructure to communicate with cars, though some startups are finally starting to look at this idea--it was investigated and prototyped by academia 20 years ago.
First driverless "taxi" service in a major US city, with dedicated pick up and drop off points, and restrictions on weather and time of day. | NET 2021 | The pick up and drop off points will not be parking spots, but like bus stops they will be marked and restricted for that purpose only. | 20240101 People may think this happened in San Francisco in 2023, but it didn't. Cruise has now admitted that there were humans in the loop intervening a few percent of the time. THIS IS NOT DRIVERLESS. Without a clear statement from Waymo to the contrary, one must assume the same for them. Smoke and mirrors.
Such "taxi" services where the cars are also used with drivers at other times and with extended geography, in 10 major US cities | NET 2025 | A key predictor here is when the sensors get cheap enough that using the car with a driver and not using those sensors still makes economic sense. |
Such "taxi" service as above in 50 of the 100 biggest US cities. | NET 2028 | It will be a very slow start and roll out. The designated pick up and drop off points may be used by multiple vendors, with communication between them in order to schedule cars in and out. |
Dedicated driverless package delivery vehicles in very restricted geographies of a major US city. | NET 2023 | The geographies will have to be where the roads are wide enough for other drivers to get around stopped vehicles. |
A (profitable) parking garage where certain brands of cars can be left and picked up at the entrance and they will go park themselves in a human free environment. | NET 2023 | The economic incentive is much higher parking density, and it will require communication between the cars and the garage infrastructure. |
A driverless "taxi" service in a major US city with arbitrary pick and drop off locations, even in a restricted geographical area. | NET 2032 | This is what Uber, Lyft, and conventional taxi services can do today. | 20240101 Looked like it was getting close until the dirty laundry came out.
Driverless taxi services operating on all streets in Cambridgeport, MA, and Greenwich Village, NY. | NET 2035 | Unless parking and human drivers are banned from those areas before then. |
A major city bans parking and cars with drivers from a non-trivial portion of a city so that driverless cars have free reign in that area. | NET 2027, BY 2031 | This will be the starting point for a turning of the tide towards driverless cars. |
The majority of US cities have the majority of their downtown under such rules. | NET 2045 | |
Electric cars hit 30% of US car sales. | NET 2027 | | 20240101 This one looked pessimistic last year, but now looks at risk. There was a considerable slow down in the second derivative of adoption this year in the US.
Electric car sales in the US make up essentially 100% of the sales. | NET 2038 | |
Individually owned cars can go underground onto a pallet and be whisked underground to another location in a city at more than 100mph. | NIML | There might be some small demonstration projects, but they will be just that, not real, viable mass market services. |
First time that a car equipped with some version of a solution for the trolley problem is involved in an accident where it is practically invoked. | NIML | Recall that a variation of this was a key plot aspect in the movie "I, Robot", where a robot had rescued the Will Smith character after a car accident at the expense of letting a young girl die. |

Electric Cars

I bought my first electric car this year.  I love it.

But it also made me realize how hard it is for many people to own an electric car. I have my own garage under my house. I charge my car in there.  A large portion of car owners in my city, San Francisco, have no private parking space. How does charging work for them?  They need to go to a public recharge station. And wait to charge. Having an electric car is an incredible time tax on people who do not have their own parking spot with access to electricity.  I had not fully appreciated how this will slow down adoption of electric cars until I owned one myself and could reflect on my own level of privilege in this regard.

Manufacturers have cut back on their sales forecasts for electric vehicles over the next couple of years, and are even reducing production. Ford did this after reporting that it lost $36,000 on every EV it sold in Q3.

See an analysis of why the scaling is hard just from a supply chain point of view.

Year over year electric car sales in Q3 were up by 49.8%. But overall car sales were also up significantly, so the gain in electric share is not that big, and this report says that the electric share of US car sales by year is:

2021: 3.2%

2022: 5.8%

2023: 7.2%

So this says my estimate of 30% electric sales by 2027 is very much at risk, as that requires two more full doublings of percentage just as manufacturers are slowing things down.  I was abused heavily on Twitter for being so pessimistic back in 2018. Right now I think my prediction was accurate or even optimistic.
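
As a quick sanity check on that "two more full doublings" claim, here is the arithmetic spelled out. This is just my back-of-the-envelope sketch using the yearly shares quoted above, not anything from the cited report:

```python
# Check how many doublings separate the 2023 US electric share from 30%.
# The 7.2% figure is the 2023 share quoted above; the rest is simple arithmetic.

share_2023 = 7.2                 # percent of US car sales in 2023

one_doubling = share_2023 * 2    # 14.4%
two_doublings = share_2023 * 4   # 28.8% -- still just under the 30% in the prediction

print(one_doubling, two_doublings)
```

So reaching 30% by 2027 needs the share to double twice more in four years, right as manufacturers are pulling back.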

At year end there was a well researched article in the Wall Street Journal. Electric vehicle purchase rates are geographically lumpy, and the rate of increase has slowed in many places.

Flying Cars

When I made my predictions about flying cars back in 2018, flying cars were cars that could drive on roads and that could also fly. Now the common meaning has changed to largely be eVTOLs, electric Vertical Take Off and Landing vehicles, that sit statically on the ground when not flying. I.e., electric helicopters. And people talk about them as taxis that will whisk people around through the three dimensional airspace, avoiding clogged roads, and being everywhere.

Spoiler alert.  Not going to happen.

Late in 2022, soon after Larry Page pulled the plug on his twelve year old eVTOL company, I did an analysis of the lack of videos showing any practical looking flights of any potential eVTOL solution, despite companies having multi-billion dollar valuations. If practical eVTOL solutions are around the corner there should certainly be videos of them working. There aren't.

Late in 2023, one of the “sky high” valuation companies participated in an unveiling of a taxi service in NYC, with breathless commentary, and even a breathless speech from the Mayor of NYC.  They’re coming, they’re coming, we are all going to live in a new world of flying taxis.

Here is the video, from November 14, 2023, of a Joby eVTOL taxi flight in NYC.  It is titled: “First-ever electric air taxi flight takes off in NYC”.

Except that it has no passengers, and it just flies a slow loop out over the water and back. It has “Experimental” painted on the front door.

Not the four passengers and 200mph speed that co-founder JoeBen Bevirt speaks about in the video. It is not an operational flight path at all. You can see that the passenger seats are all empty at the 19 second mark, whereas the pilot's legs are clearly visible.

In a video from about a month prior titled “Flying Joby’s Electric Air Taxi with a Pilot On Board”, shot in Marina, California, the company says that they have now flown their vehicle with four different test pilots.  And the video shows it off very much as a test flight, with no passengers onboard.

There is no mention of automating out the pilot, which was one of the key fantasies of eVTOL taxis.

Also note the size of this vehicle.  There are many fossil fuel powered helicopters that are much smaller.  This is not going to be a personally owned vehicle for the masses.

Don’t hold your breath. They are not here. They are not coming soon.

Robotics, AI, and Machine Learning

Prolog

It is worth reading this story about the increasing number of CEOs in Silicon Valley ending up in jail for overhyping their businesses to the point of fraud. Many, many more walk that line, some for multiple companies at the same time. From the story:

“Governance got a bit loosey-goosey during the bubble,” said Healy Jones, vice president of financial strategy at Kruze Consulting, a provider of financial services for start-ups. Lately, Mr. Jones said, he has noticed venture firms doing more due diligence on potential investments, but “they probably shouldn’t get a gold star for fulfilling their job description.”

This is what happens when everyone is entitled to their own alternate facts. Current icons, young brash worshipped CEOs, are not immune to forcing their own alternate facts upon, first, eager investors, and second, people willing to set aside critical judgement when promised a magical rosy future.

It is far easier to make stuff up than refute it. It has happened with self driving cars, and flying taxis. It is rampant in the humanoid robotics and AI spaces.

Summary

I only made three comments in the table this year, and only one of them is directly about a predicted milestone being hit.  And, as you might guess, they are all about Generative AI and Large Language Models. No question that 2023 was the year when those topics hit the general consciousness of the scientific, cultural, and political worlds.  I’ve officially been an AI researcher since 1976, and before that I was a high school and undergraduate hobbyist, but this is the first year I have heard politicians throughout the world say the words “Artificial Intelligence”.  And when they have said those words no one has been surprised, and everyone sort of thinks they know what they are talking about.

I had not bothered to predict a rosy future for humanoid robots, as when I made my predictions I had been working in that space for over twenty five years and had built both research humanoids and thousands of humanoid robots that were deployed in factories. The extraordinarily difficult challenges, requiring fundamental research breakthroughs, were clear to me. There are plenty of naive entrepreneurs saying that work will be changed by humanoid robots within a handful of years. They are wrong. My lack of predictions about humanoid robots was based on my expectation that they will not play any significant role for at least another 25 years.

Here are some humanoid robots that I and the teams I have led have built: Cog (large team of graduate students), Kismet (Cynthia Breazeal, in the picture), Domo (Aaron Edsinger and Jeff Weber), and then Baxters (Rethink Robotics).







 

The prediction that happened this year

I had predicted that the "next big thing" in AI, beyond deep learning, would show up no earlier than 2023, but certainly by 2027. I also said, in the table of predictions in my January 1st, 2018, post, that for sure someone was already working on that next big thing, and that papers were most likely already published about it. I just didn't know what it would be; but I was quite sure that of the hundreds or thousands of AI projects that groups of people were already successfully working hard on, one would turn out to be that next big thing that everyone hopes is just around the corner. I was right both about 2023 being when it might show up, and about there already being papers about it before 2018.

Why was I successful in those predictions?  Because it always happens that way and I just found the common thread in all “next big things” in AI, and their time constants.

The next big thing, Generative AI and Large Language Models started to enter the general AI consciousness last December, and indeed I talked about it a little in last year’s prediction update.  I said that it was neither the savior nor the destroyer of mankind, as different camps had started to proclaim right at the end of 2022, and that both sides should calm down.  I also said that perhaps the next big thing would be neuro-symbolic Artificial Intelligence.

By March of 2023, it was clear that the next big thing had arrived in AI, and that it was Large Language Models.  The key innovation had been published before 2018, in 2017, in fact.

Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). “Attention is All you Need”. Advances in Neural Information Processing Systems. Curran Associates, Inc. 30.

So I am going to claim victory on that particular prediction, with the bracketed years (OK, so I was a little lucky…) and that a major paper for the next big thing had already been published by the beginning of 2018 (OK, so I was even luckier…).

Prediction [AI and ML] | Date | 2018 Comments | Updates
Academic rumblings about the limits of Deep Learning | BY 2017 | Oh, this is already happening... the pace will pick up. |
The technical press starts reporting about limits of Deep Learning, and limits of reinforcement learning of game play. | BY 2018 | |
The popular press starts having stories that the era of Deep Learning is over. | BY 2020 | |
VCs figure out that for an investment to pay off there needs to be something more than "X + Deep Learning". | NET 2021 | I am being a little cynical here, and of course there will be no way to know when things change exactly. |
Emergence of the generally agreed upon "next big thing" in AI beyond deep learning. | NET 2023, BY 2027 | Whatever this turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out. | 20240101 It definitely showed up in 2023. It was in the public mind in December 2022, but was not yet the big thing that it became during 2023. A year ago I thought it would perhaps be neuro-symbolic AI, but clearly it is LLMs, and ChatGPT and its cousins. And, as I predicted in 2018 it was something already being worked on as the "attention is all you need" paper, the key set of ideas, was published in 2017.
The press, and researchers, generally mature beyond the so-called "Turing Test" and Asimov's three laws as valid measures of progress in AI and ML. | NET 2022 | I wish, I really wish. | 20230101 The Turing Test was missing from all the breathless press coverage of ChatGPT and friends in 2022. Their performance, though not consistent, pushes way past the old comparisons. 20240101 The Turing Test was largely missing from the press in 2023 also, and there was a story in Nature commenting on that. So yes, this has now happened.
Dexterous robot hands generally available. | NET 2030, BY 2040 (I hope!) | Despite some impressive lab demonstrations we have not actually seen any improvement in widely deployed robotic hands or end effectors in the last 40 years. |
A robot that can navigate around just about any US home, with its steps, its clutter, its narrow pathways between furniture, etc. | Lab demo: NET 2026; Expensive product: NET 2030; Affordable product: NET 2035 | What is easy for humans is still very, very hard for robots. |
A robot that can provide physical assistance to the elderly over multiple tasks (e.g., getting into and out of bed, washing, using the toilet, etc.) rather than just a point solution. | NET 2028 | There may be point solution robots before that. But soon the houses of the elderly will be cluttered with too many robots. |
A robot that can carry out the last 10 yards of delivery, getting from a vehicle into a house and putting the package inside the front door. | Lab demo: NET 2025; Deployed systems: NET 2028 | |
A conversational agent that both carries long term context, and does not easily fall into recognizable and repeated patterns. | Lab demo: NET 2023; Deployed systems: 2025 | Deployment platforms already exist (e.g., Google Home and Amazon Echo) so it will be a fast track from lab demo to wide spread deployment. | 20240101 One half of this happened this year. ChatGPT has been connected to microphones and speakers so you can now talk to it, and it does not fall into recognizable patterns. BUT the other half is the half it does not have; it has no updatable memory apart from its token buffer of what it has just said. Long term context may be long term in coming.
An AI system with an ongoing existence (no day is the repeat of another day as it currently is for all AI systems) at the level of a mouse. | NET 2030 | I will need a whole new blog post to explain this... |
A robot that seems as intelligent, as attentive, and as faithful, as a dog. | NET 2048 | This is so much harder than most people imagine it to be--many think we are already there; I say we are not at all there. |
A robot that has any real idea about its own existence, or the existence of humans in the way that a six year old understands humans. | NIML | |

 

What do I think about Generative AI and Large Language Models?

On November 28th I gave a talk at MIT as the opening keynote for MIT’s Generative AI Week. Here is the video of my talk, and here is a part of my talk written up as a blog post.

The title was “Unexpected manna mantra“, with “manna” struck through. I didn’t want to talk about all the wealth or destruction (see salvationists versus doomers) that others talk about, and hence the crossed out “manna” in the title. Instead the talk was about what the existence of these “valuable cultural tools” (a phrase due to Alison Gopnik at UC Berkeley) tells us about deeper philosophical questions about how human intelligence works, and how they are following a well worn hype cycle that we have seen again, and again, during the 60+ year history of AI.

I concluded my talk encouraging people to do good things with LLMs but to not believe the conceit that their existence means we are on the verge of Artificial General Intelligence.

By the way, there are initial signs that perhaps LLMs have already passed peak hype. And the ever interesting Cory Doctorow has written a piece on what will be the remnants after the LLM bubble has burst. He says there was lots of useful stuff left after the dot com bubble burst in 2000, but not much beyond the fraud in the case of the burst crypto bubble.

He tends to be pessimistic about how much will be left to harvest after the LLM bubble is gone. Meanwhile right at year’s end the lawsuits around LLM training are starting to get serious.

Human Spaceflight

Crewed space flight crawled on during 2023. It did not feel like a golden age.

There were only five crewed orbital flights in 2023: two were Russian Soyuz, two were NASA SpaceX Dragons, and one was a commercial Axiom-2 SpaceX Dragon with three paying customers. All five flights were to the International Space Station (ISS).

There were seven crewed suborbital flights, all by Virgin Galactic. Two were company test flights, and five had at least some paying customers on board. This means that Virgin Galactic has now had a total of six flights which involved more than test pilots (the previous such flight was in 2021).

Blue Origin had a mishap with an uncrewed suborbital vehicle in 2022, and finally flew an uncrewed vehicle again on December 19th, 2023. Perhaps they will be back to crewed flights in 2024.

This, again, was not the year that space tourism really took off. In fact many of the paying customers were from space agencies of other countries that do not have their own human launch capability. A market for getting national astronauts into space is starting to develop. The 14 day Axiom-3 mission scheduled for January 2024 will take three paying customers to the ISS, all of whom are national astronauts, from Italy, Turkey, and Sweden. The Italian astronaut, Walter Villadei, flew on one of the suborbital Virgin Galactic flights in 2023.

The bright spot for space in 2023 was the continued, unparalleled (ever) success of SpaceX's Falcon 9 rockets. They had zero failures. There were 91 launches of the single booster version, and every one of those boosters was recovered in a vertical soft landing (though late in December booster B1058, which had successfully landed on a floating barge after its nineteenth launch, was destroyed when it fell over in rough seas while being transported back to land). Three of those launches sent people to the ISS. There were 5 launches of Falcon Heavy, the triple booster version. All four attempts to recover the two side boosters were successful, bringing all eight of them back.

SpaceX Falcons had a total of 31 launches in 2021, 61 in 2022, and now 96 in 2023. There have now been a total of 285 single booster launches with only 2 failures, and nine Falcon Heavy  launches with no failures. SpaceX’s Falcon rocket is in a success class of its own.

It is worth noting, however, that this large scale successful deployment took time. The first launch of a Falcon 9 took place in June 2010, thirteen and a half years ago. The attempted recovery of that booster failed. By the end of 2013 there had been only seven launches total, with no successful booster recoveries, despite four attempts. The first booster to be recovered (but not reflown) was in December 2015.

It wasn’t until March 2017 that there was a reflight of a recovered booster (first flown in April 2016).

Prediction [Space] | Date | 2018 Comments | Updates
Next launch of people (test pilots/engineers) on a sub-orbital flight by a private company. | BY 2018 | |
A few handfuls of customers, paying for those flights. | NET 2020 | |
A regular sub weekly cadence of such flights. | NET 2022, BY 2026 | | 20240101 There were four flights in 2021, three in 2022, and seven, five with customers on board, in 2023--all of them by Virgin Galactic. Blue Origin did not fly in 2023. At this point 2026 is looking doubtful for regular flights every week.
Regular paying customer orbital flights. | NET 2027 | Russia offered paid flights to the ISS, but there were only 8 such flights (7 different tourists). They are now suspended indefinitely. | 20240101 There were three paid flights in 2021, and one each in 2022, and 2023, with the latter being the Axiom 2 mission using SpaceX hardware. So not regular yet, and certainly not common.
Next launch of people into orbit on a US booster. | NET 2019; BY 2021; BY 2022 (2 different companies) | Current schedule says 2018. | 20240101 Both SpaceX and Boeing were scheduled to have crewed flights in 2018. SpaceX pulled it off in 2020, Boeing's Starliner did not fly at all in 2023, but is scheduled to launch with people onboard for the first time in April 2024.
Two paying customers go on a loop around the Moon, launch on Falcon Heavy. | NET 2020 | The most recent prediction has been 4th quarter 2018. That is not going to happen. | 20240101 Starship launched twice in 2023 but didn't get to orbit either time. This is going to be well over six years later than the original prediction by the CEO of SpaceX.
Land cargo on Mars for humans to use at a later date | NET 2026 | SpaceX has said by 2022. I think 2026 is optimistic but it might be pushed to happen as a statement that it can be done, rather than for an pressing practical reason. | 20240101 I was way too optimistic, and bought into the overoptimistic hype of the CEO of SpaceX even though I added four years, doubling his estimated time frame.
Humans on Mars make use of cargo previously landed there. | NET 2032 | Sorry, it is just going to take longer than every one expects. |
First "permanent" human colony on Mars. | NET 2036 | It will be magical for the human race if this happens by then. It will truly inspire us all. |
Point to point transport on Earth in an hour or so (using a BF rocket). | NIML | This will not happen without some major new breakthrough of which we currently have no inkling. |
Regular service of Hyperloop between two cities. | NIML | I can't help but be reminded of when Chuck Yeager described the Mercury program as "Spam in a can". | 20240101 Calling this one 26 years early. As of today no-one is still working on this in an operating company.

 

Boeing’s Starliner

First announced in 2010 Boeing’s Starliner was originally scheduled to fly a human crew in 2018. It carried out its second uncrewed flight in May 2022, and is now scheduled to have its first crewed test flight in April 2024.

Thereafter it is expected to fly with a crew once a year.  After this dismally long development period, that will give the US its second commercial human capable orbital space craft.

Starship

Starship is SpaceX’s superheavy two stage rocket, designed to put 150 tons of payload into orbit, but also be able to go to the Moon or Mars. The first stage has 33 Raptor engines, and that stage is to land back on a ship or land as the current Falcon first stages do so successfully. The second stage has a total of six Raptor engines, three optimized to operate in space and three in the atmosphere. The second stage is to return from orbit burning off kinetic energy using a heat shield to re-enter the atmosphere, and then land vertically back at the launch site.

Over US Thanksgiving in 2021 the CEO of SpaceX urged his workers to abandon their families and come in to work to boost the production rate of Raptor engines. In his email he said:

What it comes down to is that we face genuine risk of bankruptcy if we cannot achieve a Starship flight rate of at least once every two weeks next year.

“Next year” was 2022. There were zero flights in 2022, certainly not one every two weeks. There were two flights total in 2023, and both of those had both stages for the first time. Both flights in 2023 ended up with both stages blowing up. SpaceX has become renowned for moving fast and blowing stuff up.  But the US’s plan for returning people to the surface of the Moon in 2025 is now very unlikely. That plan requires 15 successful launches of Starship to operate flawlessly for that single mission.

The return to the Lunar surface is going to be significantly delayed, and revenue producing flights of Starship are going to be way behind schedule.

Artemis

The second Artemis mission, using the Orion Capsule, Artemis II, will fly to the Moon with four people aboard, the first crewed Artemis flight. It was scheduled to launch in May 2024, but has been delayed by six months. This will be the first crewed mission beyond low Earth orbit (LEO) since 1972.

Artemis III was scheduled to launch in 2025 with a return to the surface of the Moon. However that relied on using a Starship (itself refueled in LEO by 14 (yes, fourteen!!) other Starship launches) to land there.  No one any longer believes that schedule, and it is likely delayed a few years, given where Starship is in its development and current capability.

Blue Origin Orbital Class Engines and Vehicles

Back in 2022 Blue Origin delivered two BE-4 engines to ULA, a customer, for use in their new Vulcan Centaur rocket, freeing ULA from its reliance on Russian engines. The first launch was supposed to happen in 2023, and in December a launch was delayed until January 2024. It does look like it will fly soon.

A BE-4 exploded during testing at Blue Origin in June of 2023, but whatever issues were there seem to have been overcome. They are designed to fly 100 times each.

Blue Origin’s own first orbital class rocket, New Glenn, was also due to fly in 2023, with four BE-4 engines.  It has been delayed until August 2024.

And finally, hyperloop

My prediction was that hyperloop was not going to happen in my lifetime, i.e., not by 2050, still twenty six years from now. But I called it today in the table. I was right.

For those who don’t remember, the hyperloop concept was hyped as much as generative AI is these days. The idea was that small pods would rush down evacuated tubes (often said to be underground, which was the rationale for starting new tunnel boring companies), at hundreds of miles per hour.  With people in them.  Point to point, LA downtown to San Francisco downtown in an hour.

In 2018 I wrote about what is hard and what is easy, and why, and said:

Building electric cars and reusable rockets is fairly easy. Building a nuclear fusion reactor, flying cars, self-driving cars, or a Hyperloop system is very hard. What makes the difference?

And it turns out it was much harder. As of December 31st, 2023, Hyperloop One, started by a member of the billionaire we-can-do-anything-one-of-us-thinks-about-for-five-minutes-before-telling-the-world-about-my-visionary-idea club, has completely shut down.  It is particularly worth reading the brutal last two paragraphs of that story. And the last sentence is generally worth remembering at all times:

The future, it would seem, is nearly the same as the present.

As I have said many times:

Having ideas is easy. Turning them into reality is hard. Turning them into being deployed at scale is even harder.

Progress inches along. It did with ships, trains, automobiles, airplanes, rockets, and reusable boosters. All of them moved along with many players, inventors, and investors, over at least decades. Big things that involve a lot of kinetic energy, and especially those that also carry people, take somewhere from decades to centuries to develop and deploy at scale.

Looking Forward

Get your thick coats now. There may be yet another AI winter, and perhaps even a full scale tech winter, just around the corner. And it is going to be cold.

Autonomous Vehicles 2023, Part III

rodneybrooks.com/autonomous-vehicles-2023-part-iii/

To finish up this brief series on the reality of autonomous, or self driving vehicles, in 2023 I want to outline my experiences taking almost 40 rides in San Francisco in Cruise vehicles.

I have previously reported on my first three rides in Cruise vehicles back in May of 2022. In those three rides, as with all subsequent rides, there was no driver in the front seat, and the steering wheel turned as though a ghost was sitting there driving.

In 2023 I took roughly three dozen more rides. At first they were part of a special test program that Cruise operated for free. But the last half of them were part of the regular “taxi” service that Cruise started operating, where I paid for them just like one would using Uber or Lyft. For a while Cruise had 300 vehicles operating, but they backed off to 150 after some safety issues.

Here I report what the experience was like.

It was certainly different from using an Uber or a Lyft, and always more uncertain as to when and whether I would be picked up, where I would be picked up, and where I would be dropped off. The waits were sometimes over twenty minutes for the vehicle to arrive, and sometimes the ride would be cancelled while my promised vehicle was finally in view, and it just cruised on by me, with no explanation of why I had been dumped.

On pick up location I need to point out that during 2022, and most of 2023, the streets of San Francisco were thick with vehicles from both Waymo and Cruise driving around with no one in the driver’s seat. They were collecting data. This followed years of both companies having drivers in their vehicles collecting data, and mapping the entire city.

I would see them every single time I left my house, day or night, but never on my block itself; it is a very difficult block for human drivers to navigate. Even the human driven data collection Waymos and Cruises never ventured onto my block. On one occasion in November of this year I saw four empty Waymos drive down my hill--perhaps it was a mistake. But overall, I do not think the companies have ever mapped my particular block.

Cruise always told me it could not pick me up at my house. Sometimes it told me where to walk to (sometimes as far as two blocks away) and sometimes it asked me to choose.

Earlier in 2023 Cruise vehicles had a hard time dropping me off in a busy street and would continue around the block searching for a place to pull in to out of the traffic. Towards the end of the year, before Cruise shut them down completely, they would stop in traffic in a crowded restaurant area much as an Uber or Lyft would.

But this ease of drop off did not extend to near my house, in a leafy neighborhood with mostly deserted streets at night. There they would insist on finding a place to pull into, out of the non-existent traffic, sometimes a bit of a hike back to my house.

Likewise there was a geographically determined difference in behavior for pick up. Earlier in the year they would keep driving until they found a place to pull into. That continued near my house throughout the year. One time it was so much further on from where it had told me to wait that, by the time I caught up with the car (running to get there, I might add), a human was speaking from the car asking me if I was having trouble pushing the unlock button on my app.

However, in crowded restaurant areas the Cruise vehicles first became more aggressive about pulling into an empty spot, with stronger braking than a human driver would use, perhaps because humans were picking up on the spot being empty earlier than the Cruise vehicles were. Later in the year the Cruise vehicles started to imitate human Uber and Lyft drivers and would stop in an active traffic lane to enable a pickup.

In summary, the pick up and drop off behavior of Cruise vehicles got incrementally better in crowded areas throughout the year. The behavior did not change in easier, less trafficked areas, and was quite substandard compared to human drivers in those areas. Some blocks have not been mapped and are no go areas for driverless ride services. Whether one actually gets a ride or not is rather hit and miss, even if you are prepared to wait a long time.

It is not yet a deployed service. And now Cruise has shut down ride services in all six cities that it had begun operating in, while they thoroughly review safety. And it turns out they were not even autonomous. Human operators were intervening in almost 5 percent of the rides.

Meanwhile, Cruise’s owning company, GM, has announced they are pulling back on their investment of operating cash and other resources. Certainly GM’s driverless car service is at risk.

And also it turns out the cars are not autonomous or driverless.  See my upcoming new year report.

Three Things That LLMs Have Made Us Rethink

rodneybrooks.com/three-things-that-llms-have-made-us-rethink/

There are three things that the existence of LLMs, such as ChatGPT-3.5 and ChatGPT-4, makes us have to rethink. At different times and amongst different communities they have all had lots of AI researchers talking about them, often with much passion.

Here are three things to note:

  1. The Turing Test has evaporated.
  2. Searle’s Chinese Room showed up, uninvited.
  3. Chomsky’s Universal Grammar needs some bolstering if it is to survive.

We’ll talk about each in turn.

The Turing Test

In a 1950 paper titled Computing Machinery and Intelligence, Alan Turing used a test which involved a human deciding whether an entity that the person was texting with was a human or a computer. Of course, he did not use the term “texting” as that had not yet been invented, rather he suggested that the communication was via a “teleprinter”, which did exist at the time, where the words typed in one location appeared on paper in a remote location. “Texting” is the modern equivalent.

Turing used this setup as a rhetorical device to argue that if you could not accurately and reliably decide whether it was a person or a computer at the other end of the line then you had to grant that a machine could be intelligent. His point was that it was not just simulating intelligence but that it would actually be intelligent if people could not tell the difference.

Turing said:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

His number 10^9 referred to how many bits of program would be needed to achieve this result, which is 125MB, i.e., 125 Mega Bytes. Compare this with ChatGPT-3.5, which has 700GB, or 700 Giga Bytes, of weights (175 billion 32 bit weights) that it has learned, which is almost 6,000 times as much.
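
To spell out the arithmetic behind those figures, here is a back-of-the-envelope sketch. It uses only Turing's 10^9 bits and the publicly quoted 175 billion parameter count; everything else is simple unit conversion on my part:

```python
# Compare Turing's guess at the storage needed with ChatGPT-3.5's weights.

turing_bits = 10**9                 # Turing's estimate of the program size, in bits
turing_bytes = turing_bits / 8      # 125,000,000 bytes, i.e. 125 MB

gpt35_weights = 175 * 10**9         # 175 billion parameters
gpt35_bytes = gpt35_weights * 4     # 32-bit (4-byte) weights -> 700 GB

print(turing_bytes / 1e6)           # 125.0   (MB)
print(gpt35_bytes / 1e9)            # 700.0   (GB)
print(gpt35_bytes / turing_bytes)   # 5600.0, i.e. "almost 6,000 times as much"
```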

His paragraph above continues:

The original question, ‘Can machines think!’ I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted. 

Despite his intent that this be purely a rhetorical device for making the question, ‘Can machines think!’ (I assume the punctuation was a typo and was intended to be ‘?’), seem meaningless, it led to people calling the machine/person discernment test the Turing Test, and it became the default way of thinking about how to determine when general Artificial Intelligence had been achieved. But, of course, it is not that simple. That didn’t stop annual Turing Tests being set up, with entrants from mostly amateur researchers, who had built chat bots designed not to do any useful work in the world, but simply to try to pass the Turing Test. It was a bit of a circus and mostly not very useful.

Earlier this year I felt like I was not hearing about the Turing Test with regards to all the ChatGPTs, and in fact the scientific press had noticed this too, with this story in Nature in July of this year:

 

Don’t worry, there are still papers being written on the Turing Test and ChatGPT, for instance this one from October 2023, but the fervor of declaring that it is important has decreased.

We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%).

In general the press has moved away from the Turing Test. ChatGPT seems to have the sort of language expertise that people imagined some system as intelligent as a person would have, but it has become clear that it is not the crystalline indication of intelligence that Turing was trying to elucidate.

Searle’s Chinese Room

In 1980, John Searle, a UC Berkeley philosopher, introduced the idea of a “Chinese Room”, as a way to argue that computers could not be truly intelligent in the way that people are, not truly engaged with the world in the way people are, and not truly sentient in the way people are.

He chose “Chinese” as the language for the room as it was something totally foreign to most people working in Artificial Intelligence in the US at the time. Furthermore its written form was in atomic symbols.

Here is what ChatGPT-3.5 said when I asked it to describe Searle’s Chinese Room. I have highlighted the last clause in blue.

Imagine a person who doesn’t understand the Chinese language locked inside a room. This person has a set of instructions (a program) written in English that tells them how to manipulate Chinese symbols based on the symbols that are input to the room through a slot. They have no understanding of Chinese and don’t know the meanings of the symbols.

From the outside, someone passes messages written in Chinese through the slot into the room. The person inside the room follows the instructions in the program and produces responses in Chinese based solely on symbol manipulation, without understanding the content or meaning of the messages.

To an observer outside the room, it may appear as though the person inside understands Chinese, as they can generate coherent responses in Chinese. However, the person inside the room doesn’t actually understand the language. They are merely following a set of syntactic rules or algorithms for manipulating symbols, without grasping the semantics or meanings of those symbols.

To prove that ChatGPT really is the Chinese Room, I typed in some Chinese text to version 3.5 and it responded with Chinese text:

艾未未是谁?

艾未未是一位知名的中国艺术家、社会活动家和评论家。他以其大胆的艺术创作和公开对政府审查和人权侵犯的批评而闻名。艾未未的作品涉及雕塑、摄影、行为艺术、装置艺术和社交媒体等多种媒介,展现了对政治、社会和人权议题的关注。他因批评中国政府而多次引起争议,这导致了他与政府当局的冲突,并且在过去曾经被拘留和监禁。他因其独特的艺术作品和公开表达的观点而在国际艺术界和社会活动家圈中备受瞩目。

My question was “Who is Ai Wei Wei?” and it responded in what people tell me is perfectly good Chinese, describing the Chinese artist Ai Wei Wei, his art, and his run ins with the government.

So here we have Searle’s Chinese Room.  It has been trained on billions of words from the web but has no experience of the world, has never heard Chinese spoken, and has never even read Chinese characters as characters; rather it has seen them only as integers, with (I assume) a pre-processing step that maps whichever of the five common digital Chinese character code standards (Unicode, GB, GBK, Big 5, and CNS) each document uses onto that common set of integers. (The fact that ChatGPT `knows', in its description of the room above, that Chinese is written in symbols is not because it has ever seen them, but because it has “read” that Chinese uses symbols.)
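To make that assumed pre-processing step concrete, here is a minimal Python sketch of my own, purely an illustration of the idea rather than OpenAI’s actual pipeline (which is not public, and which tokenizes text with a byte-pair encoding, so the actual integers differ). Documents stored under different Chinese encoding standards all get decoded and mapped onto one shared set of integers, and those integers are all the model ever sees:

    # Hypothetical sketch: normalize documents in different Chinese encodings
    # to one shared set of integers (here, Unicode code points).
    def to_integers(raw_bytes, declared_encoding):
        """Decode a document from its declared encoding, then map each
        character to a shared integer (its Unicode code point)."""
        text = raw_bytes.decode(declared_encoding)  # e.g. "gb18030", "big5", "utf-8"
        return [ord(ch) for ch in text]

    # The same question about Ai Wei Wei, stored under two different standards,
    # ends up as the same sequence of integers.
    question = "艾未未是谁?"
    assert (to_integers(question.encode("gb18030"), "gb18030")
            == to_integers(question.encode("utf-8"), "utf-8"))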

The fact that GPT is the Chinese Room, the fact that one now exists, means that many of the old arguments for and against the position Searle was staking out with his rhetorical version of the room must now be faced squarely and perhaps re-evaluated.  Searle’s Chinese Room was a topic of discussion in AI for well over 25 years. Everyone had to have an opinion or argument.

In my book Flesh and Machines: How Robots Will Change Us (Pantheon, New York, 2002), I made two arguments that were in opposition to Searle’s description of what his room tells us.

Firstly, I argued (as did many, many others) that indeed Searle was right that the person in the room could not be said to understand Chinese. Instead we argued that it was the whole system, the person, the rule books, and the state maintained in following the rules, that was what understood Chinese. Searle was using the person as a stand-in for a computer fresh off the production line, and ignoring the impact of loading the right program and data onto it. In the ChatGPT case it is the computer, plus the algorithms for evaluating linear neuron models, plus the 175 billion weights, that together make ChatGPT-3.5 understand Chinese, if one accepts that it does. In my book I said that no individual neuron in a human brain can be said to understand Chinese; it has to be the total system’s understanding that we talk about. ChatGPT-3.5 is an example of a computer doing the sort of thing that Searle was arguing was not possible, or at least should not be spoken about in the same way that we might speak about a person understanding Chinese.

Secondly, I argued (using Searle as the person in the room as he sometimes did):

Of course, as with many thought experiments, the Chinese room is ludicrous in practice. There would be such a large set of rules, and so many of them would need to be followed in detailed order that Searle would need to spend many tens of years slavishly following the rules, and jotting down notes on an enormous supply of paper. The system, Searle and the rules, would run as a program so slowly that it, the system, could not be engaged in any normal sorts of perceptual activity. At that point it does get hard to effectively believe that the system understands Chinese for any usual understanding of `understand’. But precisely because it is such a ludicrous example, slowed down by factors of billions, any conclusions from that inadequacy can not be carried over to making conclusions about whether a computer program running the same program `understands’ Chinese.

Hmm, well my bluff has been called by the existence of ChatGPT.  First, note that I was right about the size of the rule set, 175 billion neural weights, so large that it would take a person effectively forever to follow them. But every modern laptop can hold all those rules in its file system (it is less than a terabyte of storage), and the algorithm is parallel enough that a chunk of processing in the cloud can make ChatGPT run at human language speeds.
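A quick back-of-the-envelope check of that storage claim, in Python; the bytes-per-weight figures are my assumptions about floating point precision, not anything OpenAI has published about its file formats:

    # 175 billion weights at two common floating point precisions.
    params = 175e9
    for name, bytes_per_weight in [("32-bit floats", 4), ("16-bit floats", 2)]:
        terabytes = params * bytes_per_weight / 1e12
        print(f"{name}: {terabytes:.2f} TB")
    # 32-bit floats: 0.70 TB; 16-bit floats: 0.35 TB -- under a terabyte either way.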

If I maintain my above argument from 2002, I would have to say that ChatGPT does `understand’ Chinese. But those who have read my writings over the years would guess, rightly, that I don’t think it does. Without grounding in physical reality I don’t think a machine can understand in the same way we do.  ChatGPT is just like someone following rules with no semantic understanding of the symbols, but it does it at the speed my argument above said was necessary for it to really be understanding.  But now I’m going to say it still doesn’t understand. My old self and my today self are not being intellectually coherent, so I am going to have to think about this some more over the next few years and refine, perhaps rethink, but certainly change in some way what it is I conclude from both Searle and ChatGPT existing.

Other people over the last forty years have argued, and I have agreed, that language in humans is strongly grounded in non-language. So, we have argued that a computer program like ChatGPT-3.5 could not have a consistent performance level that would seem like human language.  ChatGPT-3.5 certainly seems to have such consistent performance, as long as you don’t poke it too deeply–it certainly has a level that would work for most of your daily interactions with strangers.  Our arguments are therefore challenged or broken. I don’t yet know how to fix them.

CHOMSKY’S UNIVERSAL GRAMMAR

Noam Chomsky is the most influential linguist ever, and he spent most of his career in MIT’s linguistics department. His advocacy of the existence of a universal grammar system in humans, and humans only, has shaped much cognitive science and many theories about cognition in both humans and non-humans.

I asked ChatGPT to “explain Chomsky’s universal grammar”, and it replied:

Noam Chomsky’s theory of Universal Grammar (UG) is a hypothesis in the field of linguistics that suggests that the ability to acquire language is innate to humans, and that there is a universal set of grammatical structures that underlies all human languages. Chomsky introduced this concept in the 1950s and has since developed and refined it over the years.

Key points of Chomsky’s Universal Grammar include:

Innate Language Faculty: Chomsky argues that humans are born with an innate capacity for language. He proposes the existence of a “language acquisition device” (LAD) or some innate cognitive structures that enable children to rapidly and effortlessly learn the language(s) to which they are exposed during early childhood.

Universal Grammar Principles: Chomsky suggests that all human languages share a common underlying structure or set of principles. These principles are considered to be part of the Universal Grammar and are hard-wired into the human brain. They provide a framework that constrains the range of grammatical structures and rules that are possible in any language.

Poverty of the Stimulus: Chomsky’s argument for the existence of Universal Grammar includes the “poverty of the stimulus” idea. He contends that the linguistic input that children receive is often insufficient or ambiguous, yet they are able to acquire complex grammatical structures. This implies that there must be an innate mechanism guiding language acquisition.

Language Acquisition as Rule-Based: Chomsky posits that the process of language acquisition involves the internalization of grammatical rules. Children, according to this theory, unconsciously deduce the grammatical structures of their native language(s) based on the linguistic input they receive.

It’s important to note that while Chomsky’s Universal Grammar theory has been influential and has shaped the field of linguistics, it is not without its critics. Some researchers argue for alternative theories of language acquisition that emphasize the role of environmental factors and learning mechanisms. Additionally, the nature and specifics of the supposed universal grammar are still topics of ongoing research and debate within the field of linguistics.

ChatGPT gets it right; this is what Chomsky was arguing for. I highlighted in blue one particular alternate view that ChatGPT brought up.  [It is no surprise to me that ChatGPT made such a good summary–that is the sort of thing it is quite good at, having ingested billions of pages of human written material on all sorts of matters.]

Here is a diagram I had someone draw for me to illustrate the idea of Universal Grammar. A child hears words and acquires both the grammar of the language and the language itself.

 

I asked: Does ChatGPT have a universal grammar module?
To which ChatGPT replied:

No, ChatGPT does not have a universal grammar module in the same sense as Chomsky’s Universal Grammar theory. ChatGPT is based on a different approach known as transfer learning, specifically the GPT (Generative Pre-trained Transformer) architecture developed by OpenAI.

ChatGPT learns from a diverse range of internet text during its pre-training phase, but it doesn’t have explicit knowledge or representation of grammatical rules as predefined modules. Instead, it captures patterns, context, and relationships within the data it has been trained on. It doesn’t have a built-in understanding of linguistic structures in the way that the concept of Universal Grammar suggests.

The model relies on statistical patterns and associations in the data it has seen during training to generate contextually relevant responses. While it can generate coherent and contextually appropriate text, it doesn’t possess an explicit understanding of grammar rules or linguistic universals as proposed by Chomsky’s theory.

Again, it got it completely right.

Clearly ChatGPT (and other LLMs) learn human language to a level of performance that we had not previously seen in computational systems, a level that we are forced to compare favorably with humans. And they learn language without any sort of universal grammar knowledge that Chomsky argues both constrains and enables what humans can learn as they acquire language. They also learn it without any role for the environmental factors that ChatGPT mentioned above in its reference to criticisms of universal grammar, and that I highlighted in blue.

No structures in the human brain that implement a universal grammar system, even quite distributed, have been identified. ChatGPT certainly doesn’t have one. Does this put the universal grammar theory at risk?

Maybe.  Maybe not.

Chomsky’s argument about the poverty of the stimulus for human children may be what saves the need for universal grammar in humans. ChatGPT is exposed to a few orders of magnitude more source material (and uses massively more energy to learn) than a human child.

On the other hand, the ability to learn human grammar with no mechanism for grammar built in is certainly a surprise, at least to time-traveling AI researchers from thirty or even twenty years ago.

Universal grammar enthusiasts have long argued that no other biological species can have language that has grammar and recursive composability, because they don’t have universal grammar. Computational LLMs do (have such language) and don’t (have universal grammar).

I personally remember very heated arguments on the MIT campus between Chomsky followers and computer scientists working on statistical models in the early 2000’s.  The arguments came up in computer science faculty hiring meetings. The Chomskians claimed that because of the need for universal grammar there would never be anything useful that came out of statistical approaches to language. ChatGPT has proved them wrong. (And yes, I personally shouted angrily at Chomskians in some of those hiring meetings.)

The question is whether there is a more narrowly arguable way to push forward on the need for universal grammar in resource and stimulus limited environments.

A Preliminary Conclusion

The existence, for one year now, of functional, available LLMs has challenged some previous intellectual positions. Those challenges will be debated and made sharper. The existence of those challenges, however, does not necessarily mean that any, or even many, of the wild predictions circulating about how important and transformative LLMs will be will come to pass. Things have changed forever, but as with many other forever changes in AI over the last 65 years, there are most likely many more things that we will change and that we will learn.

I believe that we are still in the uncertain baby step stages. It is worth repeating the last sentence of Alan Turing’s “Computing Machinery and Intelligence” paper that we started off with above. It is still as relevant today as it was in 1950.

We can only see a short distance ahead, but we can see plenty there that needs to be done. 

Autonomous Vehicles 2023, Part II

rodneybrooks.com/autonomous-vehicles-2023-part-ii/

I was going to write this post later this week, filling in my promised account of my experience on Thursday, Oct 19th, 2023, of real fear that I might be involved in a really bad accident while riding in a Cruise driverless taxi in San Francisco. The reason for rushing it out today is this story that Cruise is no longer allowed to operate driverless taxis in San Francisco.

The story implies that they will no longer be allowed to operate even empty vehicles without a safety driver, which would mean two steps backwards from where they have been in San Francisco. It also says that Cruise misrepresented safety information to regulators.

My Recent Experience

I have taken around 36 Cruise driverless taxi rides over the last handful of months. They have had their ups and downs in user experience, and I had planned to talk about some of those in explaining why I do not think the experience is really what one expects from a deployed taxi service like Uber or Lyft.

But last Thursday night I had a moment where I experienced real fear, where for half a second I thought I might be involved in an extremely bad accident.

It was at night and we were crossing Divisadero, heading west, on Filbert. To the left, Divisadero runs steeply uphill for a few blocks. There was a car coming down the hill quite fast as we crossed Divisadero. My Cruise, with nothing at all in front of it, braked hard, really hard, right in the middle of the intersection, harder than I had ever experienced a Cruise taxi braking. That brought us (me and my taxi) to almost a complete stop right in the path of the oncoming vehicle. Fortunately the other vehicle started to slow down and then the Cruise moved on out of its way.

This, above, is my recollection of what happened. When it braked hard a real pang of fear shot through my body. When I saw the car heading right at us a conscious version of that fear kicked in.

A human driver in that situation would most likely continue to drive and not brake at all. Braking was the best possible way to cause a collision. Not a good choice.

In previous accidents that have resulted in collisions, Cruise vehicles have been at a stop. My interpretation, and I have no knowledge of whether this is true or not, was that rather than take the risk of hitting another vehicle while moving, the algorithms were set to freeze when a collision was imminent, as better than running into someone else. A weird hard-wired trolley-problem solution which does not protect the Cruise vehicle, but unfortunately does not protect a rider either, and in many cases increases the likelihood of a collision rather than reducing it.

See a Cruise with a passenger freezing in the middle of an intersection back in August, getting hit and sending a passenger to hospital.

More to come…

Autonomous Vehicles 2023, Part I

rodneybrooks.com/autonomous-vehicles-2023-part-i/

My Early Experience with Self Driving Cars

Back in the summer of 1979 (forty-four years ago) I was Hans Moravec’s gopher, at the Stanford AI Lab, helping test his self driving Cart for his PhD thesis. Every night that summer we would wait until midnight or so, when most people had gone home and the single mainframe computer was relatively unloaded. We would set up an indoor obstacle course for the robot, and then Hans would set it on its way. It would take nine visual images with a left-to-right sliding camera, process them and plan a path over the next fifteen minutes, then blindly drive the first meter of that path. Four meters per hour was its average top speed.

We have come a long way since then.

I went on to build mobile robots at MIT, and from that work have come 50 million home-based mobile robots from a company (iRobot) that I founded with two students in 1990, along with military mobile robots, and as part of the path to mobile robots on Mars.  My current startup (my sixth) is also building mobile robots, deployed in warehouses–they navigate autonomously with all computation onboard, using cameras as their only source of information, but at speeds measured in meters per second, and they are safe in the presence of humans.

My Recent Writing about Self Driving Cars

I have long been skeptical of the promises made about how soon self driving cars will fill our streets, and have written blog posts about why on this site, in terms of unexpected consequences and edge cases, both in 2017. I made predictions about AVs (along with AI and space) back on Jan 1st, 2018, and have reviewed them every January 1st since then. In my review from this year, January 1st, 2023, I included the following updated graphic from 2017.

These are public predictions from March 2017, made by industry executives, about when driverless cars would be here. The dates in parentheses are the years the predictions were made, the dates highlighted in pink are the predicted years that have since passed, and the orange arrows indicate cases where I had seen the predictions later walked back. As you can see they were all wrong, at least so far.

See a collection of prediction headlines from over the years just published by Gary Marcus.

I think these predictions, full of unsupported hubris, have done us a dis-service.

They reinforced the idea that we would have one-for-one replacement of human drivers with driverless cars. In all other cases where mankind has changed how we transport people and goods we have had massive changes in infrastructure. These range from the Roman roads that carried agrarian societies to empire (still the outline of most major road routes across Europe), to wharves in ports, inland canals, railroad tracks, paved roadways, freeways, airports, and world-wide air traffic control.

The tech enthusiasts, used to large scale deployment of software rather than physical objects, assumed that the world would stay the same, and instead we would just have driverless vehicles amongst human driven vehicles. This assumption was the source of my two critiques in 2017.

I have also noted that autonomous trains are still not very widely deployed, and where they are they have different infrastructure than human-driven trains, including completely separate tracks. I have ridden them in many airports, and in and under the cities of Toulouse and Tokyo, but they are not widespread. In the US the only significant self-driving trains outside of airports run to the west of Honolulu on Oahu, still not quite making it into the downtown area.

The dis-service of self driving predictions is that for the last dozen years we stopped talking about how to instrument our roads to make autonomous vehicles safe. I think we could have had self driving cars much more quickly if we had made offboard changes to our infrastructure, rather than imagining that everything would be done onboard.

RIDING IN AUTONOMOUS VEHICLES TODAY

I took my first ride in a self driving car on public roads back in April 2012, in a Google vehicle. That project later became Waymo.

In May 2022 I took three rides in a Cruise vehicle on the streets of San Francisco and blogged about it here.

I now regularly (more than once a week) ride in Cruise vehicles in San Francisco and have tweeted about it.

I often don’t report on individual rides, as they happen without incident.  I have reported on particular rides, that I or others have taken, when I think the vehicles have done something for which a human driver would somehow be sanctioned if they had done the same thing. That sanction might be other drivers or pedestrians expressing censure, legal officials directing them to do something different, or the issuance of a legal citation.

My next blog post will be about my experiences in self driving cars, summarized. Spoiler alert: in my last ride, just two days ago, I first experienced real fear, where for half a second I thought I might be involved in an extremely bad accident. The type of fear where you feel a jolt in your heart, and your body readies itself for fight or flight.

 

What Will Transformers Transform?

rodneybrooks.com/what-will-transformers-transform/

Generative Pre-trained Transformer models (GPTs) are now all the rage and have inspired op-eds being written by everyone from Henry Kissinger (WSJ) to Noam Chomsky (NYTimes) in just the last month. That sure is some hype level.

Way back in the early history of GPTs, January 1st this year, I wrote briefly about them and said:

Calm down people. We neither have super powerful AI around the corner, nor the end of the world caused by AI about to come down upon us.

I stick with that advice, but in this post I want to say why, and talk about where these systems will have impact. In short, there will be valuable tools produced, and at the same time lots of damaging misuse.

What triggers me to write here in more detail is the continued hype, the release of GPT-4 during the week of March 13th, and the posting of the “GPT-4 Technical Report” by many hundreds of authors at OpenAI.  [[The linked PDF is 98 pages long and contains two papers, one titled “GPT-4 Technical Report” that fills the first 38 pages of the PDF, and one titled “GPT-4 System Card” which is 60 pages long with its pages numbered 1 to 60, but mapped to PDF pages 39 to 98.]]

In mid-February of this year Stephen Wolfram wrote a very clear (it is long, this is a hard and big topic) post about how and why ChatGPT works. As he says, it is “Just Adding One Word at a Time”. [[Actually, in the couple of days I have been writing my post here, Wolfram’s post has also come out as a printed book…]]

Together, the OpenAI and Wolfram reports give a very good technical understanding of most things GPT.

State of the Art GPTs from Open AI

For the last few months there has been lots of excitement about the 175 billion parameter GPT-3 from the company Open AI. It was set up, under the name ChatGPT, so that people could query it: type in a few words and have it “answer” the question. The words set the context, and then, one word at a time, it pops out whichever word its learned model judges to be a good follow-on to the context, which now includes what it has already said. There is some randomness in choosing among competing very good words, so it answers questions differently at different times. Microsoft attached GPT to its search engine Bing at around the same time.
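Here is a toy Python sketch of that word-at-a-time loop, with a made-up next-word distribution standing in for the 175 billion parameter model; this is my own illustration, not OpenAI’s code. The random choice among competing good words is why the same question can get different answers at different times:

    import random

    def fake_model(context):
        """Stand-in for the learned model: a probability for each candidate
        next word, given the context so far (made-up numbers)."""
        if context[-1] == "robots":
            return {"will": 0.5, "are": 0.3, "can": 0.2}
        return {"robots": 0.6, "cars": 0.3, "cats": 0.1}

    def generate(prompt, n_words):
        context = list(prompt)
        for _ in range(n_words):
            probs = fake_model(context)
            words, weights = zip(*probs.items())
            # Sample among the competing good words rather than always
            # taking the single most likely one.
            context.append(random.choices(words, weights=weights)[0])
        return " ".join(context)

    print(generate(["someday"], 3))  # different runs give different continuations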

Sometimes the results seem stunningly good, and people of all stripes have jumped to the conclusion that GPT-3 heralded the coming of “Artificial General Intelligence”. [[By the way, ever since the earliest days of AI, the 1955 proposal for the 1956 workshop on AI, the document in which the term AI first appears anywhere, the goal of the researchers was to produce general intelligence. That AGI is a different term than AI now is due to a bunch of researchers a dozen or so years ago deciding to launch a marketing campaign for themselves by using a new buzz acronym. “AGI” is just “AI” as it was known for the first 50+ years of its existence. Hype produced the term “AGI” with which we are now saddled.]]

This inference of AI arriving momentarily is a clear example of how people mistake performance for competence.  I talked about it back in 2017 as one of the seven deadly sins of predicting the future of AI. I said then that:

We [humans] are able to generalize from observing performance at one task to a guess at competence over a much bigger set of tasks. We understand intuitively how to generalize from the performance level of the person to their competence in related areas.

But the skills we have for doing that for a person break down completely when we see a strong performance from an AI program. The extent of the program’s competence may be extraordinarily narrow, in a way that would never happen with a person.  I think that a lot of people, early on certainly, made this mistake with GPT-3. I’ll show some examples of how GPT-3 and GPT-4 fail in unintuitive ways below.

Meanwhile, in the week of March 13th, 2023, GPT-4, with a reported 100 trillion learned parameters (i.e., about 571 times as many as for GPT-3), was introduced. And it turned out, according to Peter Lee at Microsoft, that the GPT version that had been part of Bing for the previous weeks and months was GPT-4 all along.
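Taking those reported figures at face value, the ratio checks out:

    # Reported parameter counts, taken at face value from the text above.
    print(100e12 / 175e9)  # about 571, i.e. roughly 571 times as many parameters as GPT-3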

According to reports GPT-4 felt qualitatively different to many users. It felt “more”, and “better”. This sort of feeling that something has changed was reported in the 1990s when people played a chess program with a previously unachievable depth limit, or when working with a theorem proving program that could search further than before. (Refs, which will take me longer to track down, are to Garry Kasparov and Bob Constable (of Cornell).) I think the “this is better” feeling is natural here, but the presence of that feeling is not in itself a particularly useful indicator.

Some Dumb Things from ChatGPT

GPT-n cannot reason, and it has no model of the world. It just looks at correlations between how words appear in vast quantities of text from the web, without knowing how they connect to the world. It doesn’t even know there is a world.

I think it is a surprise to many that it seems as smart as it does given that all it has is these correlations. But perhaps that says something about how we humans relate to language and the world, and that language really only, perhaps, has a tenuous connection to the world in which we live; a scary thought.

There is a veritable cottage industry of individuals showing how LLMs can easily be provoked into showing that they have no capability in spatial reasoning, ordinal numbers, or even small quantities below 10.

Here are examples of troubles with numbers, and other interesting failures, with ChatGPT, from an article in the Communications of the ACM from earlier this year. The authors, Gary Marcus and Ernie Davis, are both faculty at NYU. Here is just one of their examples:

Here is an example of complete lack of spatial reasoning from an article by Fletcher Jones Professor of Economics Gary N. Smith of Pomona College:

Human: The table didn’t fit in the car because it was too small. What was too small? 

ChatGPT:  The table was too small to fit in the car.

Human: Would it help to get a smaller car? 

ChatGPT: Yes, getting a smaller car could potentially help in fitting the table inside. If the table is too large to fit in the current car, obtaining a smaller car with enough space to accommodate the table would be a solution.

Here is another example which Marcus and Davis write about, where someone asked ChatGPT to write a program.

It writes a Python program without getting the indentation right, and without actually addressing the JSON description part of the specification, but it sure manages to reproduce the sexism and racism contained in its training set. See the guardrails in GPT-4 below.

And here are three examples from the Marcus and Davis paper above (all due to other authors whom they cite), where ChatGPT is happy to make stuff up, because it really doesn’t understand how important many words really are:

Two Simple but Amazing Examples

Here are two tweets that random people sent out where ChatGPT seems to be funny and smart, and where it doesn’t have to reason to get there; rather it has to generate plausible text. And these two simple examples show how plausible it can be.

And this one runs up against some guard rails that have been put into the system manually, but which are broken through on the second request.

I think it is easy to see from these two examples that performance/competence confusion is very likely.  It makes no sense that a person who could respond in these ways would be as ritualistically dumb about numbers and spatial relations as the previous section reveals ChatGPT to be.

What Does Open AI Say About GPT-4?

The opening section of the GPT-4 Tech Report from Open AI is instructive, as it includes this paragraph (my emphasis):

Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [1, 31, 32]: it is not fully reliable (e.g. can suffer from “hallucinations”), has a limited context window, and does not learn from experience. Care should be taken when using the outputs of GPT-4, particularly in contexts where reliability is important.

Open AI is being quite clear here that GPT-4 has limitations.  However they appear to be agnostic on who should be taking care.  Is it the responsibility of people who use the GPT-4 parameter set in some product, or is it the end user who is exposed to outputs from that product? Open AI does not express an opinion on this matter.

In a second paper in the .pdf, i.e., the “GPT-4 System Card”, they go through the mitigations against dangerous or wrong outputs that they have worked on for the last six months, and give comparisons between what was produced early on, labeled “GPT-4 (early)”, and what is produced now, labeled “GPT-4 (launch)”. They have put in a number of guard rails that clearly reduce the amount of both objectionable and dangerous output that can be produced. Nevertheless on page 19 of the System Card (page 57 of the .pdf) they say:

As noted above in 2.2, despite GPT-4’s capabilities, it maintains a tendency to make up facts, to double-down on incorrect information, and to perform tasks incorrectly. Further, it often exhibits these tendencies in ways that are more convincing and believable than earlier GPT models (e.g., due to authoritative tone or to being presented in the context of highly detailed information that is accurate), increasing the risk of overreliance.

This is pretty damning. Don’t rely on outputs from GPT-4.

Earlier in the System Card report (page 7/45):

In particular, our usage policies prohibit the use of our models and products in the contexts of high risk government decision making (e.g, law enforcement, criminal justice, migration and asylum), or for offering legal or health advice.

Here they are protecting themselves by outlawing certain sorts of usage in their license.

This is in the context of their human red team having probed GPT-4 and introduced new training so that often it will refuse to produce harmful text when it matches a class of prompts against which it has been trained.

But their warnings reproduced above say that they are not at all confident that we will not see real problems with some of the things produced by GPT-4. They have not been able to bullet-proof it with six months of work by a large team.  This is no surprise. There are many many long tail cases to consider and patch up. The same was true for autonomous driving, and the result is that we are three to five years past when executives at major automobile companies predicted we would have level 4 driving in consumer cars. That experience should be a cautionary tale for GPT-4 and its brethren, saying that reliance on them will be fraught for many years to come, unless they are very much boxed in to how they can be used.

On March 21st, 2023, Sundar Pichai, CEO of Google, on the introduction of Bard A.I., Google’s answer to GPT-4, warned his employees, that “things will go wrong”.

Always a Person in the Loop in Successful AI Systems

Many successful applications of AI have a person somewhere in the loop.  Sometimes it is a person behind the scenes that the people using the system do not see, but often it is the user of the system, who provides the glue between the AI system and the real world.

This is true of language translation systems, where a person reading the output adapts quickly to the mistakes the person or system makes, just as they do with children, the elderly, and foreigners, and fills in around the edges to get the meaning, not the literal interpretation.

This is true of speech understanding systems where we talk to Alexa or Google Home, or our TV remote, or our car. We talk to each of them slightly differently, as we humans quickly learn how to adapt to their idiosyncrasies and the forms they can and cannot understand.

This is true of our search engines, where we have learned how to form good queries that will get us the information we actually want, the quickest.

This is true of our smart cameras, where we have learned how to take photographs with them rather than with a film camera (though in this case they are often superhuman in their capabilities).

This is true where we are talking to a virtual agent on a web site, where we will either be left with a frustrating experience or the website has backup humans who connect in to help with tricky situations.

This is true of driver assist/self driving modes in cars where the human driver must be prepared to take over in an instant in high stress situations.

This is true of mobile robots in hospitals, taking the dirty sheets and dishes to be cleaned, or bringing up prescriptions from the hospital pharmacy, where some unseen operator in a remote network operations center is waiting to take over control when the robot gets confused.

This is true of chess where the best players are human chess experts working with a chess engine, and together they play better than any chess engine by itself.

And this is true of art work, produced by stable diffusion models, where the eye of the beholder always belongs to a human.

Below I predict the future for the next few years with GPTs and point out that their successful deployment will always have a person in the loop in some sense.

Predicting the future is hard

Roy Amara, who died on the last day of 2007, was the president of a Palo Alto-based think tank, the Institute for the Future, and is credited with saying what is now known as Amara’s Law:

We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.

This has been a common problem with Artificial Intelligence, and indeed of all of computing. In particular, since I first became conscious of the possibility of Artificial Intelligence around 1963 (and as an eight year old proceeded to try to build my own physical and intelligent computers, and have been at it ever since), I have seen these overestimates many many times.

A few such instances of AI technologies that have induced gross overestimates of how soon we would get to AGI, in roughly chronological order, that I personally remember include:

John McCarthy’s estimate that the computers of the 1960’s were powerful enough to support AGI, Minsky and Michie and Nilsson each believing that search algorithms were the key to intelligence, neural networks (volume 3, perceptrons) [[I wasn’t around for the first two volumes; McCulloch and Pitts in 1943, Minsky in 1953]], first order logic, resolution theorem proving, MacHack (chess 1), fuzzy logic, STRIPS, knowledge-based systems (and revolutionizing medicine), neural networks (volume 4, back propagation), the primal sketch, self driving cars (Dickmanns, 1987), reinforcement learning (rounds 2 and 3), SOAR, qualitative reasoning, support vector machines, self driving cars (Kanade et al, 1997), Deep Blue (chess 2), self driving cars (Thrun, 2007), Bayesian inference, Watson (Jeopardy, and revolutionizing medicine), neural networks (volume 5, deep learning), Alpha GO, reinforcement learning (round 4), generative images, and now large language models. All have heralded the imminence of human level intelligence in machines. All were hyped up to the limit, but mostly in the days when very few people were even aware of AI, so very few people remember the levels of hype. I’m old. I do remember all these, but have probably forgotten quite a few…

None of these things has lived up to that early hype. As Amara predicted, at first they were overrated. But at the same time, almost every one of these things has had long-lasting impact on our world, just not in the particular form that people first imagined. As we twirled them around and prodded them, and experimented with them, and failed, and retried, we remade them in ways different from how they were first imagined, and they ended up having bigger, longer term impacts, but in ways not first considered.

How does this apply to GPT world?  As always, the hype is overestimating the utility and the threats. However, much will come from GPT-like systems.

Do I have it wrong?

Ada Lovelace said something similar to Amara’s Law back in 1843. This is from the first paragraph of her “Note G”, one of the notes she wrote to accompany a translation she made of someone else’s paper on the Analytical Engine. With her emphasis:

In considering any new subject, there is frequently a tendency, first, to overrate what we find to be already interesting or remarkable; and, secondly, by a sort of natural reaction, to undervalue the true state of the case, when we do discover that our notions have surpassed those that were really tenable.

Here the first half matches the first half of Amara’s Law. Her second half touches on something different than Amara’s second half. She says that when we get chastened by discovering we were overly optimistic out of the gate we pull back too far on our expectations.

Having seen the hype cycle so often, and seen it go a particular way so often, am I now undervaluing the subject of a new hype cycle? If this is hype cycle n, I would have been right to undervalue the hype for the previous n-1 times. Am I just pattern matching and thinking it would be right to undervalue for time n? Am I suffering from cynicism?  Am I just a grumpy old guy who thinks he’s seen it all? Perhaps. We’ll have to see with time.

In General, What Will Happen?

Back in 2010 Tim O’Reilly tweeted out “If you’re not paying for the product then you’re the product being sold.”, in reference to things like search engines and apps on telephones.

I think that GPTs will give rise to a new aphorism (where the last word might vary over an array of synonymous variations):

If you are interacting with the output of a GPT system and didn’t explicitly decide to use a GPT then you’re the product being hoodwinked.

I am not saying everything about GPTs is bad. I am saying that, especially given the explicit warnings from Open AI, you need to be aware that you are using an unreliable system.

Using an unreliable system sounds awfully unreliable, but in August 2021 I had a revelation at TED in Monterey, California, when Chris Anderson (the TED Chris) was interviewing Greg Brockman, the Chairman of Open AI, about an early version of GPT. Brockman said that he regularly asked it questions about code he wanted to write and it very quickly gave him ideas for libraries to use, and that was enough to get him started on his project. GPT did not need to be fully accurate, just to get him into the right ballpark, much faster than without its help, and then he could take it from there.

Chris Anderson (the 3D Robotics one, not the TED one) has likewise opined (as have responders to some of my tweets about GPT) that using ChatGPT will get him the basic outline of a software stack, in a well-trodden area of capabilities, and he is many many times more productive than without it.

So there, where a smart person is in the loop, unreliable advice is better than no advice, and the advice comes much more explicitly than from carrying out a conventional search with a search engine.

[[Earlier this year I posted to my facebook friends that I was having trouble converting a software system that I have been working on for 30+ years from running natively on an x86 Mac to running natively on an M1 ARM Mac. The issue was that my old technique, changing memory that my compiler had just written instructions into as data so that it could then be executed as instructions, was not working.  John Markoff suggested that I ask ChatGPT, which I then did. It gave me a perfect multi-paragraph explanation of how to do it, starting off with “…on an M1 Macintosh…”. The problem was that the explanation was completely accurate for an x86 Macintosh, and was exactly what I had been doing for the last 10+ years, but completely wrong for an M1 Macintosh.]]

The opposite of useful can also occur, but again it pays to have a smart human in the loop.  Here is a report from the editor of a science fiction magazine which pays contributors. He says that from late 2022 through February of 2023 the number of submissions to the magazine increased by almost two orders of magnitude, and he was able to determine that the vast majority of them were generated by chatbots. He was the person in the loop filtering out the signal he wanted, human written science fiction, from vast volumes of noise of GPT written science fiction.

Why should he care?  Because GPT is an auto-completer and so it is generating variations on well worked themes.  But, but, but, I hear people screaming at me.  With more work GPTs will be able to generate original stuff.  Yes, but it will be some other sort of engine attached to them which produces that originality.  No matter how big, and how many parameters, GPTs are not going to do that themselves.

When no person is in the loop to filter, tweak, or manage the flow of information, GPTs will be completely bad. That will be good for people who want to manipulate others without revealing that the vast amount of persuasive evidence they are seeing has all been made up by a GPT.  It will be bad for the people being manipulated.

And it will be bad if you try to connect a robot to GPT. GPTs have no understanding of the words they use, no way to connect those words, those symbols, to the real world. A robot needs to be connected to the real world and its commands need to be coherent with the real world. Classically it is known as the “symbol grounding problem”. GPT+robot is only ungrounded symbols. It would be like you hearing Klingon spoken, without any knowledge other than the Klingon sound stream (even in Star Trek you knew they had human form and it was easy to ground aspects of their world). A GPT telling a robot stuff will be just like the robot hearing Klingonese.

[[And, of course, for those who have read my more obscure writing for the last 30+  years (see Nature (2001), vol 409, page 409), I do have issues with whether the symbol grounding problem is the right way of thinking about things, but for this argument it is good enough.]]

My argument here is that GPTs might be useful, and well enough boxed, when there is an active person in the loop, but dangerous when the person in the loop doesn’t know they are supposed to be in the loop. [This will be the case for all young children.]  That person’s intelligence, actively applied, is a key component of making any GPT application successful.

Specific Predictions

Here I make some predictions for things that will happen with GPT types of systems, and sometimes coupled with stable diffusion image generation. These predictions cover the time between now and 2030. Some of them are about direct uses of GPTs and some are about the second and third order effects they will drive.

  1. After years of Wikipedia being derided as not a referable authority, and not being allowed to be used as a source in serious work, it will become the standard rock solid authority on just about everything. This is because it has built a human powered approach to verifying factual knowledge in a world of high frequency human generated noise.
  2. Any GPT-based application that can be relied upon will have to be super-boxed in, and so the power of its “creativity” will be severely limited.
  3. GPT-based applications that are used for creativity will continue to have horrible edge cases that sometimes rear their ugly heads when least expected, and furthermore, the things that they create will often arguably be stealing the artistic output of unacknowledged humans.
  4. There will be no viable robotics applications that harness the serious power of GPTs in any meaningful way.
  5. It is going to be easier to build from scratch software stacks that look a lot like existing software stacks.
  6. There will be much confusion about whether code infringes on copyright, and so there will be a growth in companies that are used to certify that no unlicensed code appears in software builds.
  7. There will be surprising things built with GPTs, both good and bad, that no-one has yet talked about, or even conceived.
  8. There will be incredible amounts of misinformation deliberately created in campaigns for all sorts of arenas from political to criminal, and reliance on expertise will become more discredited, since the noise will drown out any signal at all.
  9. There will be new categories of pornography.