Past is prologue1.
I mean that in both of the ways people interpret Shakespeare’s meaning when he has Antonio utter the phrase in The Tempest.
In one interpretation it is that the past has predetermined the sequence which is about to unfold–and so I believe that how we have gotten to where we are in Artificial Intelligence will determine the directions we take next–so it is worth studying that past.
Another interpretation is that really the past was not much and the majority of necessary work lies ahead–that too, I believe. We have hardly even gotten started on Artificial Intelligence and there is lots of hard work ahead.
THE EARLY DAYS
It is generally agreed that John McCarthy coined the phrase “artificial intelligence” in the written proposal2 for a 1956 Dartmouth workshop, dated August 31st, 1955. It is authored by, in listed order, John McCarthy of Dartmouth, Marvin Minsky of Harvard, Nathaniel Rochester of IBM and Claude Shannon of Bell Laboratories. Later all but Rochester would serve on the faculty at MIT, although by early in the sixties McCarthy had left to join Stanford University. The nineteen page proposal has a title page and an introductory six pages (1 through 5a), followed by individually authored sections on proposed research by the four authors. It is presumed that McCarthy wrote those first six pages which include a budget to be provided by the Rockefeller Foundation to cover 10 researchers.
The title page says A PROPOSAL FOR THE DARTMOUTH SUMMER RESEARCH PROJECT ON ARTIFICIAL INTELLIGENCE. The first paragraph includes a sentence referencing “intelligence”:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.
And then the first sentence of the second paragraph starts out:
The following are some aspects of the artificial intelligence problem:
That’s it! No description of what human intelligence is, no argument about whether or not machines can do it (i.e., “do intelligence”), and no fanfare on the introduction of the term “artificial intelligence” (all lower case).
In the linked file above there are an additional four pages dated March 6th, 1956, by Allen Newell and Herb Simon, at that time at the RAND Corporation and Carnegie Institute of Technology respectively (later both were giants at Carnegie Mellon University), on their proposed research contribution. They say that they are engaged in a series of forays into the area of complex information processing, and that a “large part of this activity comes under the heading of artificial intelligence”. It seems that the phrase “artificial intelligence” was easily and quickly adopted without any formal definition of what it might be.
In McCarthy’s introduction, and in the outlines of what the six named participants intend to research, there is no lack of ambition.
The speeds and memory capacities of present computers may be insufficient to simulate many of the higher functions of the human brain, but the major obstacle is not lack of machine capacity, but our inability to write programs taking full advantage of what we have.
Some of the AI topics that McCarthy outlines in the introduction are how to get a computer to use human language, how to arrange “neuron nets” (they had been invented in 1943–a little while before today’s technology elite first heard about them and started getting over-excited) so that they can form concepts, how a machine can improve itself (i.e., learn or evolve), how a machine could form abstractions by using its sensors to observe the world, and how to make computers think creatively. These topics are expanded upon in the individual work proposals by Shannon, Minsky, Rochester, and McCarthy. The addendum from Newell and Simon adds to the mix getting machines to play chess (including through learning), and prove mathematical theorems, along with developing theories on how machines might learn, and how they might solve problems similar to problems that humans can solve.
No lack of ambition! And recall that at this time there were only a handful of digital computers in the world, and none of them had more than at most a few tens of kilobytes of memory for running programs and data, and only punched cards or paper tape for long term storage.
McCarthy was certainly not the first person to talk about machines and “intelligence”, and in fact Alan Turing had written and published about it before this, but without the moniker of “artificial intelligence”. His best known foray is Computing Machinery and Intelligence3 which was published in October 1950. This is the paper where he introduces the “Imitation Game”, which has come to be called the “Turing Test”, where a person is to decide whether the entity they are conversing with via a 1950 version of instant messaging is a person or a computer. Turing estimates that in the year 2000 a computer with 128MB of memory (he states it as binary digits) will have a 70% chance of fooling a person.
Although the title of the paper has the word “Intelligence” in it, there is only one place where that word is used in the body of the paper (whereas “machine” appears at least 207 times), and that is to refer to the intelligence of a human who is trying to build a machine that can imitate an adult human. His aim however is clear. He believes that it will be possible to make a machine that can think as well as a human, and by the year 2000. He even estimates how many programmers will be needed (sixty is his answer, working for fifty years, so only 3,000 programmer years–a tiny number by the standards of many software systems today).
In a slightly earlier 1948 paper titled Intelligent Machinery but not published4 until 1970, long after his death, Turing outlined the nature of “discrete controlling machines”, what we would today call “computers”, as he had essentially invented digital computers in a paper he had written in 1937. He then turns to making a machine that fully imitates a person, even as he reasons that the brain part might be too big to be contained within the locomoting sensing part of the machine, and instead must operate it remotely. He points out that the sensors and motor systems of the day might not be up to it, so concludes that to begin with the parts of intelligence that may be best to investigate are games and cryptography, and to a lesser extent translation of languages and mathematics.
Again, no lack of ambition, but a bowing to the technological realities of the day.
When AI got started the clear inspiration was human level performance and human level intelligence. I think that goal has been what attracted most researchers into the field for the first sixty years. The fact that we do not have anything close to succeeding at those aspirations says not that researchers have not worked hard or have not been brilliant. It says that it is a very hard goal.
I wrote a (long) paper Intelligence without Reason5 about the pre-history and early days of Artificial Intelligence in 1991, twenty seven years ago, and thirty five years into the endeavor. My current blog posts are trying to fill in details and to provide an update for a new generation to understand just what a long term project this is. To many it all seems so shiny and exciting and new. Of those, it is exciting only.
In the early days of AI there were very few ways to connect sensors to digital computers or to let those computers control actuators in the world.
In the early 1960’s people wanting to work on computer vision algorithms had to take photographs on film, turn them into prints, attach the prints to a drum, then have that drum rotate and move up and down next to a single light brightness sensor to turn the photo into an array of intensities. By the late seventies, with twenty or thirty pounds of equipment, costing tens of thousands of dollars, a researcher could get a digital image directly from a camera into a computer. Things did not become simple-ish until the eighties and they have gotten progressively simpler and cheaper over time.
Similar stories hold for every other sensor modality, and also for output–turning results of computer programs into physical actions in the world.
Thus, as Turing had reasoned, early work in Artificial Intelligence turned towards domains where there was little need for sensing or action. There was work on games, where human moves could easily be input and output to and from a computer via a keyboard and a printer, mathematical exercises such as calculus applied to symbolic algebra, or theorem proving in logic, and to understanding typed English sentences that were arithmetic word problems.
Writing programs that could play games quickly led to the idea of “tree search” which was key to almost all of the early AI experiments in the other fields listed above, and indeed, is now a basic tool of much of computer science. Playing games early on also provided opportunities to explore Machine Learning and to invent a particular variant of it, Reinforcement Learning, which was at the heart of the recent success of the AlphaGo program. I described this early history in more detail in my August 2017 post Machine Learning Explained.
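Tree search of this kind can be sketched in a few lines. Below is a minimal minimax search over a toy game tree; the tree and its leaf scores are made up purely for illustration:

```python
# A minimal sketch of game-tree search in the spirit of the early AI
# programs: plain minimax over a toy game tree. Internal nodes are
# lists of child subtrees; leaves are numeric scores for the
# maximizing player.

def minimax(node, maximizing=True):
    if not isinstance(node, list):   # a leaf: just return its score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# A tiny two-ply tree: the maximizer picks a branch, then the
# minimizer picks a leaf within that branch.
tree = [[3, 12], [2, 8], [1, 4]]
print(minimax(tree))  # prints 3: the best the maximizer can guarantee
```

Real game programs, then and now, add pruning and evaluation functions on top of this skeleton, but the recursive alternation of max and min is the core idea.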
Before too long a domain known as blocks world was invented where all sorts of problems in intelligence could be explored. Perhaps the first PhD thesis on computer vision, by Larry Roberts at MIT in 1963, had shown that with a carefully lighted scene, all the edges of a wooden block with planar surfaces could be recovered.
That validated the idea that it was OK to work on complex problems with blocks where the description of their location or their edges was the input to the program, as in principle the perception part of the problem could be solved. This then was a simulated world of perception and action, and it was the principal test bed for AI for decades.
Some people worked on problem solving in a two dimensional blocks world with an imagined robot that could pick up and put down blocks from the top of a stack, or on a simulated one dimensional table.
Others worked on recovering the geometry of the underlying three dimensional blocks from just the input lines, including with shadows, paving the way for future more complete vision systems than Roberts had demonstrated.
And yet others worked on complex natural language understanding, and all sorts of problem solving in worlds with complex three dimensional blocks.
No one worked in these blocks worlds because that was their ambition. Rather they worked in them because with the tools they had available they felt that they could make progress on problems that would be important for human level intelligence. At the same time they did not think that was just around the corner, one magic breakthrough away from all being understood, implemented, and deployed.
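The flavor of those simulated worlds is easy to convey in code. Here is a minimal, made-up sketch of a blocks-world state and the imagined robot’s only operation, moving the top block of one stack onto another; it is an illustration, not any particular historical system:

```python
# The state of a simple blocks world: a list of stacks of blocks,
# where the last element of each inner list is the top of that stack.
# The imagined robot can only pick up the top block of a stack and
# put it down on top of another stack (or on the empty table).

def move(stacks, src, dst):
    """Move the top block of stack `src` onto stack `dst`."""
    stacks = [list(s) for s in stacks]   # copy, don't mutate the input
    if not stacks[src]:
        raise ValueError("source stack is empty")
    stacks[dst].append(stacks[src].pop())
    return stacks

# Three positions: A on B, C alone, and an empty spot on the table.
state = [["B", "A"], ["C"], []]
state = move(state, 0, 2)   # put A down on the table
state = move(state, 1, 2)   # stack C on top of A
# state is now [["B"], [], ["A", "C"]]
```

A planner in this world is then just a search (for instance the tree search described earlier) over sequences of such moves, which is why the blocks world made such a convenient shared test bed.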
Over time many sub-disciplines in AI developed as people got deeper and deeper into the approaches to particular sub-problems that they had discovered. Before long there was enough new work coming out that no-one could keep up with the breadth of AI research. The names of the sub-disciplines included planning, problem solving, knowledge representation, natural language processing, search, game playing, expert systems, neural networks, machine inference, statistical machine learning, robotics, mobile robotics, simultaneous localization and mapping, computer vision, image understanding, and many others.
Often, as a group of researchers found a common set of problems to work on they would break off from the mainstream and set up their own journals and conferences where reviewing of papers could all be done by people who understood the history and context of the particular problems.
I was involved in two such break off groups in the late 1980’s and early 1990’s, both of which still exist today: Artificial Life, and Simulation of Adaptive Behavior. The first of these looks at fundamental mechanisms of order from disorder and includes evolutionary processes. The second looks at how animal behaviors can be generated by the interaction of perception, action, and computation. Both of these groups and their journals are still active today.
Below is my complete set of the Artificial Life journal from when it was published on paper from 1993 through to 2014. It is still published online today, by the MIT Press.
There were other journals on Artificial Life, and since 1989 there have been international conferences on it. I ran the 1994 conference and there were many hundreds of participants and there were 56 carefully reviewed papers published in hard copy proceedings which I co-edited with Pattie Maes; all those papers are now available online.
And here is my collection of the Adaptive Behavior journal from when it was published on paper from 1992 through to 2013. It is still published online today, by Sage.
And there has always been a robust series of major conferences, called SAB, for Simulation of Adaptive Behavior with paper and now online proceedings.
The Artificial Life Conference will be in Tokyo this year in July, and the SAB conference will be in Frankfurt in August. Each will attract hundreds of researchers. And the 20+ volumes of each of the journals above have 4 issues each, so close to 100 issues, with 4 to 10 papers each, so many hundreds of papers in the journal series. These communities are vibrant and the Artificial Life community has had some engineering impact in developing genetic algorithms which are in use in some number of applications.
But neither the Artificial Life community nor the Simulation of Adaptive Behavior community has succeeded at its early goals.
We still do not know how living systems arise from non-living systems, and in fact still do not have good definitions of what life really is. We do not have generally available evolutionary simulations which let us computationally evolve better and better systems, despite the early promise when we first tried it. And we have not figured out how to evolve systems that have even the rudimentary components of a complete general intelligence, even for very simple creatures.
On the SAB side we can still not computationally simulate the behavior of the simplest creature that has been studied at length. That is the tiny worm C. elegans, which has 959 cells total of which 302 are neurons. We know its complete connectome (and even its 56 glial cells), but still we can’t simulate how they produce much of its behaviors.
I tell these particular stories not because they were uniquely special, but because they give an idea of how research in hard problems works, especially in academia. There were many, many (at least twenty or thirty) other AI subgroups with equally specialized domains that split off. They sometimes flourished and sometimes died off. Not all of those subgroups gave themselves unique names, but they were significant in size, in numbers of researchers, and in active sharing and publication of ideas.
But all researchers in AI were, ultimately, interested in full scale general human intelligence. Often their particular results might seem narrow, and in application to real world problems were very narrow. But general intelligence has always been the goal.
I will finish this section with a story of a larger scale specialized research group, that of computer vision. That specialization has had real engineering impact. It has had four or more major conferences per year for thirty five plus years. It has half a dozen major journals. I cofounded one of them in 1987, with Takeo Kanade, the International Journal of Computer Vision, which has had 126 volumes (I only stayed as an editor for the first seven volumes) and 350 issues since then, with 2,080 individual articles. Remember, that is just one of the half dozen major journals in the field. The computer vision community is what a real large push looks like. This has been a sustained community of thousands of researchers world wide for decades.
I think the press, and those outside of the field have recently gotten confused by one particular spin off name, that calls itself AGI, or Artificial General Intelligence. And the really tricky part is that there are a bunch of completely separate spin off groups that all call themselves AGI, but as far as I can see really have very little commonality of approach or measures of progress. This has gotten the press and people outside of AI very confused, thinking there is just now some real push for human level Artificial Intelligence, that did not exist before. They then get confused that if people are newly working on this goal then surely we are about to see new astounding progress. The bug in this line of thinking is that thousands of AI researchers have been working on this problem for 62 years. We are not at any sudden inflection point.
There is a journal of AGI, which you can find here. Since 2009 there have been a total of 14 issues, many with only a single paper, and only 47 papers in total over that ten year period. Some of the papers are predictions about AGI, but most are very theoretical, modest, papers about specific logical problems, or architectures for action selection. None talk about systems that have been built that display intelligence in any meaningful way.
There is also an annual conference for this disparate group, since 2008, with about 20 papers, plus or minus, per year, just a handful of which are online, at the authors’ own web sites. Again the papers range from risks of AGI to very theoretical specialized, and obscure, research topics. None of them are close to any sort of engineering.
So while there is an AGI community it is very small and not at all working on any sort of engineering issues that would result in any actual Artificial General Intelligence in the sense that the press means when it talks about AGI.
I dug a little deeper and looked at two groups that often get referenced by the press in talking about AGI.
One group, perhaps the most referenced group by the press, styles themselves as an East San Francisco Bay Research Institute working on the mathematics of making AGI safe for humans. Making safe human level intelligence is exactly the goal of almost all AI researchers. But most of them are sanguine enough to understand that that goal is a long way off.
This particular research group lists all their publications and conference presentations from 2001 through 2018 on their web site. This is admirable, and is a practice followed by most research groups in academia.
Since 2001 they have produced 10 archival journal papers (but see below), made 29 presentations at conferences, written 9 book chapters, and have 45 additional internal reports, for a total output of 93 things–about what one would expect from a single middle of the pack professor, plus students, at a research university. But 36 of those 93 outputs are simply predictions of when AGI will be “achieved”, so cut it down to 57 technical outputs, and then look at their content. All of them are very theoretical mathematical and logical arguments about representation and reasoning, with no practical algorithms, and no applications to the real world. Nothing they have produced in 18 years has been taken up and used by anyone else in any application or demonstration anywhere.
And the 10 archival journal papers, the only ones that have a chance of being read by more than a handful of people? Every single one of them is about predicting when AGI will be achieved.
This particular group gets cited by the press and by AGI alarmists again and again. But when you look there with any sort of critical eye, you find they are not a major source of progress towards AGI.
Another group that often gets cited as a source for AGI, is a company in Eastern Europe that claims it will produce an Artificial General Intelligence within 10 years. It is only a company in the sense that one successful entrepreneur is plowing enough money into it to sustain it. Again let’s look at what its own web site tells us.
In this case they have been calling for proposals and ideas from outsiders, and they have distilled that input into the following aspiration for what they will do:
We plan to implement all these requirements into one universal algorithm that will be able to successfully learn all designed and derived abilities just by interacting with the environment and with a teacher.
Yeah, well, that is just what Turing suggested in 1948. So this group has exactly the same aspiration that has been around for seventy years. And they admit it is their aspiration but so far they have no idea of how to actually do it. Turing, in 1948, at least had a few suggestions.
If you, as a journalist, or a commentator on AI, think that the AGI movement is large and vibrant and about to burst onto the scene with any engineered systems, you are confused. You are really, really confused.
Journalists, and general purpose prognosticators, please, please, do your homework. Look below the surface and get some real evaluation on whether groups that use the phrase AGI in their self descriptions are going to bring you human level Artificial Intelligence, or indeed whether they are making any measurable progress towards doing so. It is tempting to see the ones out on the extreme, who don’t have academic appointments, working valiantly, and telling stories of how they are different and will come up with something new and unique, as the brilliant misfits. But in all probability they will not succeed in decades, just as the Artificial Life and the Simulation of Adaptive Behavior groups that I was part of have still not succeeded in their goals of almost thirty years ago.
Just because someone says they are working on AGI, Artificial General Intelligence, that does not mean they know how to build it, how long it might take, or necessarily be making any progress at all. These lacks have been the historical norm. Certainly the founding researchers in Artificial Intelligence in the 1950’s and 1960’s thought that they were working on key components of general intelligence. But that does not mean they got close to their goal, even when they thought it was not so very far off.
So, journalists, don’t you dare, don’t you dare, come back to me in ten years and say where is that Artificial General Intelligence that we were promised? It isn’t coming any time soon.
And while we are on catchy names, let’s not forget “deep learning”. I suspect that the word “deep” in that name leads outsiders a little astray. Somehow it suggests that there is perhaps a deep level of understanding that a “deep learning” algorithm has when it learns something. In fact the learning is very shallow in that sense, and not at all what “deep” refers to. The “deep” in “deep learning” refers to the number of layers of units or “neurons” in the network.
When back propagation, the actual learning mechanism used in deep learning, was developed in the 1980’s most networks had only two or three layers. The revolutionary new networks are the same in structure as 30 years ago but have as many as 12 layers. That is what the “deep” is about, 12 versus 3. In order to make learning work on these “deep” networks there had to be lots more computer power (Moore’s Law took care of that over 30 years), a clever change to the activation function in each neuron, and a way to train the network in stages known as clamping. But not deep understanding.
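To make concrete what “deep” does and does not mean, here is a minimal sketch (in Python with NumPy; the layer sizes and random weights are arbitrary illustrations, and ReLU stands in for the changed activation function mentioned above). Depth is literally just the number of weight matrices the loop passes through:

```python
import numpy as np

# "Deep" means nothing more than the number of layers. The same
# forward-pass code below runs a 3-layer network (1980s scale) or a
# 12-layer one; only the length of the weight list changes.

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights):
    for i, W in enumerate(weights):
        x = x @ W
        if i < len(weights) - 1:   # no activation after the final layer
            x = relu(x)
    return x

rng = np.random.default_rng(0)
shallow = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
deep = [rng.standard_normal((16, 16)) * 0.1 for _ in range(12)]

x = rng.standard_normal(16)
print(forward(x, shallow).shape, forward(x, deep).shape)  # (16,) (16,)
```

Nothing in the deeper network “understands” more; it simply composes more of these identical transformations, and the engineering breakthroughs were in making the training of that longer composition work at all.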
WHY THIS ESSAY?
Why did I post this? I want to clear up some confusions about Artificial Intelligence, and the goals of people who do research in AI.
There have certainly been a million person-years of AI research carried out since 1956 (much more than the three thousand that Alan Turing thought it would take!), with an even larger number of person-years applied to AI development and deployment.
We are way off the early aspirations of how far along we would be in Artificial Intelligence by now, or by the year 2000 or the year 2001. We are not close to figuring it out. In my next blog post, hopefully in May of 2018 I will outline all the things we do not understand yet about how to build a full scale artificially intelligent entity.
My intent of that coming blog post is to:
- To stop people worrying about imminent super intelligent AI (yes, I know, they will enjoy the guilty tingly feeling thinking about it, and will continue to irrationally hype it up…).
- To suggest directions of research which can have real impact on the future of AI, and accelerate it.
- To show just how much fun research remains to be done, and so to encourage people to work on the hard problems, and not just the flashy demos that are hype bait.
In closing, I would like to share Alan Turing’s last sentence from his paper “Computing Machinery and Intelligence”, just as valid today as it was 68 years ago:
We can only see a short distance ahead, but we can see plenty there that needs to be done.
1This whole post started out as a footnote to one of the two long essays in the FoR&AI series that I am working on. It clearly got too long to be a footnote, but is somewhat shorter than my usual long essays.
2I have started collecting copies of hard to find historical documents and movies about AI in one place, as I find them in obscure nooks of the Web, where the links may change as someone reorganizes their personal page, or on a course page. Of course I can not guarantee that this link will work forever, but I will try to maintain it for as long as I am able. My web address has been stable for almost a decade and a half already.
3This version is the original full version as it appeared in the journal Mind, including the references. Most of the versions that can be found on the Web are a later re-typesetting without references and with a figure deleted–and I have not fully checked them for errors that might have been introduced–I have noticed at least one place where one symbol has been substituted for another. That is why I have tracked down the original version to share here.
4His boss at the National Physical Laboratory (NPL), Sir Charles Darwin, grandson of that Charles Darwin, did not approve of what he had written, and so the report was not allowed to be published. When it finally appeared in 1970 it was labelled as the “prologue” to the fifth volume of an annual series of volumes titled “Machine Intelligence”, produced in Britain, and in this case edited by Bernard Meltzer and Donald Michie, the latter a war time colleague of Turing at Bletchley Park. They too, used the past as prologue.
5This paper was written on the occasion of my being co-winner (with Martha Pollack, now President of Cornell University) in 1991 of the Computers and Thought award that is given at the biennial International Joint Conference on Artificial Intelligence (IJCAI) to a young researcher. There was some controversy over whether at age 36 I was still considered young and so the rules were subsequently tightened up in a way that guarantees that I will forever be the oldest recipient of this award. In any case I had been at odds with the conventional AI world for some time (I seem to remember a phrase including “angry young man”…) so I was very grateful to receive the award. The proceedings of the conference had a six page, double column, limit on contributed papers. As a winner of the award I was invited to contribute a paper with a relaxed page limit. I took them at their word and produced a paper which spanned twenty seven pages and was over 25,000 words long! It was my attempt at a scholarly deconstruction of the field of AI, along with the path forward as I saw it.