Rodney Brooks

Robots, AI, and other stuff

Patrick Winston Explains Deep Learning

Patrick Winston is one of the greatest teachers at M.I.T., and for 27 years was Director of the Artificial Intelligence Laboratory (which later became part of CSAIL).

Patrick teaches 6.034, the undergraduate introduction to AI at M.I.T., and a recent set of his lectures is available as videos.

I want to point people to lectures 12a and 12b (linked individually below). In these two lectures he goes from zero to a full explanation of deep learning, how it works, how nets are trained, what are the interesting problems, what are the limitations, and what were the key breakthrough ideas that took 25 years of hard thinking by the inventors of deep learning to discover.

The only prerequisite is understanding differential calculus. These lectures are fantastic. They really get at the key technical ideas in a very understandable way. The biggest network analyzed in lecture 12a only has two neurons, and the biggest one drawn only has four neurons. But don’t be disturbed. He is laying the groundwork for 12b, where he explains how deep learning works, shows simulations, and shows results.
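To give a sense of how little machinery a two-neuron network needs, here is a minimal sketch in Python. It is not a transcript of the lecture: it assumes two sigmoid neurons in series, no thresholds, a single training example, and a performance measure P = -1/2 (d - z)^2 that is improved by gradient ascent. All function names, starting weights, and the learning rate are illustrative choices.

```python
import math

def sigmoid(a):
    """Smooth threshold function; its derivative is s * (1 - s)."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w1, w2):
    """Run input x through two neurons in series; return both outputs."""
    y = sigmoid(w1 * x)   # output of the first neuron
    z = sigmoid(w2 * y)   # output of the second neuron
    return y, z

def performance(x, d, w1, w2):
    """P = -1/2 (d - z)^2: zero when the output z matches the target d."""
    _, z = forward(x, w1, w2)
    return -0.5 * (d - z) ** 2

def gradients(x, d, w1, w2):
    """Chain rule, working backward from the output.

    delta2 is computed once and reused for the earlier weight, so the
    work grows with the depth of the net rather than blowing up.
    """
    y, z = forward(x, w1, w2)
    delta2 = (d - z) * z * (1 - z)       # dP/d(input sum of neuron 2)
    delta1 = delta2 * w2 * y * (1 - y)   # dP/d(input sum of neuron 1)
    return delta1 * x, delta2 * y        # dP/dw1, dP/dw2

def train(x, d, w1=0.5, w2=0.5, rate=1.0, steps=2000):
    """Gradient ascent on P (assumed starting weights and rate)."""
    for _ in range(steps):
        g1, g2 = gradients(x, d, w1, w2)
        w1 += rate * g1
        w2 += rate * g2
    return w1, w2

if __name__ == "__main__":
    w1, w2 = train(x=1.0, d=0.9)
    print(forward(1.0, w1, w2)[1])  # output after training, to compare with d
```

The only calculus used is the chain rule and the derivative of the sigmoid, which is why differential calculus is enough of a prerequisite.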

This is teaching at its best. Listen to every sentence. They all build the understanding.

I just wish all the people not in AI who talk at length about AI and the future in the press had this level of technical understanding of what they are talking about. Spend two hours on these lectures and you will have that understanding.

The lectures are on YouTube: 12a Neural Nets, and 12b Deep Neural Nets.

13 comments on “Patrick Winston Explains Deep Learning”

  1. Maybe we can create a certification for tech journalists so that readers know which ones have a technical understanding of the issues they write about and which ones don’t.

    1. And maybe we could extend that certification process to physicists who insist on talking about AI in the press…

      1. Agreed. A brilliant physicist may have generally intelligent things to say (relative to the average person on the street), but should be considered less of a thought leader on a specific topic that they have not studied.

        That said, I think it’s human nature to look toward “alphas”/“influencers” for leadership. The hope, of course, is that those (e.g. a physicist, celebrity, politician, etc.) who are in a position to influence the masses take the time to talk to those who are actually experts in the field.

  2. I thought the lecture was interesting. I used a neural net back in the early 1990s to assess the sampled time signature of an accelerometer output to distinguish crash events from non-crash events for airbag deployment. I think ours had about 15 internal nodes, and we determined the weights in the classic way, by “training” it on crash events and non-crash events. It worked quite well, but at that time the auto industry was allergic to the notion of non-deterministic algorithms. Essentially, the safety offices at the OEMs would not accept a safety system that was based on algorithms that could not be assessed analytically, but instead had to be assessed combinatorially, i.e., by example. At the time I was not aware of the clever approach of using “re-use” to simplify the analytical computation of the weights (I thought Winston’s description of the process was useful and interesting), and had we been aware of that (20 years prior to Geoff Hinton’s work), we might have been able to show analytically how the algorithm worked. Probably would have revolutionized airbag systems…

  3. I’m doing image classification with Deep Learning and getting fantastic results. It is quite powerful. As a control theorist (in the old days) it is a bit scary how there are little to no theoretical underpinnings related to why or how they converge, and the issue of local minima is just sort of brushed aside. But, in a practical sense, they seem to be a great tool.

  4. Unknowingly I picked up a pen and started taking notes (distinctly after 10 years) – the way he helps build intuition is amazing.
    I’m jealous of the kids in the room who I presume would be hogging him after class, the graduate students who’d want to work with him on projects and/or research problems, and his students.
    I started to search for all the lectures by him that are available on MIT OpenCourseWare – anything, just anything. I would even lap up his astrology lecture if he was giving one. Thanks for the complete lecture list.

  5. Thanks for pointing out these videos — they are a great resource!

    One other, related, resource I’d like to point out (also from MIT!):
    Many Neural Net / Machine Learning frameworks offer the feature of automatically calculating your gradients for you, to save having to do all of these calculations by hand. A lot of people seem to treat this as some kind of black box, or simply act like it’s deep magic and wouldn’t know where to begin if they had to implement something like this themselves. I always love to point out that the great “Structure and Interpretation of Computer Programs” is available freely online, and has a wonderful section dedicated to showing how one can calculate derivatives, or do other algebra on symbolic expressions. Food for thought for anyone who is using a Neural Network / ML framework, and wondering a bit more about what it does!

    An older, full set of lectures to complement the book is also available on OCW. Even though it’s an “Introductory” level course and book, I always find it fun to revisit, play with and ponder the ideas they presented. I’d be willing to bet even most seasoned programmers can find something interesting in there. (And getting to play with Scheme / Lisp for a little bit is always fun!)

  6. The fact that Patrick Winston’s neurobiology is from the 1950s is one thing, but the fact that anyone in Computer Science will somehow relate “neural networks” (cringe!) to neuroscience is the biggest con presented in computer science. His relating these Nodal networks to neuroscience is completely false! Stop it!!!

    1. It is fine to get excited about this, but the fact is that this is where these ideas came from, starting with McCulloch and Pitts in 1943, and developing with Hebb in 1949, Widrow and Hoff in 1959, etc. Of course all that early neuroscience and modeling of it has proven to be wrong. But that is where these networks came from. Bad models inspired by early neuroscience that is very different from today’s neuroscience. Today’s deep learning is based on wrong models of neuroscience. No one is claiming otherwise, and no one is claiming that these networks are accurate models of neurons.
