10 BEST BOOKS FOR DEEP LEARNING

I want to start by being honest about something: I came to deep learning from the psychology direction. I spent two years in a cognitive psychology PhD program studying decision fatigue, and somewhere in there I started reading about neural networks, and I found myself more interested in the analogies than the mathematics. The idea that you could build something that learned from experience — that adjusted its own connections based on what worked and what didn’t — seemed to me like it should tell us something about how human learning actually works, or doesn’t, or could.

Here’s what I keep thinking about, weeks after finishing most of these books: we keep naming these systems after human capabilities, and then we act surprised when the naming creates expectations. Deep learning learns. Neural networks learn. We say these things and then we’re confused when people think the systems understand in the way that word usually means. (Which, if you think about it, is either very profound or extremely obvious, and I’ve been going back and forth on that.)

What I mean is: deep learning is remarkable, and it is also limited in ways that the popular descriptions tend to elide. Understanding those limitations — what these systems actually do, how they actually work, what they’re actually learning — is the difference between having an opinion about AI that holds up to scrutiny and having an opinion that falls apart the first time someone asks you a follow-up question.

The books on this list represent different angles on this territory. Some are technical and mathematical. Some are more conceptual. A few are explicitly about what this all means for human intelligence, which is the question I find myself returning to. I’ve tried to include something for every level of mathematical comfort, because the field has a tendency to forget that not everyone came through a computer science degree.

Quick Pick: The Best Book for Deep Learning

If you only have time for one book, go with “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurelien Geron. This is the most practical and complete guide to building machine learning systems that I’ve found, and Geron has the rare ability to explain complex concepts without sacrificing accuracy. I keep coming back to this one as a reference. The book is long enough to feel like it covers everything, and it does, more or less. If you’re going to read one book on deep learning, make it this one.

The 10 BEST BOOKS FOR DEEP LEARNING

1. HANDS-ON MACHINE LEARNING WITH SCIKIT-LEARN, KERAS, AND TENSORFLOW BY AURELIEN GERON

[AURELIEN GERON] | ⭐ 4.7/5

Who it’s for: Anyone who wants to actually build deep learning systems, as opposed to just understanding them. This is a practical book for people who learn by doing. If you want to go from zero to building working models, this is where you start.

Get it here: https://www.amazon.com/Hands-On-Machine-Learning-Scikit-Learn/dp/1492032646?tag=readplug09-20

“The goal of machine learning is to give computers the ability to learn from data without being explicitly programmed.”

Geron is a former Google engineer, and it shows. This book has the particular quality of being written by someone who has actually built these systems at scale, not just researched them in an academic context. The result is a book that is rigorous about the mathematics without ever letting the mathematics obscure the practical goal: building things that work.

I’ve used this as a reference for my own projects, and what I appreciate most is that Geron doesn’t oversimplify. When something is complicated, he says so. When a technique works better in practice than the theory explains, he says that too. The book is honest about what we know and what we don’t, which is more than you can say for most technical writing.

The second edition covers transformers, reinforcement learning, and other recent developments, which means it doesn’t feel dated the way some technical books do. If you’re going to own one book on this subject, this is probably it.

My take: The most useful book on the shelf. Not the most theoretical, but the most practical, and for this subject, practical is usually what you need.

2. DEEP LEARNING BY IAN GOODFELLOW, YOSHUA BENGIO, AND AARON COURVILLE

[IAN GOODFELLOW, YOSHUA BENGIO, AND AARON COURVILLE] | ⭐ 4.6/5

Who it’s for: People who want the comprehensive mathematical treatment. This is the closest thing the field has to a canonical textbook. If you’re coming from a mathematics or computer science background and you want to understand the foundations, this is where you go.

Get it here: https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618?tag=readplug09-20

“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”

I should be honest: this is not a book you read casually. It’s a book you work through, which is a different thing. The mathematics is real and it’s not apologetic about it. But if you’re willing to put in the time, it will give you the deepest understanding of deep learning available in a single volume.

Bengio shared the 2018 Turing Award for his work on deep learning, which should tell you something about the credibility of the authors. Goodfellow invented GANs (generative adversarial networks), one of the most interesting developments in the field. These are not people writing about something at a distance. They are writing about their own contributions to the field.

I found myself arguing with this book in places, which is usually a good sign. The authors have strong views about what matters and they’re not shy about expressing them. Sometimes I thought they were wrong. But they were always wrong in interesting ways that made me think more carefully about my own positions.

My take: The definitive textbook. Dense, comprehensive, and unforgiving. If you want to understand deep learning at the level of the people who built it, this is where you start.

3. NEURAL NETWORKS AND DEEP LEARNING BY MICHAEL NIELSEN

[MICHAEL NIELSEN] | ⭐ 4.5/5

Who it’s for: Anyone who wants a conceptual understanding without the full mathematical treatment. Nielsen is a physicist by training, and it shows in his approach: he wants you to understand why neural networks work, not just that they do.

Get it here: https://www.amazon.com/Neural-Networks-Deep-Learning-Textbooks/dp/1529386698?tag=readplug09-20

“A neural network is a type of computer program, inspired by the brain, that is designed to learn patterns from data.”

Here’s what I like about Nielsen’s approach: he builds intuition before mathematics. He uses visualizations and analogies that actually illuminate rather than obscure. The book is available free online, which is appropriate because it’s more of an educational resource than a commercial product.

I found this book useful as a bridge between the popular descriptions and the technical literature. After reading Nielsen, I went back to Goodfellow’s textbook with a better sense of what I was looking for. That’s a good test of an educational text: does it help you navigate the harder material?

The chapter on backpropagation — the algorithm that makes neural networks learn — is the best explanation I’ve found anywhere. Nielsen clearly spent a lot of time figuring out how to explain this clearly, and it shows.

My take: The best conceptual introduction. Not a substitute for the mathematical treatment, but an excellent preparation for it.

4. THE DEEP LEARNING REVOLUTION BY TERRENCE J. SEJNOWSKI

[TERRENCE J. SEJNOWSKI] | ⭐ 4.4/5

Who it’s for: People who want to understand the history and context of deep learning, not just the techniques. Sejnowski has been in the field since the beginning, and he knows things about the development of neural networks that aren’t in most accounts.

Get it here: https://www.amazon.com/Deep-Learning-Revolution-Terrence-Sejnowski/dp/026203803X?tag=readplug09-20

“Deep learning is not a magic wand that can be waved at any problem. It is a tool, and like any tool, it has strengths and limitations.”

Sejnowski is a pioneer in the field — he was doing neural network research in the 1980s, when the field was at one of its low points and everyone else had moved on to statistical methods. His account of what happened next is valuable precisely because he was there.

The book is part history, part memoir, and part technical explanation. The combination shouldn’t work, but it does, because Sejnowski has a sense of narrative and he uses it to structure ideas rather than to avoid them. The technical content is real. It’s just embedded in a story that makes it easier to follow.

What I found most interesting was the account of the back-and-forth between neural networks and statistical learning. For a while, statistical methods were dominant and neural networks were considered a dead end. Sejnowski explains why the statistical view was wrong — or rather, why it was right about the limitations of then-existing neural networks but wrong about what would become possible as the networks got deeper and the data got bigger. This history matters because it shows that the current dominance of deep learning isn’t inevitable — it was the result of specific technical advances and the availability of large datasets.

One thing I didn’t expect from this book: it’s funny in places. Sejnowski has a dry sense of humor that comes through occasionally, usually at the expense of the field’s more grandiose claims. I found myself laughing out loud at a few passages, which is not what I expected from a technical memoir. The humor is earned, not forced, and it makes the book more readable without making it less serious.

My take: The most historically informed book on this list. Essential context that most technical books skip.

5. GROKKING DEEP LEARNING BY ANDREW TRASK

[ANDREW TRASK] | ⭐ 4.3/5

Who it’s for: Beginners who want to build intuition before tackling the mathematics. Trask has a gift for explaining ideas in ways that don’t require a technical background. You don’t need to know calculus to understand this book.

Get it here: https://www.amazon.com/Grokking-Deep-Learning-Andrew-Trask/dp/1617293709?tag=readplug09-20

“Deep learning is about taking down a massive tower of intelligence, one floor at a time.”

I have a complicated relationship with the “Grokking” series. Some of them are excellent. Some feel like they’ve been written for an audience that doesn’t exist. Trask’s book is in the excellent category, partly because he’s writing about a subject he clearly understands deeply enough to explain simply.

What I appreciate is that Trask doesn’t condescend. He’s not avoiding the mathematics because he thinks you can’t handle it. He’s avoiding it because he’s found that building intuition first makes the mathematics easier to absorb later. That’s a pedagogical philosophy I can respect, even when I disagree with it, because it comes from a genuine desire to teach rather than a desire to impress.

The book builds up from simple neural networks to more complex ones, and by the end you’re implementing things that feel like they should be beyond you. That feeling — of having built something you didn’t think you could build — is underrated as a learning experience. The book gives you that feeling in a way that doesn’t cheat: you’re actually building the math from scratch, not just calling library functions. Which, if you think about it, is either very profound or extremely practical, and I’ve been going back and forth on which one matters more.

My take: The most accessible introduction. Not for people who want the full mathematical treatment, but for everyone else, this is where to start.

6. DEEP LEARNING WITH PYTHON BY FRANCOIS CHOLLET

[FRANCOIS CHOLLET] | ⭐ 4.6/5

Who it’s for: Python programmers who want to understand deep learning through code. Chollet is the creator of Keras, and it shows. If you think in Python, this book will make sense in a way that other books don’t.

Get it here: https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438?tag=readplug09-20

“Deep learning is about taking raw data and learning to represent it in ways that are useful for a specific task.”

Chollet has strong opinions about how deep learning should be taught, and I mostly agree with them. He thinks the field has become too focused on architecture-hacking at the expense of actual understanding, and he makes this point in the book without being preachy about it.

The Keras library is Chollet’s creation, and the book uses it to build intuition before diving into the underlying mathematics. This is a different pedagogical approach than Geron’s — Chollet is more interested in understanding what you’re doing and why, and less interested in covering every possible technique. Both approaches are valid. This one is better if you’re the kind of learner who needs to understand before you can build.

I found the sections on text processing and sequence models particularly strong. Chollet clearly thinks carefully about how to explain these concepts, and it shows.

My take: The best book for Python programmers. Chollet understands how programmers think, and he uses that understanding to teach.

7. MATHEMATICS FOR MACHINE LEARNING BY MARC PETER DEISENROTH ET AL.

[MARC PETER DEISENROTH, A. ALDO FAISAL, AND CHENG SOON ONG] | ⭐ 4.4/5

Who it’s for: People who want to fill in the mathematical background. This is not a deep learning book per se — it’s a mathematics book that teaches the specific linear algebra, probability, and calculus you need for machine learning.

Get it here: https://www.amazon.com/Mathematics-Machine-Learning-Creative-Compressions/dp/110845514X?tag=readplug09-20

“Mathematics is the language we use to describe the world around us.”

I spent a lot of time in my PhD program being frustrated by the gap between the mathematics I knew and the mathematics I needed. This book is an attempt to close that gap, specifically for people working in machine learning.

What I appreciate is that the book is honest about what it’s trying to do. It’s not a rigorous mathematical text — it doesn’t prove theorems or develop arguments with full formality. It’s a bridge between the mathematics you learned in courses and the mathematics you need to understand machine learning papers.

The sections on linear algebra and probability are particularly good. The authors have thought carefully about what people actually need to know, as opposed to what appears in standard courses, and they’ve made choices accordingly.

I found myself using this as a reference more than reading it straight through. That’s probably the right use for it: a reference for the specific mathematical concepts you need, when you need them.

My take: The most useful reference on this list. Not a book you read once, but a book you keep coming back to.

8. DEEP LEARNING ILLUSTRATED BY JON KROHN

[JON KROHN] | ⭐ 4.2/5

Who it’s for: Visual learners who want diagrams and intuitive explanations. Krohn has a graphics-heavy approach that some people find extremely useful and others find distracting.

Get it here: https://www.amazon.com/Deep-Learning-Illustrated-Intelligence-Addison-Wesley/dp/0135116694?tag=readplug09-20

“Deep learning is essentially a powerful pattern recognition algorithm.”

Here’s what I’ll say about this book: the visuals are genuinely useful. Neural networks are complicated systems, and being able to see what information flows where, how gradients propagate, how layers connect — that stuff helps. If you’re the kind of learner who benefits from diagrams, this book was designed for you.

The text is less impressive than the visuals. Krohn covers a lot of ground without always going deep, and some of the explanations feel like they’re skating over complexity rather than engaging with it. But the visual approach saves it.

I used this book alongside Nielsen’s when I was first learning. The combination worked: Nielsen gave me the conceptual foundation, and Krohn gave me the visual scaffolding to hang the concepts on. Neither book is sufficient alone. Together, they’re pretty good.

My take: The most visual introduction. Not the deepest, but if diagrams work for you, this is worth your time.

9. MAKE YOUR OWN NEURAL NETWORK BY TARIQ RASHID

[TARIQ RASHID] | ⭐ 4.3/5

Who it’s for: Complete beginners, especially those who are intimidated by mathematics. Rashid has worked hard to make this accessible, and he’s succeeded in ways that most technical authors don’t even try.

Get it here: https://www.amazon.com/Make-Own-Neural-Network-Tariq-Rashid/dp/1530826608?tag=readplug09-20

“The aim of this book is to make neural networks completely accessible to anyone with some basic high school mathematics.”

I want to be honest: this book is probably too simple for most people reading this list. Rashid is explicitly targeting beginners, and if you’re already a programmer or you’ve already read one of the other books on this list, you’ll find much of this familiar.

But here’s the thing: the people who most need an accessible introduction often don’t know they need one. They think they should already understand, and they don’t, and they feel embarrassed about asking. This book is for them. Rashid doesn’t make you feel stupid for not knowing. He starts from the beginning and he doesn’t assume anything.

I found myself impressed by the pedagogical care. Rashid has clearly thought about what trips people up, and he’s tried to address it. The step-by-step approach to building a neural network from scratch is particularly good.

My take: The most accessible book on this list. Not for everyone, but if you’re struggling with the other books, this one might be where you belong.

10. THE BOOK OF WHY BY JUDEA PEARL AND DANA MACKENZIE

[JUDEA PEARL AND DANA MACKENZIE] | ⭐ 4.5/5

Who it’s for: People who want to understand the fundamental limits of deep learning and what comes next. Pearl is one of the most important figures in artificial intelligence, and this is his most accessible statement of what he thinks is missing from current AI systems.

Get it here: https://www.amazon.com/Book-Why-Science-Causality-Pearl/dp/0141982411?tag=readplug09-20

“Correlation is not causation.” But correlation is all that deep learning can see.

I saved this one for last because it’s the most philosophical, and because I think it’s the most important for understanding the current moment in AI. Pearl won the Turing Award for his work on causal inference, and in this book he argues that current deep learning systems are fundamentally limited because they can find correlations but they can’t reason about causation.

Here’s what I keep thinking about, weeks after finishing this book: we keep building systems that are incredibly good at pattern recognition and incredibly bad at everything else, and then we act surprised when those systems fail in ways that seem obvious in retrospect. Pearl’s argument suggests that this isn’t a bug we can fix with better pattern recognition. It’s a fundamental limitation of the approach.

I found myself arguing with this book in places. Pearl has strong views about what matters, and he’s not shy about expressing them. But even when I disagreed, I thought more carefully about why I disagreed, which is the best I can ask of any book.

My take: The most important perspective on this list. Not a technical book, but essential reading for understanding what deep learning can and can’t do.

FREQUENTLY ASKED QUESTIONS

WHAT EXACTLY IS DEEP LEARNING?

Deep learning is a subset of machine learning that uses neural networks with multiple layers — hence “deep” — to learn representations of data at multiple levels of abstraction. The “depth” refers to the number of layers: shallow networks might have one or two layers, while deep networks can have dozens or hundreds. The key insight is that each layer learns to represent data in terms of the patterns found in the previous layer, building up from simple features to complex concepts. This is similar to how the visual cortex processes information: early layers detect edges, intermediate layers detect shapes, and higher layers detect objects.
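To make the "layers feeding layers" idea concrete, here is a minimal sketch of a forward pass in plain Python. The weights are hand-picked for illustration, not learned, and the names `dense` and `forward` are my own, not taken from any of the books. Real libraries do the same thing with matrix operations and usually skip the ReLU on the final layer.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a ReLU."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU: negative sums become zero
    return outputs


def forward(inputs, layers):
    """Feed the input through each layer in turn, keeping every intermediate result."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(dense(activations[-1], weights, biases))
    return activations


# Hypothetical weights: 3 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]

activations = forward([3.0, 1.0, 2.0], layers)
for depth, values in enumerate(activations):
    print(f"layer {depth}: {values}")
```

Each layer only ever sees the output of the layer before it, which is the whole point: the second layer works with the features the first layer computed, not with the raw input.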

DO I NEED TO BE A MATHEMATICIAN TO UNDERSTAND DEEP LEARNING?

No, but you need some mathematics. The minimum mathematical background for understanding deep learning at a working level is linear algebra (matrix operations, vectors, eigenvalues), probability and statistics, and some calculus (specifically, understanding gradients and derivatives). The good news is that you don’t need to be a mathematician — you need to understand these topics at an applied level, not at the level of formal proofs. Books like “Mathematics for Machine Learning” by Deisenroth are designed specifically to bridge this gap.
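If "understanding gradients" sounds abstract, a small sketch may help. It assumes a toy one-parameter loss, (w - 3)^2, that I made up for illustration; real models have millions of parameters, but the update rule is the same idea. The function names are hypothetical.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss, worked out by hand with calculus."""
    return 2.0 * (w - 3.0)


def numerical_gradient(f, w, eps=1e-6):
    """Approximate the derivative by nudging w slightly in both directions."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_final = gradient_descent(w=0.0)
print(w_final, loss(w_final))
```

The numerical version is there as a sanity check: if your hand-derived gradient disagrees with it, the calculus is wrong somewhere. That is about the level of calculus you need to follow most of the books on this list.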

WHAT’S THE DIFFERENCE BETWEEN DEEP LEARNING AND MACHINE LEARNING?

Deep learning is a specific approach within machine learning. Traditional machine learning relies on feature engineering — humans decide what features of the data are important, and then the algorithm learns to classify or predict based on those features. Deep learning automates the feature engineering process by learning hierarchical representations directly from raw data. This makes deep learning particularly powerful for unstructured data like images, text, and audio, where it’s difficult for humans to specify relevant features in advance.

CAN DEEP LEARNING SYSTEMS ACTUALLY “UNDERSTAND” WHAT THEY’RE LEARNING?

No, and this is one of the most important limitations to understand. Deep learning systems learn statistical patterns in data. They can recognize that certain patterns of pixels tend to correspond to cats, but they don’t have any understanding of what a cat is in the way that you or I understand what a cat is. They don’t know that cats are animals, that they’re typically furry, that they purr, that they have owners, or any of the vast amount of common sense knowledge that we associate with understanding. This is the distinction Pearl is making in “The Book of Why” between correlation and causation, between pattern recognition and actual reasoning.

WHAT ARE TRANSFORMERS AND WHY DO THEY MATTER?

Transformers are a type of neural network architecture introduced in 2017 that has become dominant in natural language processing and is increasingly important in other domains. The key innovation is “attention” — a mechanism that allows the network to focus on the most relevant parts of the input when producing output. This sounds simple, but it has profound implications. Transformers can process sequences in parallel (unlike RNNs, which had to process sequentially), which makes them much faster to train. They also scale better to large datasets and large models. Most of the AI systems you’ve heard about in recent years — GPT, BERT, etc. — are based on transformer architectures.
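Here is a minimal sketch of the attention idea in plain Python, assuming a single attention head and no learned projection matrices (real transformers have both, plus many stacked layers). The vectors are made-up toy values. Each query scores every key, the scores become weights through a softmax, and the output is a weighted average of the values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:  # each query is independent of the others
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: one query that points along the first dimension.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print(weights[0])
print(outputs[0])
```

Because no query depends on the result for any other query, the loop over queries can run all at once on parallel hardware. That independence is what the comparison with RNNs is getting at.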

HOW LONG DOES IT TAKE TO LEARN DEEP LEARNING?

It depends on your background and what you mean by “learn.” You can build working models with tools like Keras or PyTorch in a few weeks if you’re already a programmer. Understanding why those models work — and why they sometimes don’t — takes longer. If you want to contribute to research or build cutting-edge systems, you’re looking at years of study. Most of the books on this list will take you several months to work through properly. The field is moving quickly, so learning is ongoing rather than something you complete.

WHAT ARE THE MAIN LIMITATIONS OF DEEP LEARNING THAT THE BOOKS DON’T ALWAYS MENTION?

Deep learning systems are remarkably good at pattern recognition and remarkably bad at things that seem obvious to humans. They typically require enormous amounts of training data, often labeled by hand, they can be fooled by adversarial examples that would never fool a human, and they don’t generalize well to situations that differ significantly from their training data. They also have a “black box” problem: it’s often difficult to understand why a deep learning system made a particular decision, which matters enormously in high-stakes applications like healthcare or criminal justice. Most importantly, deep learning systems lack the causal reasoning capabilities that Pearl discusses in “The Book of Why.” They can tell you what is correlated with what, but not what causes what. Understanding these limitations is essential for anyone who wants to form an informed opinion about what AI can and cannot do.

THE BOTTOM LINE

The books on this list represent different entry points into a field that is changing fast and generating a lot of heat alongside its light. If you’re starting from scratch and you want one book, make it Geron’s “Hands-On Machine Learning.” If you want the conceptual foundation, read Nielsen’s “Neural Networks and Deep Learning” first, then move to Goodfellow’s textbook for the mathematics. If you want to understand what deep learning can’t do — and you should want to understand this — read Pearl’s “The Book of Why.”

Here’s what I keep coming back to: deep learning is remarkable, and it’s also limited in ways that the hype tends to obscure. These systems can learn to recognize patterns with superhuman accuracy. They cannot reason about those patterns in the way that even a child can. Understanding both what these systems can do and what they can’t is the minimum necessary for having an informed opinion about AI. The books on this list will get you there, if you’re willing to put in the time.

I spent two years in a cognitive psychology program trying to understand how humans learn. What I’ve concluded from that experience and from these books is that we’re both more similar to and more different from these systems than the popular accounts suggest. We learn from experience in ways that are genuinely similar to how neural networks learn. But we also reason about what we’ve learned in ways that current systems don’t, and we have common sense knowledge about the world that no current system has. The question of whether we can build systems that have what they’re missing, and whether we should, is one that these books can’t answer. But they’re the best preparation I’ve found for thinking about it.

Which book are you starting with?

Disclosure: This post contains affiliate links. If you purchase through these links, ReadPlug may earn a small commission at no extra cost to you. We only recommend books we’ve personally found valuable.
