Foundations

Deep learning

A branch of machine learning that uses multi-layered neural networks to learn complex patterns; powers modern vision, speech, and language AI systems.

Deep learning is the subfield of machine learning that builds neural networks with many stacked layers of representation. Each layer transforms its input into a slightly more abstract feature space, and the composition of these transformations lets the model capture patterns that are out of reach for classical algorithms — edges to shapes to objects in vision, characters to words to meaning in language.
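The idea of stacked transformations can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration (the shapes, weights, and ReLU nonlinearity are chosen for the example, not taken from any particular model): each layer maps its input into a new feature space, and composing two layers yields a more abstract representation than either alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One layer: a linear transform followed by a ReLU nonlinearity.
    return np.maximum(0.0, x @ w + b)

# Hypothetical shapes: 4 raw input features -> 8 hidden features -> 3 outputs.
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(2, 4))   # a batch of 2 examples
h = layer(x, w1, b1)          # first-layer representation
y = layer(h, w2, b2)          # composition of both layers
print(h.shape, y.shape)       # (2, 8) (2, 3)
```

A real network repeats this composition dozens or hundreds of times, with the weights learned from data rather than drawn at random.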

The modern era began around 2012, when a deep convolutional network (AlexNet) crushed the ImageNet image-classification benchmark. The recipe — large labelled datasets, GPU compute, and end-to-end training with backpropagation — generalised quickly to speech recognition, machine translation, game-playing, and finally generative models. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton received the 2018 Turing Award for the foundational work.

Deep learning underlies almost everything covered on this site. The transformer architecture and the large language models that grow out of it are deep learning systems with billions to trillions of parameters. Image generators, speech models, protein-structure predictors, and self-driving perception stacks all share the same underlying principle: stack differentiable layers, train with gradient descent, and let scale do much of the work.
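The "train with gradient descent" principle can be shown on the smallest possible model. This is a sketch under simplifying assumptions: a one-parameter model y = w·x, synthetic data whose true weight is 2.0, and a mean-squared-error loss with a hand-derived gradient (large networks compute the same gradients automatically via backpropagation).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x                  # synthetic targets: the true weight is 2.0

w = 0.0                      # start from an arbitrary weight
lr = 0.1                     # learning rate
for _ in range(100):
    pred = w * x
    grad = np.mean(2 * (pred - y) * x)   # dL/dw for the MSE loss
    w -= lr * grad                       # gradient-descent update

print(round(w, 3))           # prints 2.0
```

The loop repeatedly nudges the weight against the loss gradient until it recovers the true value; deep learning applies exactly this update, in parallel, to billions of parameters.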
