Machine learning plays a part in every stage of your life. If you studied online for the SAT college admission exam, a learning algorithm graded your practice essays. And if you applied to business school and took the GMAT exam recently, one of your essay graders was a learning system. Perhaps when you applied for your job, a learning algorithm picked your résumé from the virtual pile and told your prospective employer: here’s a strong candidate; take a look. Your latest raise may have come courtesy of another learning algorithm. If you’re looking to buy a house, Zillow.com will estimate what each one you’re considering is worth. When you’ve settled on one, you apply for a home loan, and a learning algorithm studies your application and recommends accepting it (or not). Perhaps most important, if you’ve used an online dating service, machine learning may even have helped you find the love of your life.
Society is changing, one learning algorithm at a time. Machine learning is remaking science, technology, business, politics, and war. Satellites, DNA sequencers, and particle accelerators probe nature in ever-finer detail, and learning algorithms turn the torrents of data into new scientific knowledge. Companies know their customers like never before. The candidate with the best voter models wins, like Obama against Romney. Unmanned vehicles pilot themselves across land, sea, and air. No one programmed your tastes into the Amazon recommendation system; a learning algorithm figured them out on its own, by generalizing from your past purchases. Google’s self-driving car taught itself how to stay on the road; no engineer wrote an algorithm instructing it, step-by-step, how to get from A to B. No one knows how to program a car to drive, and no one needs to, because a car equipped with a learning algorithm picks it up by observing what the driver does.
Machine learning is something new under the sun: a technology that builds itself. Ever since our remote ancestors started sharpening stones into tools, humans have been designing artifacts, whether they’re hand built or mass produced. But learning algorithms are artifacts that design other artifacts. “Computers are useless,” said Picasso. “They can only give you answers.” Computers aren’t supposed to be creative; they’re supposed to do what you tell them to. If what you tell them to do is be creative, you get machine learning. A learning algorithm is like a master craftsman: every one of its productions is different and exquisitely tailored to the customer’s needs. But instead of turning stone into masonry or gold into jewelry, learners turn data into algorithms. And the more data they have, the more intricate the algorithms can be.
Homo sapiens is the species that adapts the world to itself instead of adapting itself to the world. Machine learning is the newest chapter in this million-year saga: with it, the world senses what you want and changes accordingly, without you having to lift a finger. Like a magic forest, your surroundings—virtual today, physical tomorrow—rearrange themselves as you move through them. The path you picked out between the trees and bushes grows into a road. Signs pointing the way spring up in the places where you got lost.
These seemingly magical technologies work because, at its core, machine learning is about prediction: predicting what we want, the results of our actions, how to achieve our goals, how the world will change. Once upon a time we relied on shamans and soothsayers for this, but they were much too fallible. Science’s predictions are more trustworthy, but they are limited to what we can systematically observe and tractably model. Big data and machine learning greatly expand that scope. Some everyday things can be predicted by the unaided mind, from catching a ball to carrying on a conversation. Some things, try as we might, are just unpredictable. For the vast middle ground between the two, there’s machine learning.
Paradoxically, even as they open new windows on nature and human behavior, learning algorithms themselves have remained shrouded in mystery. Hardly a day goes by without a story in the media involving machine learning, whether it’s Apple’s launch of the Siri personal assistant, IBM’s Watson beating the human Jeopardy! champion, Target finding out a teenager is pregnant before her parents do, or the NSA looking for dots to connect. But in each case the learning algorithm driving the story is a black box. Even books on big data skirt around what really happens when the computer swallows all those terabytes and magically comes up with new insights. At best, we’re left with the impression that learning algorithms just find correlations between pairs of events, such as googling “flu medicine” and having the flu. But finding correlations is to machine learning no more than bricks are to houses, and people don’t live in bricks.
When a new technology is as pervasive and game changing as machine learning, it’s not wise to let it remain a black box. Opacity opens the door to error and misuse. Amazon’s algorithm, more than any one person, determines what books are read in the world today. The NSA’s algorithms decide whether you’re a potential terrorist. Climate models decide what’s a safe level of carbon dioxide in the atmosphere. Stock-picking models drive the economy more than most of us do. You can’t control what you don’t understand, and that’s why you need to understand machine learning—as a citizen, a professional, and a human being engaged in the pursuit of happiness.
This book’s first goal is to let you in on the secrets of machine learning. Only engineers and mechanics need to know how a car’s engine works, but every driver needs to know that turning the steering wheel changes the car’s direction and stepping on the brake brings it to a stop. Few people today know what the corresponding elements of a learner even are, let alone how to use them. The psychologist Don Norman coined the term conceptual model to refer to the rough knowledge of a technology we need to have in order to use it effectively. This book provides you with a conceptual model of machine learning.
Not all learning algorithms work the same, and the differences have consequences. Take Amazon’s and Netflix’s recommenders, for example. If each were guiding you through a physical bookstore, trying to determine what’s “right for you,” Amazon would be more likely to walk you over to shelves you’ve frequented previously; Netflix would take you to unfamiliar and seemingly odd sections of the store but lead you to stuff you’d end up loving. In this book we’ll see the different kinds of algorithms that companies like Amazon and Netflix use. Netflix’s algorithm has a deeper (even if still quite limited) understanding of your tastes than Amazon’s, but ironically that doesn’t mean Amazon would be better off using it. Netflix’s business model depends on driving demand into the long tail of obscure movies and TV shows, which cost it little, and away from the blockbusters, which your subscription isn’t enough to pay for. Amazon has no such problem; although it’s well placed to take advantage of the long tail, it’s equally happy to sell you more expensive popular items, which also simplify its logistics. And we, as customers, are more willing to take a chance on an odd item if we have a subscription than if we have to pay for it separately.
Hundreds of new learning algorithms are invented every year, but they’re all based on the same few basic ideas. These are what this book is about, and they’re all you really need to know to understand how machine learning is changing the world. Far from esoteric, and quite aside even from their use in computers, they are answers to questions that matter to all of us: How do we learn? Is there a better way? What can we predict? Can we trust what we’ve learned? Rival schools of thought within machine learning have very different answers to these questions. The main ones are five in number, and we’ll devote a chapter to each. Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization. Driven by the goal of building learning machines, we’ll tour a good chunk of the intellectual history of the last hundred years and see it in a new light.
Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine. In practice, however, each of these algorithms is good for some things but not others. What we really want is a single algorithm combining the key features of all of them: the ultimate master algorithm. For some this is an unattainable dream, but for many of us in machine learning, it’s what puts a twinkle in our eye and keeps us working late into the night.
If it exists, the Master Algorithm can derive all knowledge in the world—past, present, and future—from data. Inventing it would be one of the greatest advances in the history of science. It would speed up the progress of knowledge across the board, and change the world in ways that we can barely begin to imagine. The Master Algorithm is to machine learning what the Standard Model is to particle physics or the Central Dogma to molecular biology: a unified theory that makes sense of everything we know to date, and lays the foundation for decades or centuries of future progress. The Master Algorithm is our gateway to solving some of the hardest problems we face, from building domestic robots to curing cancer.
Take cancer. Curing it is hard because cancer is not one disease, but many. Tumors can be triggered by a dizzying array of causes, and they mutate as they metastasize. The surest way to kill a tumor is to sequence its genome, figure out which drugs will work against it—without harming you, given your genome and medical history—and perhaps even design a new drug specifically for your case. No doctor can master all the knowledge required for this. Sounds like a perfect job for machine learning: in effect, it’s a more complicated and challenging version of the searches that Amazon and Netflix do every day, except it’s looking for the right treatment for you instead of the right book or movie. Unfortunately, while today’s learning algorithms can diagnose many diseases with superhuman accuracy, curing cancer is well beyond their ken. If we succeed in our quest for the Master Algorithm, it will no longer be.
Excerpted from "The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World" by Pedro Domingos. Published by Basic Books, a division of the Perseus Books Group. Copyright © 2015 by Pedro Domingos. Reprinted with permission of the publisher. All rights reserved.