Algorithms to Live By

The computer science of human decisions.

In this eye-opening book, fascinating algorithms turn the wisdom of computer science into strategies for human living.

  • Introduction
    • If you want the best odds of getting the best apartment, spend 37% of your apartment hunt (eleven days, if you’ve given yourself a month for the search) noncommittally exploring options.
    • But after that point, be prepared to immediately commit — deposit and all — to the very first place you see that beats whatever you’ve already seen.
    • Finding an apartment belongs to a class of mathematical problems known as “optimal stopping” problems.
    • Optimal stopping is the science of serial monogamy.
    • The optimal balance between impulsivity and overthinking: spend 37% of your search looking before you leap.
    • Algorithms have been a part of human technology ever since the Stone Age.
    • We explore the idea of human algorithm design — searching for better solutions to the challenges people encounter every day.
    • Optimal stopping tells us when to look and when to leap.
    • The explore/exploit tradeoff tells us how to find the balance between trying new things and enjoying our favorites.
    • Sorting theory tells us how (and whether) to arrange our offices.
    • Caching theory tells us how to fill our closets.
    • Scheduling theory tells us how to fill our time.
    • Tackling real-world tasks requires being comfortable with chance, trading off time with accuracy, and using approximations.
    • Living by the wisdom of computer science doesn’t sound so bad after all. And unlike most advice, it’s backed up by proofs.
  • Optimal Stopping – When to Stop Looking
    • In any optimal stopping problem, the crucial dilemma is not which option to pick, but how many options to even consider.
    • The 37% Rule derives from optimal stopping’s most famous puzzle, which has come to be known as the “secretary problem.”
    • Imagine you’re interviewing a set of applicants for a position as a secretary, and your goal is to maximize the chance of hiring the single best applicant in the pool.
    • While you have no idea how to assign scores to individual applicants, you can easily judge which one you prefer.
    • The look-then-leap rule: you set a predetermined amount of time for “looking” — that is, exploring your options, gathering data — in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase.
    • As it turns out, following this optimal strategy ultimately gives us a 37% chance of hiring the best applicant; it’s one of the problem’s curious mathematical symmetries that the strategy itself and its chance of success work out to the very same number. (A simulation sketch follows at the end of this section.)
    • Variants
      • Full information
        • No buildup of experience is needed to set a standard
        • The decision of whether to stop comes down entirely to how many applications we have left to see.
        • Threshold rule – we immediately accept an applicant if she is above a certain percentile.
        • The chance of ending up with the single best applicant in this full-information version of the secretary problem comes to 58%.
      • When to sell
        • Similar to the full-information game.
        • But given that waiting has a cost measured in dollars, a good offer today beats a slightly better one several months from now.
      • When to park
        • Parking is an optimal stopping problem.
        • The distance at which to switch from looking to leaping depends on the proportion of spots that are likely to be filled — the occupancy rate.
      • When to quit (a.k.a. burglar problem)
        • The number of robberies you should carry out is roughly equal to the chance you get away, divided by the chance you get caught.
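
A minimal simulation sketch of the 37% Rule (the helper name and candidate scores are made up for illustration; candidates are assumed to arrive in uniformly random order): reject the first 37% of candidates, then commit to the first one who beats everything seen so far. The measured success rate should come out near 37%.

```python
import random

def look_then_leap(scores, look_fraction=0.37):
    """Reject the first look_fraction of candidates, then take the first
    one who beats the best candidate seen during the look phase."""
    cutoff = int(len(scores) * look_fraction)
    best_seen = max(scores[:cutoff]) if cutoff else float("-inf")
    for score in scores[cutoff:]:
        if score > best_seen:
            return score
    return scores[-1]  # no one beat the benchmark: stuck with the last one

n, trials = 100, 100_000
wins = sum(
    look_then_leap(random.sample(range(n), n)) == n - 1  # got the single best?
    for _ in range(trials)
)
print(f"Hired the best candidate in {wins / trials:.1%} of trials")  # ≈ 37%
```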
  • Explore/Exploit – The Latest vs. the Greatest
    • Exploration is gathering information, and exploitation is using the information you have to get a known good result.
    • A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle. The flip side is that the value of exploitation can only go up over time.
    • So explore when you will have time to use the resulting knowledge, exploit when you’re ready to cash in. The interval makes the strategy.
    • Win-stay, lose-shift algorithm
    • Gittins index – always go with the highest index
      •  “Win-stay, lose-not-necessarily-shift” algorithm
      • based on geometric discounting of future reward
    • The explore/exploit tradeoff changes as we change the way we’re discounting the future.
    • Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty.
    • Another option: focus on regret
    • Upper confidence bound algorithm: an algorithm that offers a guarantee of minimal regret.
      • Pick the option for which the top of the confidence interval is highest.
      • It naturally injects a dose of exploration into the decision-making process, leaping at new options with enthusiasm because any one of them could be the next big thing (see the sketch at the end of this section).
    • When the world can change, continuing to explore can be the right choice.
    • It’s rational to emphasize exploration — the new rather than the best, the exciting rather than the safe, the random rather than the considered — for many of those choices, particularly earlier in life.
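
A minimal sketch of the upper confidence bound idea (the classic UCB1 formula) on a simulated three-armed bandit; the payoff probabilities are made up for illustration. Each round we pick the arm whose confidence interval has the highest top, which front-loads exploration and then settles on the best arm.

```python
import math
import random

true_rates = [0.3, 0.5, 0.7]          # hidden payoff odds (illustrative)
pulls = [0] * 3
wins = [0] * 3

def ucb(i, t):
    """Observed mean plus an exploration bonus that shrinks with evidence."""
    if pulls[i] == 0:
        return float("inf")           # always try an untested option first
    return wins[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])

for t in range(1, 10_001):
    arm = max(range(3), key=lambda i: ucb(i, t))
    pulls[arm] += 1
    wins[arm] += random.random() < true_rates[arm]

print("pulls per arm:", pulls)        # the 0.7 arm ends up dominating
```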
  • Sorting – Making Order
    • Sorting involves steep diseconomies of scale, violating our normal intuitions about the virtues of doing things in bulk.
    • Bubble sort and insertion sort – O(n^2)
    • Mergesort – O(n log n)
    • Bucket sort – O(nm); the key to actually breaking the linearithmic barrier is knowing the distribution from which the items you’re sorting are drawn.
    • The effort expended on sorting materials is just a preemptive strike against the effort it’ll take to search through them later.
    • Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.
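
A textbook mergesort sketch, showing the divide-and-merge structure behind the O(n log n) bound mentioned above:

```python
def mergesort(items):
    """Sort by splitting in half, sorting each half, and merging."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = mergesort(items[:mid]), mergesort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge two sorted halves
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]      # append whichever half remains

print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```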
  • Caching – Forget About It
    • Memory hierarchy – a series of storage tiers, each with greater capacity than the one before it but slower to access.
    • Cache replacement / cache eviction
      • Random Eviction
      • First-In, First-Out (FIFO)
      • Least Recently Used (LRU)
    • If you follow the LRU principle – where you simply always put an item back at the very front of the list – then the total amount of time you spend searching will never be more than twice as long as if you’d known the future.
    • “Cognitive decline” – lags and retrieval errors – may not be about the search process slowing or deteriorating, but (at least partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger.
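
A minimal LRU cache sketch built on Python’s OrderedDict: every access moves an item to the “most recent” end, and eviction removes whatever sits at the “least recent” end.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                               # touch "a"
cache.put("c", 3)                            # evicts "b", not "a"
print(list(cache.items))                     # ['a', 'c']
```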
  • Scheduling – First Things First
    • If you have only a single machine, and you’re going to do all of your tasks, then any ordering of the tasks will take you the same amount of time.
    • The first lesson of single-machine scheduling comes literally before we even begin: make your goals explicit.
    • Earliest Due Date Algorithm
      • Minimizing maximum lateness
      • Start with the task due soonest and work your way toward the task due last.
    • Moore’s Algorithm
      • Minimize the number of late tasks
      • Whenever we fall behind, toss the largest (longest-running) task already scheduled.
    • Shortest Processing Time
      • Minimizing the sum of completion times.
      • Always do the quickest task you can.
    • Priority inheritance: if a low-priority task is found to be blocking a high-priority resource, then that low-priority task should momentarily become the highest-priority thing on the system, “inheriting” the priority of the things it’s blocking.
    • Two principles in real-time scheduling
      • Responsiveness: how quickly you can respond to things
      • Throughput: how much you can get done overall
    • If you find yourself doing a lot of context switching because you’re tackling a heterogeneous collection of short tasks, you can also employ another idea from computer science: “interrupt coalescing.”
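
Minimal sketches of the three single-machine policies above, run on a hypothetical task list of (name, duration, due date) tuples:

```python
# Hypothetical tasks: (name, duration, due date)
tasks = [("report", 4, 6), ("email", 1, 3), ("slides", 2, 8), ("review", 3, 5)]

# Earliest Due Date: minimizes the maximum lateness.
edd = sorted(tasks, key=lambda t: t[2])

# Shortest Processing Time: minimizes the sum of completion times.
spt = sorted(tasks, key=lambda t: t[1])

# Moore's algorithm: minimizes the number of late tasks. Schedule by due
# date, and whenever a deadline would be missed, toss the longest task
# scheduled so far (it gets done at the end, already late).
scheduled, tossed, elapsed = [], [], 0
for task in sorted(tasks, key=lambda t: t[2]):
    scheduled.append(task)
    elapsed += task[1]
    if elapsed > task[2]:                      # we would be late
        longest = max(scheduled, key=lambda t: t[1])
        scheduled.remove(longest)
        tossed.append(longest)
        elapsed -= longest[1]

print("EDD order:", [t[0] for t in edd])
print("SPT order:", [t[0] for t in spt])
print("Moore on time:", [t[0] for t in scheduled], "late:", [t[0] for t in tossed])
```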
  • Bayes’s Rule – Predicting the Future
    • Laplace’s Law: for any possible drawing of w winning tickets in n attempts, the expectation is simply the number of wins plus one, divided by the number of attempts plus two – (w+1)/(n+2)
    • When you do have some estimate of prior probabilities, Bayes’s Rule applies to a wide range of prediction problems, be they of the big-data variety or the more common small-data sort.
    • The Copernican Principle is exactly what results from applying Bayes’s Rule using what is known as an uninformative prior.
    • For any power-law distribution, Bayes’s Rule indicates that the appropriate prediction strategy is a Multiplicative Rule: multiply the quantity observed so far by some constant factor.
    • When we apply Bayes’s Rule with a normal distribution as a prior, we obtain a very different kind of guidance. Instead of a multiplicative rule, we get an Average Rule: use the distribution’s “natural” average – its single, specific scale – as your guide.
    • The Erlang distribution gives us a third kind of prediction rule, the Additive Rule: always predict that things will go on just a constant amount longer.
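
A worked sketch of these prediction rules. Laplace’s Law comes straight from the formula above; the other three rules are shown as simple functions whose priors (power-law, normal, Erlang) are assumed rather than fitted, with made-up numbers for illustration.

```python
def laplace(wins, attempts):
    """Laplace's Law: (w + 1) / (n + 2)."""
    return (wins + 1) / (attempts + 2)

print(laplace(0, 0))    # 0.5 – no evidence yet: even odds
print(laplace(3, 3))    # 0.8 – three for three is promising, not certain

def multiplicative_rule(observed, factor=2.0):
    """Power-law prior: predict the total by scaling what you've seen so far.
    factor=2 is the uninformative (Copernican) case."""
    return observed * factor

def average_rule(observed, natural_average):
    """Normal prior: predict the distribution's natural scale (a crude
    stand-in: the average, or the observation itself once past it)."""
    return max(observed, natural_average)

def additive_rule(observed, constant):
    """Erlang prior: predict things go on a constant amount longer."""
    return observed + constant

print(multiplicative_rule(40))               # a 40-year-old institution: guess 80 years total
print(average_rule(6, natural_average=76))   # a 6-year-old: guess the average life span
print(additive_rule(10, constant=5))         # waited 10 minutes: guess 5 more
```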
  • Overfitting – When to Think Less
    • It is indeed true that including more factors in a model will always, by definition, make it a better fit for the data we have already. But a better fit for the available data does not necessarily mean a better prediction.
    • Cross-validation means assessing not only how well a model fits the data it’s given, but how well it generalizes to data it hasn’t seen.
    • We must balance our desire to find a good fit against the complexity of the models we use to do so.
    • If we introduce a complexity penalty, then more complex models need to do not merely a better job but a significantly better job of explaining the data to justify their greater complexity.
    • Many prediction algorithms, for instance, start out by searching for the single most important factor rather than jumping to a multi-factor model. Only after finding that first factor do they look for the next most important factor to add to the model, then the next, and so on. Their model can therefore be kept from becoming overly complex simply by stopping the process short, before overfitting has had a chance to creep in.
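
A minimal holdout cross-validation sketch (synthetic data; NumPy assumed): polynomials of increasing degree fit the training half ever better, but the held-out half exposes the overfit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 2 * x + rng.normal(0, 0.2, x.size)         # truth: linear plus noise

idx = rng.permutation(x.size)                  # random holdout split
train, test = idx[:20], idx[20:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train], y[train], degree)
    train_mse = np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(f"degree {degree}: train {train_mse:.3f}, held-out {test_mse:.3f}")
```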
  • Relaxation – Let It Slide
    • There are entire classes of problems where a perfect solution is essentially unreachable, no matter how fast we make our computers or how cleverly we program them.
    • A problem we don’t know how to solve in polynomial time is considered “intractable.”
    • Constraint Relaxation: make the problem temporarily easier to handle before bringing it back to reality.
    • If you can’t solve the problem in front of you, solve an easier version of it — and then see if that solution offers you a starting point, or a beacon, in the full-blown problem. Maybe it does.
    • Discrete optimization’s commitment to whole numbers is what makes its problems so hard to solve.
    • Continuous Relaxation turns discrete or binary choices into continua.
    • Occasionally it takes a bit of diplomatic finesse, but a Lagrangian Relaxation — where some impossibilities are downgraded to penalties, the inconceivable to the undesirable — enables progress to be made.
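
A sketch of Continuous Relaxation on a toy knapsack problem (item data is made up): the all-or-nothing version is hard, but once items may be taken fractionally, a single greedy pass by value-per-weight is optimal, and that fractional answer both bounds and seeds the discrete one.

```python
# Hypothetical items: (name, value, weight)
items = [("tent", 60, 10), ("stove", 100, 20), ("camera", 120, 30)]
capacity = 50

# Continuous relaxation: allow fractions of items. Greedy by value per
# unit weight is then provably optimal.
remaining, taken, value = capacity, [], 0.0
for name, v, w in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
    fraction = min(1.0, remaining / w) if remaining > 0 else 0.0
    if fraction > 0:
        taken.append((name, round(fraction, 2)))
        value += v * fraction
        remaining -= w * fraction

print(taken, value)  # the fractional optimum bounds the true discrete optimum
# Rounding the fractions down to 0 or 1 yields a feasible discrete solution
# to use as a starting point.
```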
  • Randomness – When to Leave It to Chance
    • There are cases where randomized algorithms can produce good approximate answers to difficult questions faster than all known deterministic algorithms.
    • There will always be some error associated with a sampling process, though you can reduce it by ensuring your samples are indeed random and by taking more and more of them.
    • Monte Carlo Method: replacing exhaustive probability calculations with sample simulations.
    • Whether it’s jitter, random restarts, or being open to occasional worsening, randomness is incredibly useful for avoiding local maxima.
    • Being randomly jittered, thrown out of the frame and focused on a larger scale, provides a way to leave what might be locally good and get back to the pursuit of what might be globally optimal.
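
A minimal hill-climbing sketch with random restarts on a made-up bumpy objective: a single climb stalls on whichever local peak is nearest, while restarting from random points usually recovers the global one.

```python
import math
import random

def f(x):
    """A bumpy objective (illustrative): local peaks riding a gentle slope."""
    return math.sin(5 * x) + 0.5 * x

def hill_climb(x, step=0.01, lo=0.0, hi=3.0):
    """Greedy ascent: move to a better neighbor until none exists."""
    while True:
        best = max((max(lo, min(hi, x + d)) for d in (-step, step)), key=f)
        if f(best) <= f(x):
            return x                  # stuck on a (possibly local) peak
        x = best

restarts = [hill_climb(random.uniform(0.0, 3.0)) for _ in range(20)]
best = max(restarts, key=f)
print(round(best, 2), round(f(best), 2))   # ≈ 2.85, 2.42 (the global peak)
```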
  • Networking – How We Connect
    • Exponential Backoff: doubling the potential delay before trying to transmit again (sketched at the end of this section).
    • AIMD (Additive Increase, Multiplicative Decrease): any fully received batch of packets causes the number of packets in flight not to double but merely to increase by 1, and dropped packets cause the transmission rate to cut back by half.
    • Tail Drop: once the buffer is full, every packet arriving after that point is simply rejected, and effectively deleted.
    • The most prevalent critique of modern communications is that we are “always connected.” But the problem isn’t that we’re always connected; we’re not. The problem is that we’re always buffered. The difference is enormous.
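
A minimal exponential backoff sketch (the send function and failure rate are hypothetical): after each failure, the ceiling on the wait doubles, and the actual wait is drawn at random below that ceiling to keep retriers from colliding in lockstep.

```python
import random
import time

def send_with_backoff(send_packet, max_attempts=8, base_delay=0.1):
    """Retry with exponential backoff: double the delay ceiling per failure."""
    for attempt in range(max_attempts):
        if send_packet():
            return True
        ceiling = base_delay * (2 ** attempt)
        time.sleep(random.uniform(0, ceiling))   # random wait below the ceiling
    return False

# Hypothetical flaky channel that succeeds 30% of the time.
flaky = lambda: random.random() < 0.3
print(send_with_backoff(flaky))
```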
  • Game Theory – The Minds of Others
    • The mathematician John Nash proved in 1951 that every game with a finite number of players and strategies has at least one (possibly mixed-strategy) equilibrium.
    • The equilibrium for a set of players, all acting rationally in their own interest, may not be the outcome that is actually best for those players.
    • If the rules of the game force a bad strategy, maybe we shouldn’t try to change strategies. Maybe we should try to change the game.
      • What rules will give us the behavior we want to see?
    • A change to the game’s payoffs that doesn’t change the equilibrium will typically have a much smaller effect than desired.
    • Algorithmic game theory gives us a way to rethink mechanism design: to take into account not only the outcome of the games, but also the computational effort required of the players.
    • In the Vickrey auction, honesty is literally the best policy.
    • If changing strategies doesn’t help, you can try to change the game. And if that’s not possible, you can at least exercise some control about which games you choose to play. The road to hell is paved with intractable recursions, bad equilibria, and information cascades. Seek out games where honesty is the dominant strategy. Then just be yourself.
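
A minimal Vickrey (second-price, sealed-bid) auction sketch: the highest bidder wins but pays only the second-highest bid, which is what makes bidding your true value the dominant strategy. Bidder names and valuations are made up.

```python
def vickrey_auction(bids):
    """bids: dict of bidder -> sealed bid. Winner pays the runner-up's bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, bids[runner_up]

# Hypothetical bidders reporting their true valuations.
winner, price = vickrey_auction({"alice": 120, "bob": 95, "carol": 110})
print(winner, price)   # alice wins, pays 110 (carol's bid)
```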
  • Conclusion – Computational Kindness
    • There are cases where computer scientists and mathematicians have identified good algorithmic approaches that can simply be transferred over to human problems.
    • Knowing that you are using an optimal algorithm should be a relief even if you don’t get the results you were looking for.
    • If you wind up stuck in an intractable scenario, remember that heuristics, approximations, and strategic use of randomness can help you find workable solutions.
