From Robert Caro’s The Power Broker:
On July 15, [1936,] La Guardia had [agreed to let Moses tear down the 92nd St Ferryhouse to extend the East River Drive] – although the Mayor said that ferry service should not be stopped for sixty days so that the ferry riders would have time to find other means of transportation.
Moses was not willing to wait sixty days. Defying the Mayor, he decided to stop the service immediately – by tearing down the ferry terminal.
He ordered the contractor building the East River Drive approach, A. N. Hazell, to procure two barges and install a pile driver on one and a wrecking crane on the other. When they were ready, on July 21, he waited until the Rockaway had unsuspectingly pulled away from Manhattan for one of its early-afternoon trips to Astoria and then, without warning, ordered the barges towed into the ferry slip and lashed together there so that the Rockaway would have no place to dock when it returned. And he ordered the pile driver and the crane to pound and pull the slip to pieces. Attacking simultaneously from the land side, he dispatched crews of workmen to tear up York Avenue’s cobblestones in front of the ferryhouse to cut off all access to the terminal by land.
[Deputy Commissioner Andrew] Hudson could hardly believe what was happening. … Hundreds of regular ferry commuters, who had come to Manhattan by boat that morning, would be left stranded there – with their cars and home on the other side of the river. But when, assuming there had been some mistake, he pointed this out to the contractors and asked them to stop work until matters could be straightened out, they told him that they had orders from Moses to continue working no matter who tried to stop them – and even while Hudson was arguing with them, the pile driver continued to smash away at the dock.
La Guardia meanwhile contacted Moses and pleaded with him to call the contractors off. If the demolition stopped, he promised the Park Commissioner, he would call the Sinking Fund Commission back into session to shorten the sixty-day waiting period. Moses refused. Telephoning Police Commissioner Lewis J. Valentine, the Mayor – in a voice so choked with fury that it was barely coherent – ordered him to send a squad of patrolmen to the scene.
Expecting the contractors to yield to a direct police order, the Mayor and Valentine had instructed the officer in charge at the scene, Deputy Chief Inspector Edward A. Bracken, not to use force. But the contractors knew whose orders they would have to follow if they wanted to keep working for Triborough. When the Inspector ordered contractor Hazell to quit, Hazell said, “I’m not going to.” Moses had told him not to, he said. And the pile driver continued to smash away. By this time, much of the dock had been destroyed and the hundreds of rush-hour ferry passengers standing on York Avenue were watching the workmen on the barges pry away heavy boards that had been loosened on the rest of it. Only force would stop Moses, and La Guardia finally realized it. Put through to Bracken on the telephone, the Mayor ordered the Inspector to “drag those men off the boats if they don’t quit.” Those of the workmen who didn’t drop their tools when a fifteen-man police boarding party swarmed over the side had them struck from their hands and were shoved, along with the rest, onto what was left of the dock. Within minutes, a police launch was standing alongside the barges and when a launch carrying the night shift of contractors’ men appeared, it was warned away. A city tugboat pushed the contractor’s barges away downriver. As night was falling, a procession of barges bearing piles and lumber to repair the slip arrived, and workmen worked all night under floodlights to rebuild the terminal and repave York Avenue. By morning, the Rockaway was back in service – under the protection of a squad of patrolmen – and La Guardia could tell reporters, with a smile of satisfaction, “All is quiet on the eastern front.”
But the last smile was Moses’. La Guardia waited a week for the story to fade from the front pages and then quietly had the Sinking Fund Commission transfer the terminal to Moses in another forty-eight hours. At 11:25pm, July 31, … as the old tub left for the Brooklyn pier where she was to be laid up, her captain blew three long, dolorous whistle blasts of farewell. Hardly had the last note faded when it was succeeded by dull heavy thuds – the pound of Moses’ pile driver, tearing the ferryhouse down again.
Grand visions of holiday productivity, including wrapping up all outstanding book notes, were predictably optimistic. Continuing on:
Boomerang: Travels in the New Third World by Michael Lewis explains the European financial crisis as experienced by Iceland, Greece, Ireland, Germany, and the United States. Lewis is at his best when he can focus on a few key people and develop richly constructed characters who serve as personifications of larger ideas. Here, though, with each chapter built around a single whole country, the scope of the subject and the brevity of the treatment reduce much of his human exploration to thin stereotyping. The best chapter, on Greece, illustrates the dichotomy directly, where weak potshots at the whole Greek people pepper an otherwise funny and intriguing profile of a sect of monks improbably at the center of that country’s ills. If you keep up with Lewis’s Vanity Fair writing, there’s not much new here.
Six Easy Pieces is a selection of some of the more generally accessible chapters from Richard Feynman’s Lectures on Physics, focused on the intuition over the math. Feynman has a gift for cutting through the bullshit, and here he offers the clearest, least mystical lay explanation of quantum theory that I’ve read.
CODE by Charles Petzold builds a computer starting from the underlying idea of basic code, moving from the telegraph system to electronic circuits to graphical operating systems. If computers seem like magic boxes to you, or if you need a refresher on your freshman Computer Engineering course, this is a great book, though at times it gets bogged down in details that should have been left in the manual.
One of my favorite reads of 2011 was James Gleick’s The Information, so last year I picked up Chaos from his back catalog. The book zips its way through the history and ideas of chaos theory, covering the butterfly effect, fractals, strange attractors, universality, and dynamical systems with more restraint and grounding in reality than you might expect, but with more hype than you’d probably like. By the time Gleick declares chaos Kuhn-tastically revolutionary, you’re ready to dive right into another book that tells you where these ideas have actually gone in the intervening 25 years. A solid read nonetheless.
Aaron Swartz killed himself yesterday, 2 years after being arrested for illegally downloading journal articles from JSTOR on the MIT campus, and 4 months after prosecutors filed a superseding indictment that expanded the charges from four to thirteen felony counts. I didn’t know Aaron personally except for a small email exchange, but I always admired his work and spark from a distance. His loss hurts, and this post is surely in part borne out of that hurt, but the thoughts are not new.
The prosecutors in this case, Carmen M. Ortiz, Scott L. Garland, and Stephen P. Heymann, deserve to be shamed. I do not blame them for Aaron’s death; that was his own tragic doing. But I emphatically blame them for pressing these charges in the manner they did, threatening Aaron with decades in jail and millions of dollars in fines for a non-violent offense against two private parties under a law that another federal circuit, the 9th, has said should be adjudicated in civil court. JSTOR has publicly stated that their legal interest ended when they got their documents back, and the damage to MIT was a few days’ inconvenience. Where is the compelling public interest that spurred the government to be at all involved? What possible reading of the crime demands impoverishment and a life in federal prison for justice? The pursuit of this case by these prosecutors was, plainly and simply, disgusting.
It’s hard to decide which larger issue to focus on. Society’s witch-hunt reaction to crimes involving computers? The wrong-headed incentives that drive prosecutors? The dying industries that keep throwing money at politicians to keep these laws on the books? Surely all of these and more are a part of this conversation, but it’s all been said and will continue to be said. Right now we just sit back, exhausted, and mourn the wholly unnecessary loss of a brilliant and inspiring individual.
Rest in peace, Aaron.
I’ve fallen hopelessly behind in my book reports, so I’m going to dribble out the backlog in short rundowns by the end of the year:
The Selfish Gene by Richard Dawkins is a book that I’ve been putting off for years. The evolution “debate” has always struck me as pointless, and for whatever reason I had thought that’s what this book was about, but it’s only given a brief dismissal in an early chapter and then ignored entirely in favor of good, meaty, mind-bending capital-S Science. Dawkins posits that the gene, as opposed to the whole individual organism or the species, is the unit of evolution, and walks through this idea’s broad explanatory power while developing an intuition for game theory along the way. The 30th Anniversary edition that I read includes copious footnotes explaining how intervening research has reinforced or occasionally refuted the original text. Highly recommended.
The Social Animal is David Brooks’ attempt to explain the inner workings of “wonderfully fulfilling lives”. The book follows a fictional couple, Harold and Erica, from cradle to grave, placing them on a rhetorical treadmill where each phase of their lives occurs circa 2010. The approach serves as a sort of manipulable petri dish for the ideas and policies Brooks wants to explore: How can poor children beat the odds and become successful adults? What do the ancient Greeks have to teach us today? How does the mind work? How do relationships form and evolve? What does it all mean? We see the questions and answers play out in the lives of his characters. The scope is ambitious, but Brooks isn’t up to the task, wasting large swathes on cringe-inducing caricatures (“crude-talking, hard-partying, cotton candy lipstick-wearing, thong-snapping, balls-to-the-wall disciple in the church of Lady GaGa”), dropping lame hints at a spirituality he’s too timid to talk about directly, stumping for the return of the Whig party, or constructing odd scenarios that just ring false (one example: Harold longs to have children, but after decades of resentment at Erica’s refusal to even discuss the topic, he finds that being a camp counselor one summer is just as good). I continue to be oblivious to what people see in David Brooks.
A Mathematician’s Lament by Paul Lockhart is a short exhortation for a curriculum focused on real mathematics rather than rote mechanics. For anyone as interested in applications as theory, Lockhart’s insistence on teaching pure math solely for its own sake might be a little off-putting, but the examples of what such a curriculum might actually look like are beautiful and inspiring, and make his follow-up Measurement a must read.
The behavior and requirements of distributed data storage systems are frequently counterintuitive (eventual consistency, anyone?). One of these counterintuitive behaviors is something I call “the extra node paradox”: if you add an extra node to your cluster, you actually increase your likelihood of data loss. This follows from two simple observations:
Say you have a cluster of n nodes, with each data block replicated to r of those. Then
For reasonable configurations, any r-node simultaneous failure results in data loss.
There are only (n choose r) possible ways to replicate a data block in a cluster. This number is probably far smaller than the number of data blocks that you’re storing, which implies that if your data is evenly distributed across the available replications, any r-node loss will hit some blocks replicated to only those r nodes. To make it concrete: say you have a 20-node cluster with 3TB storage per node, a block size of 512kB, and a replication factor of 3, so that you can store 1TB (~2m blocks) per node, or about 40m blocks total. The number of unique replications available is only (20 choose 3)=1140, implying a loss of roughly 35k blocks for any 3-node failure. Holding everything else constant, if you bump the cluster size to 200 nodes, you jump to 1,313,400 unique replications, still guaranteeing a loss of roughly 30 blocks. To have any chance of not losing any of your 40m blocks, your cluster would need to have 623 nodes, each storing only ~64k blocks (~32GB). For the same data set but with larger blocks (say 64MB, the HDFS default), you would have a total of 312.5k blocks. With 20 nodes you would lose 274 blocks, and you’d need 125 nodes to have any chance of not losing data, each storing only ~2.5k blocks (~162GB). (These numbers are spot-checked in the code sketch below.)
As n goes up, so does the probability of at least r nodes failing simultaneously.
By analogy, if you flip 2 coins and fewer than 2 land on heads, you’re not very surprised, but if you flip 50 coins and fewer than 2 land on heads, you are. If you have more nodes, there are more nodes that can fail, and so more will (read up on the binomial distribution if you need to convince yourself).
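To make both observations concrete, here's a small Python sketch. The function names are mine, and the per-node failure probability p in the last part is an assumed parameter, not something derived from the setup above:

```python
from math import comb

def replication_sets(n, r):
    # Number of distinct ways a block's r replicas can be placed across n nodes.
    return comb(n, r)

def expected_blocks_lost(n, r, total_blocks):
    # With blocks spread evenly over all possible replica placements, a
    # simultaneous r-node failure wipes out the blocks whose replicas live
    # on exactly those r nodes.
    return total_blocks / comb(n, r)

def prob_at_least_r_failures(n, r, p):
    # Binomial tail: probability that r or more of n nodes fail, assuming
    # each node fails independently with probability p in the window.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

# Observation 1, using the 512kB example: ~40m blocks, replication factor 3.
for n in (20, 200, 623):
    print(n, replication_sets(n, 3), round(expected_blocks_lost(n, 3, 40_000_000)))
    # 20 -> 1140 sets, ~35k blocks lost; 200 -> 1,313,400 sets, ~30 blocks; 623 -> ~1 block

# Observation 2: the chance of some 3 nodes being down together grows with n
# (p = 0.001 per node per window is a made-up number for illustration).
for n in (20, 200):
    print(n, prob_at_least_r_failures(n, 3, 0.001))
```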
The ‘paradox’ is that if you’re considering the impact of adding an extra node on your chance of losing any given block, that probability goes down – you’re getting an increasing overall likelihood of data loss from a group of blocks that are individually growing less likely to be lost.
Finding the actual probability of a simultaneous r-node failure in a real-world system is complicated by the fact that ‘simultaneous failure’ actually means ‘within the window of time when the data from the first failed node is being redistributed to good nodes’, but it doesn’t fundamentally change the conclusion. The main takeaways from this are that you can’t stop worrying about single-node reliability just because you’re distributed, and that the actual level of safety you get from a given replication factor is sensitive to the size of the cluster.
I’ve been hunting for a good little history of the world, a quick read that I can pick up every few years to get a reminder of the sequence of it all. To that end, E.H. Gombrich’s A Little History of the World has the title exactly right. It turns out that it’s aimed at children, but given that the list of books I want my future kids to read is heavily biased towards fiction right now (The Phantom Tollbooth, the Wrinkle in Time series, the Chronicles of Narnia, etc.), I thought I’d give it a chance to see if it could make the cut.
Unfortunately, though, the book falls short of what I want in many ways. Despite its title, it isn’t really “of the world”, it’s “of Europe from an Austrian’s perspective”. While Gombrich dedicates a few chapters to India and China in early history, the East is shelved from the birth of Christ until European colonialism brings it back into the picture. North America makes nothing more than a small cameo appearance, and South America and Africa outside of Egypt are both ignored almost entirely.
Worse, Gombrich focuses rather narrowly on the sequence of religions, rulers, wars, and empires that shaped the Continent. Science is wholly represented by nameless cavemen, Leonardo da Vinci, an account of Galileo in which math is equated with magic, one mention of Niels Bohr in the afterword, and vague references to technology and progress. None of Pythagoras, Archimedes, Euclid, Copernicus, Kepler, Newton, Descartes, Euler, Darwin, Maxwell, Einstein, or others even merit a namecheck, while the story of the Reformation goes deep enough to include such household names as Jan Hus and Huldrych Zwingli. I can’t deny the impact of war and religion, but any book that uses more ink to romanticize those who worked so hard at tearing the world down than to even just acknowledge those who have built it up is not one that I want to use as the foundation of my kids’ understanding of history.
That said, the book does have a lot of good things going for it. While you never forget that it’s aimed at children, it’s well written and not afraid to touch on challenging ideas. There are genuinely nice overviews of the founding stories of some of the major world religions, thanks at least in part to Gombrich’s revisions with the help of his son, a preeminent scholar of early Buddhism. And I don’t mean to imply that a strong European focus is inherently bad – as long as you realize that’s what you’re getting, it’s perfectly fine, and if you don’t know your Visigoths from your Ostrogoths, your Guelphs from your Ghibellines, or your Babenbergs from your Habsburgs, this book will do a nice job of catching you up.
In Daniel Kahneman’s Thinking, Fast and Slow, the mind is divided into two systems. System 2 is the deliberate serial thinking process which we commonly regard as our self, and System 1 is the automatic parallel reacting process which makes up much more of us than we generally realize. System 1 is responsible for the first assessment of any information that comes in, most of which will be filtered out or handled without System 2 ever getting involved. When System 2 does come into the picture, it’s slow and lazy, frequently only giving a cursory assessment to System 1’s preference before confirming it. Most of the time this is fine – in fact, a key trait of expertise is that System 1 becomes adept at making these first decisions – but System 1 also has characteristic biases that this book explores in fascinating detail.
A principal bias that Kahneman describes is ‘WYSIATI’: what you see is all there is. WYSIATI is the “remarkable asymmetry between the ways our mind treats information that is currently available and information that we do not have”. System 1 is “a machine for jumping to conclusions”, forming opinions from information that’s at hand without stopping to consider what other information might be needed to actually justify those opinions. Some of the effects of WYSIATI include being overconfident in cases where we may be missing critical information, regarding the same information differently depending on how it is presented, and neglecting base rates when examples are readily available.
This idea appears repeatedly throughout the book, and where it isn’t explicitly summoned it’s often easy to see lurking in the shadows. The ‘anchoring effect’ describes how we latch onto numbers that are in front of us when trying to estimate quantities, even if we know that those numbers are completely random. The ‘availability heuristic’ answers questions about an event’s frequency or significance by simply gauging how easily instances come to mind (an example of ‘substitution’, our tendency to avoid hard questions by answering easier ones in their place). And ‘hindsight bias’ leads us to “assess the quality of a decision not by whether the process was sound but by whether its outcome was good or bad”. When our response ought to be “I don’t know”, System 1 comes up with an answer anyways, and System 2 often simply goes with it.
Kahneman won the Nobel Prize for his development (with Amos Tversky) of prospect theory, and he dedicates several chapters to explaining the subject and its implications. Prospect theory is a critique of expected utility theory, which was developed to explain why rational decision makers might select a sure thing over a bet that has a larger expected value. For instance, we might prefer $100 over a 30% chance at $1000, despite the latter having a higher expected value of $300. If we choose the natural logarithm as the basis of our expected utility function (other functions can work, too), the expected utility of $100 is ln(100)*1 = 4.61, while the expected utility of a 30% chance at $1000 is ln(1000)*0.3 = 2.07. If we’re attempting to maximize utility, the choice of the sure thing over the gamble is clearly justified in this scenario.
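Just to spell out the arithmetic, the comparison from that example fits in a few lines (the log utility is the text's illustrative choice, not the only one that works):

```python
from math import log

sure_thing     = 1.0 * log(100)    # utility of a guaranteed $100   -> ~4.61
gamble         = 0.3 * log(1000)   # utility of a 30% shot at $1000 -> ~2.07
expected_value = 0.3 * 1000        # plain expected value of the gamble: $300

# Higher expected value, lower expected utility: the sure thing wins.
print(sure_thing, gamble, expected_value)
```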
Prospect theory observes that this model of decision making is incomplete in that it neglects the reference point of the decision maker. From different reference points, the same payout looks like either a loss or a gain, and we’re more averse to loss than we are to risk: when faced with a potential loss, we frequently become risk-seeking. Going back to our example of the choice between $100 and a 30% chance at $1000, prospect theory observes that a person starting with nothing may actually see the bet as a potential loss of the $100 they were otherwise guaranteed. In contrast, a person who already has the $1000 in hand perceives the certain option as a loss of $900, and is more likely to take the gamble despite its lower utility.
The implications of loss aversion are significant. The ‘endowment effect’ describes our decreased willingness to trade something we have for something we don’t, even when, before we owned either item, we would have considered the second more valuable than the first. We work harder to avoid failure than we do to exceed expectations. Credit card companies seek legislation that requires differential pricing on cash and credit transactions to be called “cash discounts” rather than “credit surcharges”, because we’re more willing to accept a forgone discount than to pay a surcharge – we make inconsistent decisions when equivalent scenarios are framed as a loss versus a gain.
The last section of the book focuses on the distinction between the ‘experiencing’ and the ‘remembering’ selves. In one experiment, subjects who had to go through a painful medical procedure were asked at regular intervals during the procedure how much pain they felt at that moment. After the procedure was complete, they were asked to rate the total amount of pain they had experienced. The value that people assigned to this final question did not correspond to the sum of values they had assigned to the questions during the procedure; instead, they followed what Kahneman calls the ‘peak-end rule’ – their rating corresponded to an average of the peak level of pain during the procedure and the level of pain right at the end. In addition, the length of the procedure had no impact on the rating of total pain, a phenomenon called ‘duration neglect’. Our remembering selves punish our experiencing selves by causing us to make choices that lead us to suffer more in the moment than we’ll remember afterwards.
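The rule itself is simple enough to write down. Here's a toy illustration; the pain readings are invented, and the straight average of peak and end is my simplification of the rule as Kahneman describes it:

```python
def remembered_pain(readings):
    # Peak-end rule: memory scores roughly the average of the worst moment
    # and the final moment; total duration barely registers.
    return (max(readings) + readings[-1]) / 2

short_procedure = [2, 4, 7, 8]        # ends at its most painful point
long_procedure  = [2, 4, 7, 8, 5, 3]  # same peak, but tapers off at the end

print(sum(short_procedure), remembered_pain(short_procedure))  # 21  8.0
print(sum(long_procedure),  remembered_pain(long_procedure))   # 29  5.5
```

The longer procedure involves strictly more total pain but is remembered as the milder one, which is exactly the kind of mismatch between the two selves that the chapter dwells on.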
Very little is said about the actual mechanisms for all of this. Kahneman stresses that System 1 and System 2 do not correspond neatly to specific parts of the brain or body, but are instead rhetorical shorthand for complex interactions that manifest themselves as behaviors that can be observed via experiment. This decision to focus on observable behaviors is much appreciated, as it sidesteps the need for the Anatomical Latin laundry lists that are annoyingly common in psychology books aimed at the general public. Avoiding impenetrable anatomy recitations isn’t the only way that this book is better than its shelfmates. Where other books try to punch things up with jokes or hackneyed narrative, Kahneman keeps us engaged with concise and insightful discussion. Compare this to Dan Ariely’s Predictably Irrational, which covers much of the same ground but includes so much ‘entertaining’ filler that it feels thin and condescending. Kahneman trusts his audience enough to avoid doing this, and the result is a rich and rewarding read that I highly recommend.
So from here on out I’m going to be writing a little something about every book I read, in an effort to make more stick. These won’t be formal, just a brief synopsis and a collection of takeaways and other random thoughts. The only rule is that the post has to go up before the next book is finished.
First up: An Engine, Not a Camera: How Financial Models Shape Markets by Donald MacKenzie, which delves into the history of financial models and examines the impact of their adoption in trading. The central thesis is that some financial models have, through their use by traders, made real economic processes converge more closely to or diverge further from the theoretical economics those models assume.
One of the book’s main illustrations of this idea is the empirical history of option pricing. MacKenzie divides this history into three periods: pre-1976, when prices did not adhere well to Black-Scholes values; from 1976 to 1987, when they did; and post-1987, which has seen a persistent volatility skew that pulls prices out of line with theory. The mechanism MacKenzie cites for both shifts is the same: the application of the theory by traders.
Following its publication in 1973, Black-Scholes allowed analysts to identify options that were mispriced according to the model. As traders rushed in to take advantage of those opportunities, their activity whittled away discrepancies until real prices more or less matched theory, beginning around 1976. One might be tempted to believe that this simply implies that the theory was correct, and the shift to match theoretical prices was the result of the market waking up to reality and fixing itself.
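MacKenzie doesn't need the formula itself to make this argument, but for reference, the textbook Black-Scholes price of a European call is short enough to sketch in Python (the parameter values at the bottom are arbitrary, and this is the standard formula rather than anything specific to the book):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    # S: spot price, K: strike, T: years to expiry,
    # r: risk-free rate, sigma: annualized volatility.
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# The arbitrage hunting described above amounts to comparing this model
# price against the price an option actually trades at.
print(black_scholes_call(S=100, K=105, T=0.5, r=0.05, sigma=0.2))
```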
In 1987, however, things changed again. Option pricing theory was crucial to the development of portfolio insurance, which allowed investors to limit their exposure to losses from declining stock portfolios by selling corresponding index futures as prices went down. Two key assumptions underlying portfolio insurance (inherited from Black-Scholes) were that there would be no large gaps between successive prices (allowing time to make adjustments to the portfolio), and that trades (made to realize those adjustments) did not affect broader market prices. While it was accepted that the first assumption could be violated as a result of the market reacting to a dramatic world event, it was less appreciated that it could also come under pressure if the second assumption were violated, which grew more likely as the popularity of portfolio insurance increased:
By June of 1987, the portfolios “insured” by LOR and its licensees were sufficiently large that Leland was pointing out that “if the market goes down 3 percent, which, in those days, would have been a very large one-day move, we could double the volume [of trading] in the New York Stock Exchange.”
On October 19th, it fell 22.6 percent. MacKenzie shares from an interview with Nassim Taleb:
… the crowd detected a pattern of a guy who had to sell [as] the market went lower. So what do you do? You push lower … and you see him getting even more nervous. … It’s chemistry between participants. And here’s what happened. You understand, these guys are looking at each other for ten years…. They go to each other’s houses and they’re each other’s best friends and everything. Now one of them is a broker. He has an order to sell. They can read on his face if he’s nervous or not. They can read it. They’re animals. They detect things. So this is how it happened in the stock-market crash. They kept selling. They see the guys sell more….
This is a great example of one of my favorite aspects of the book: its use of a large number of personal interviews to bring focus to the human aspect of all this, the individuals and groups behind the development and application of these theories. The pages are filled with little behind-the-scenes anecdotes: the pay-to-play deal behind Milton Friedman’s 1971 paper “The Need for Futures Markets in Currencies”, which provided a ‘stamp of authenticity’ ahead of the opening of the Chicago Mercantile Exchange’s International Money Market; the personal cajoling required to get traders to actually participate in the early days of the new exchanges; the exploitation of temporary put-call parity discrepancies on the Amex by literally standing between the two different specialists responsible for puts and calls; the phone calls throughout the night and early morning that, with 3 minutes to spare, allowed the Merc to re-open on October 20th, 1987. The list goes on.
A particularly interesting section is the account of how index futures were brought to market. Futures trading was legally distinguished from gambling by the fact that a future implies the possibility of delivery of the underlying commodity. If a future could only be settled in cash (as is the case for an index future), it was legally a gamble, and therefore illegal according to state-level gambling laws. To get around this, Chicago Mercantile Exchange chairman Leo Melamed, along with the other futures exchanges, lobbied for the creation of a new federal agency, the Commodity Futures Trading Commission, whose 1974 charter allowed it to preempt state laws. The legislation creating the CFTC was carefully drafted to expand the definition of ‘commodity’ to cover securities without actually using the word ‘securities’, which would have drawn the attention of the SEC (who would have tried to block the legislation to protect their authority). Once established, though, the CFTC was in a position to negotiate with the SEC, and, in 1982, reached an agreement with them that explicitly allowed index futures to be traded under the CFTC’s jurisdiction, thus circumventing those state gambling restrictions. The key to the SEC acquiescing? The fact that what was being traded was “a figment of Melamed’s imagination” – that is, not a security or a derivative, but a future that could only be settled in cash, the very trait that had made them illegal in the first place. If you don’t like the regulatory climate, just convince Leo Melamed to get you a new one.
The one disappointing aspect of the book is something MacKenzie couldn’t possibly have avoided: it was published in 2006, predating the 2008 financial meltdown. The other two arguably most significant market events of the last 30 years, the 1987 crash and the collapse of Long-Term Capital Management, are covered in great detail, but the absence of such a large event so directly related to the book’s subject matter makes it feel incomplete. Luckily enough, MacKenzie has written almost a dozen papers since, available on his website.
One last minor note: in setting up the discussion of Mandelbrot’s work to use the Lévy distribution in place of the normal distribution in modeling the random walks of stock prices, MacKenzie discusses how mathematicians “redefin[ed] ‘polyhedron’ so that an anomalous polyhedron isn’t a polyhedron at all” to work around apparent counterexamples to a prized theorem. This struck me because I recognized that theorem as Euler’s Polyhedron Formula, V – E + F = 2, which is the subject of a book I read last year called Euler’s Gem by David Richeson. Chapter 15 of that book actually goes into detail about the history of the issue of what makes a polyhedron. In Richeson’s telling, the refinement of the definition of ‘polyhedron’ was not an effort to sweep something unpleasant under the rug; to the contrary, it was one of the key efforts underlying the development of topology.
Matt Gattis tweeted a quiz earlier tonight: 10 girls and 10 guys in a group. Sally dated 5 of the guys, Bob dated 2 of the girls. What’s the probability that Bob dated Sally? Think about it for a bit, then read on.
Kui Tang has a nice write-up of the solution over on his blog, but I thought I’d bang out a quick alternate explanation for those of us who like to visualize our probabilities: imagine a 10×10 grid of cells, the x axis corresponding to the men and the y axis to the women, with each cell either on or off depending on whether the x,y pair had been on a date. Count up all the unique grid configurations that have Sally going on 5 dates and Bob going on 2. That’s your denominator. Your numerator is then the number of these unique grids that have Sally matched with Bob. These are huge numbers, but then recognize that all possible non-Bob/non-Sally cell state configurations repeat for every unique Bob/Sally configuration, and so neatly cancel out.
The math given in Kui’s post is the same thing expressed with counting formulas, but I think picturing the problem as a set of unique grid layouts helps give a better intuitive understanding of what’s going on. It’s hard to accidentally overcount, for instance, because it’s clear that the visual equivalent of (10 choose 2) * (10 choose 5) counts the Bob+Sally cell too many times, and it dissolves the questions about edge cases (what if only one other woman has dated anyone at all, or two other women have each dated all 10 guys?), because it’s clear they’ve been taken into account as part of the massive number of states that cancel out when you do the tally.
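If you'd rather let the computer do the counting, the grid argument translates directly into a brute-force check. Only Sally's row and Bob's column need to be enumerated, since every other cell cancels out of the ratio:

```python
from itertools import combinations

GUYS, GIRLS = range(10), range(10)
BOB, SALLY = 0, 0   # Bob is one of the guys, Sally one of the girls

consistent = 0   # grid configurations that match the givens
dated = 0        # ...in which Bob and Sally dated each other

# Sally's row is fixed by which 5 guys she dated, Bob's column by which
# 2 girls he dated; the two choices must agree on the shared Bob/Sally cell.
for sally_dates in combinations(GUYS, 5):
    for bob_dates in combinations(GIRLS, 2):
        if (BOB in sally_dates) == (SALLY in bob_dates):
            consistent += 1
            if BOB in sally_dates:
                dated += 1

print(dated, consistent, dated / consistent)   # 1134 5670 0.2
```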
Color is a new photo sharing app that builds social networks based on proximity. You take a picture with the app, and it turns around and starts grouping you with and sharing photos from other people nearby who have done the same. Sounds kind of dumb, right? Why would I want to see photos from nearby strangers?
Well, Sequoia thinks there’s something there, and has put $41 million into the company before it’s really even launched (thanks to a killer pitch deck). “Not since Google” have they seen this. Given that “this” currently refers to an app that I can’t even get to work on my phone, I’m left hoping that there’s a lot more going on here.
So what could that be? I’m going to put on my magic hat of credulity now, and describe what I (yes, I, random internet wantrepreneur) would be willing to bet $41 million on in this space.
Color is being run by Bill Nguyen, who sold Onebox for $850M in 2000, Lala to Apple for over $80M in 2009, and (at least until 11:41am today) spent time at AdGent. I’m not going to say that his presence means Color will be successful, but I do take it as a pretty good sign that there’s no possible way their actual business story is “Color shows you photos taken by people in the same room and then money pours out”.
From the TechCrunch writeup:
Color is also making use of every phone sensor it can access. The application was demoed to me in the basement of Color’s office — where there was no cell signal or GPS reception. But the app still managed to work normally, automatically placing the people who were sitting around me in the same group. It does this using a variety of tricks: it uses the camera to check for lighting conditions, and even uses the phone’s microphone to ‘listen’ to the ambient surroundings. If two phones are capturing similar audio, then they’re probably close to each other.
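The audio piece of that is the easiest to picture. Here's nothing more than a naive sketch of the general idea (normalized cross-correlation of two time-aligned clips), certainly not Color's actual algorithm:

```python
import numpy as np

def audio_similarity(a, b):
    # Normalize each clip and take the mean pointwise product: near 1 for
    # recordings of the same ambient sound, near 0 for unrelated recordings.
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.dot(a, b) / len(a))

# Fake data: two phones in the same room hear the same background signal
# plus their own noise; a third phone somewhere else hears something unrelated.
t = np.linspace(0, 1, 1000)
room = np.sin(2 * np.pi * 5 * t)
phone_a = room + 0.05 * np.random.randn(1000)
phone_b = room + 0.05 * np.random.randn(1000)
phone_far = np.random.randn(1000)

print(audio_similarity(phone_a, phone_b))    # close to 1
print(audio_similarity(phone_a, phone_far))  # close to 0
```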
Remember The Dark Knight, when Batman hacked into everyone’s cellphone and streamed back sonar data to build a cohesive picture of what was happening everywhere in the city? That sounds awfully similar to what’s going on here – photo, GPS, and audio streams feeding back to Color in such a way that they can build a real-time model of where all their users are, who they’re with, and what’s happening around them.
With that kind of technology, who cares what their frontend does? Based on the quality of the first release of the phone apps, they’re clearly not sweating it too much. Whatever hook they try to snag users with is just a way to get that datastream, so they should ride whatever wave is currently popular. This week it’s Instagram and Path, so, sure, do that. Next week it’s going to be something else, so next week they’ll shift their apps towards that, or if they really can’t figure out how to get traction, they’ll release an API and let others do it for them. It doesn’t matter how that data comes in, as long as it comes in.
The web is training advertisers how to most effectively work with real time data (tracking cookies, ad auctions, sentiment analysis, twitter monitoring, all of that). How many companies work on this? How much money is being spent on these efforts, and how much is being made? There’s already one $190B company in this space on the web; the startup that can bring the same sort of tools into the real world might actually have a shot at becoming another.
Facebook, Foursquare, Yelp, Gowalla, Brightkite, Loopt, and everyone else with check-in functionality are already going for this. The biggest differences with Color seem to be that they want check-ins to be implicit byproducts of actions users have other motivations for (you’re not trying to get a free soda, you’re taking a picture to, uh, show to strangers in the same room), and that they’re handling far more inputs than just location.
These differences are both potentially huge. Other services risk crossing a mental line where explicitly checking in feels like work done for compensation (which is bad, which is why Foursquare is set up as a game), whereas this is an attempt to keep motivation purely social. And using multimedia opens up the door for all kinds of data points – facial recognition to keep track of people who aren’t actively using their product, brand recognition to note logos on clothes or labels on bottles, song recognition to track what music people are actually listening to – that advertisers would pay through the nose for.
Taking the credulity hat back off, even though I really do think the potential business models could make a ton of money, I’m equally convinced that this initial attempt at getting users isn’t going to make it very far. With $41M in the bank, though, they’ve got plenty of room to fail.
Update: Bill Nguyen confirms all of this almost point-by-point in an interview with Business Insider:
Photo sharing is not our mission. We think it’s cool and we think it’s fun, but we’re a data mining company.