Reverse-Engineering Our New Computer Overlords: Watson, Jeopardy, and Sports Decision Making

What are you writing, Dave?

By David Roher

Watson closed the pod bay door on Ken Jennings and Brad Rutter last night, bringing the IBM robot’s 2-game Jeopardy total to over $77,000. Humans are still the kings of awful robot-pop-culture references (QED), but our monopoly on minutiae is busted. Watson’s dominance is the product of brilliant artificial intelligence and vast computing power. Any imitations will therefore come up short, though that doesn’t mean we shouldn’t try: think about the number of NBA players who grew up emulating Michael Jordan.

With that in mind, I printed out Jimmy Carter’s Wikipedia article, scattered the pages on the floor, and turned on my Roomba. Then I tried to get it to dunk while sticking its tongue out. Both ventures failed (though it developed healthy affinities for peanuts and Space Jam), so I decided to aim lower. Only idiots and geniuses could envision developing their own versions of Watson’s language-reading or answer-giving processes. But one aspect of IBM’s algorithm is worth reconstructing: the “buzz threshold.” The computer determines not only the chance that its guess will be correct, but also the minimum chance it requires before buzzing in; below that level, Watson stays quiet. For instance, if the robot thinks there’s a 40% chance that “Space Jam” is the correct answer and the buzz threshold is 30%, Watson will buzz in, even though he thinks he’ll probably be wrong. However, if there’s an 80% probability that “Space Jam” is correct with a 90% buzz threshold, he’ll silently and smugly lament the missed opportunity to correctly mention his favorite film. (Watson and my Roomba lived on the same street back in elementary school.)
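For the code-inclined, the rule above fits in a few lines of Python. This is only a sketch of the idea as described here, not IBM’s code: the name `should_buzz` is made up for illustration, and I’m assuming both numbers are probabilities between 0 and 1 and that a tie goes to buzzing.

```python
def should_buzz(confidence: float, threshold: float) -> bool:
    """Return True when the estimated chance of being right meets the threshold.

    Both arguments are assumed to be probabilities in [0, 1].
    Treating an exact tie as "buzz" is an arbitrary choice for this sketch.
    """
    return confidence >= threshold


# The two "Space Jam" scenarios from the paragraph above.
print(should_buzz(0.40, 0.30))  # True: probably wrong, but buzzes anyway
print(should_buzz(0.80, 0.90))  # False: probably right, but stays quiet
```

The interesting part is not this comparison but where the threshold comes from, which is what the rest of the post is about.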

The buzz threshold is also arguably the most sports-related aspect of Watson’s design. “Should I buzz in?” is essentially the same question as “should I shoot?” or “should I swing?” A batter’s “swing threshold” depends on the pitcher, the count, the number of outs, the score, the runners on base, the inning, and what time he has to be home to catch the network television premiere of his favorite movie, Shakespeare in Love (Space Jam is his second-favorite). Here’s a list of what likely influences Watson’s buzz threshold:

The amount of money he and his two opponents have: if Watson is losing by a large amount, his threshold will be lower, since he needs to take more risks to get back in the game. If he has a comfortable lead, the threshold will probably go up, as he doesn’t need to provide an avenue for his enemies to get back in the game.
The number of questions/dollars/Daily Doubles left up for grabs: even a good-sized lead doesn’t mean much before Double Jeopardy, so Watson will still likely keep a relatively low threshold early on in order to widen his lead.
Monetary value of the question: getting a $200 clue wrong probably won’t kill you (especially if you’re a robot who can’t really “die” in our sense of the word), so IBM probably allowed for some more leeway here.
0, 1, or 2 previous incorrect buzzes: if the other players have already buzzed in, the return on a correct answer is likely lower, since Watson isn’t denying his opponents the chance to get more money. On the other hand, if Watson thinks that his opponents will get the question wrong and no one has buzzed in yet, he might want to wait.
Time left: did Alex Trebek just stop reading the question, or is time about to run out? The human “buzz threshold” likely decreases as time elapsed increases, but I’m not sure whether this is really rational.
What is the goal? To have more money at the end of the game than the other two players? To maximize the amount of money won? Is there value in finishing second instead of third? Fun Fact: I have it on good authority that IBM’s engineers initially forgot to program a purpose, which led to Watson’s first existential crisis. L’enfer, c’est l’infini récursivité.
Opponent strength: is Watson facing Ken Jennings and Brad Rutter or Burt Reynolds and Sean Connery? (Hey, you know what’d be a pretty great SNL sketch?) Watson will have a higher buzz threshold against players more likely to buzz correctly. There’s also the issue of categorical vs. universal strength: if Burt got every other question in “Potent Potables,” he’s probably going to know the last remaining one too, even if he’s performed poorly in the other categories. Meanwhile, if Jennings is stumped on the previous clues in a category despite an otherwise solid game, he probably won’t get the remaining questions there either.
Opponent buzz threshold: what if Watson were playing against two other Watsons? (Did I just blow your mind?) He’d have to consider the likelihood that they would buzz in as well. The nuances of robot cloning aside, he’ll have a good chance at guessing opponents’ buzz thresholds if he simply assumes that they’re rational.
Randomness: the above point is just as true for Watson’s rational opponents: if he follows a strict formula, then adversaries will have advantageous knowledge of his behavior. By introducing some degree of randomness to his threshold, or randomly choosing a different threshold algorithm to run, Watson can partially mask his strategy and confuse the entire internet.
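To make a few of those factors concrete, here is a toy threshold function in Python. It covers only three items from the list (the score gap, the clue’s dollar value, and randomness), and every name and number in it is my own invention: the 0.5 starting point, the 0.2 and 0.05 weights, the $2,000 top clue value, and the clamping range are all assumptions chosen to make the direction of each effect visible, not anything IBM has published.

```python
import random


def buzz_threshold(my_score, best_opponent_score, clue_value, dollars_left,
                   base=0.5, jitter=0.0, rng=None):
    """Toy buzz threshold. Hypothetical: all weights below are made up."""
    rng = rng or random.Random()

    # Score gap relative to the money still on the board, clamped to [-1, 1].
    # Positive when leading, negative when trailing.
    stake = max(dollars_left, 1)  # avoid dividing by zero on an empty board
    lead = max(-1.0, min(1.0, (my_score - best_opponent_score) / stake))

    # Trailing lowers the threshold (more risk); leading raises it.
    threshold = base + 0.2 * lead

    # Cheap clues get a little more leeway than expensive ones.
    # Assumes $2,000 is the largest clue value.
    threshold += 0.05 * (clue_value / 2000 - 0.5)

    # Optional randomness so opponents can't predict the exact cutoff.
    threshold += rng.uniform(-jitter, jitter)

    # Keep the result a sensible probability.
    return max(0.05, min(0.95, threshold))


# Same clue, same board, opposite score situations.
trailing = buzz_threshold(5000, 15000, clue_value=1000, dollars_left=20000)
leading = buzz_threshold(15000, 5000, clue_value=1000, dollars_left=20000)
print(trailing < leading)  # True: the trailing player buzzes on shakier guesses
```

Opponent strength, prior incorrect buzzes, and the question of what Watson is actually trying to maximize are left out entirely; a real model would have to handle those too.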

Sound complex? It is, but you’re probably already pretty good at it yourself. I did quiz bowl in high school (Brad Rutter moderated some of our games at nationals), and while my confidence in the answer was the main component in whether or not I buzzed in, I did have a threshold. When I was playing a baseball-obsessed teammate during practice, I buzzed in very early on baseball questions even if I wasn’t sure yet. Of course, my teammate did the same thing, which usually led to one of us having to guess the player (or team or type of bat) based on a pronoun.

The ironic aspect of human thresholds is that thinking usually screws us up, especially in sports: we spend plenty of time criticizing coaching decisions on the blog, but the most important choices are made by the athletes. Unless Tim Wakefield or Jamie Moyer is pitching, the batter doesn’t have a chance to get out his calculator before deciding whether or not to swing. Hesitation and introspection usually don’t work out. The greatest athletes get a lot of their value from the ability to do what Watson does: take in a seemingly infinite amount of information and act upon it in a seemingly infinitesimal amount of time. The next time you watch a highlight of Gretzky, Messi, or Nash (John or Steve) on the attack, try to figure out their decision-making process. It’ll take you a few slow-motion instant replays, and you’ll probably still be wrong.

That doesn’t mean you should stop thinking (as you can tell, this warning appeared too late for me). Just as we might learn something sports-related from Watson’s thought process, we also can learn something robotically related from athletics. We’re still smarter than the machines, for now. Ask yourself: could a computer have made Space Jam?

After hashing out the theory, my next step will be to code a Jeopardy simulator that can solve for the ideal buzz threshold in all situations, and perhaps solve for sports-related thresholds as well. Provided I can get it to work, stay tuned.

5 Responses to Reverse-Engineering Our New Computer Overlords: Watson, Jeopardy, and Sports Decision Making

Bryan C says:

February 17, 2011 at 9:34 am

Went to a talk at BU Tuesday held by some people who worked on Watson; the impression I got was that your first point (how much Watson is leading/trailing by) was the biggest factor in determining buzz threshold.

(Well-written article by the way.)

dadler3 says:

February 17, 2011 at 6:03 pm

Great article from the Roh-bot. We may one day be able to have articles from an actual robot.

Bryan C, I think you are right on, check out this video with one of the engineers (skip to 3:55):

Pingback: Computers, Jeopardy and Sports Decision-Making | Brian McCormick Basketball
Tim says:

February 20, 2011 at 11:53 pm

We already have articles from a “robot”: http://statsheet.com/about

“StatSheet leverages advancements in data management, analytical processing, artificial intelligence, data visualization and automated publishing to generate, curate and deliver relevant and compelling real-time and historical content through a central portal and across a network of team specific sites.”

jeopardy games says:

April 11, 2013 at 9:56 am

A formidable share, I simply given this onto a colleague who was doing slightly analysis on this. And he in reality purchased me breakfast because I found it for him. . smile. So let me reword that: Thnx for the deal with! However yeah Thnkx for spending the time to debate this, I feel strongly about it and love reading extra on this topic. If potential, as you turn out to be expertise, would you thoughts updating your weblog with more particulars? It’s highly helpful for me. Large thumb up for this blog post!