Sunday, December 10, 2017

Statistics Sunday: Bayesian Inference in a Galaxy Far Far Away

I was rewatching Rogue One with a friend the other day. Since this is part of the Star Wars universe, it of course had to have some of the usual Star Wars elements: strange-looking aliens, someone uttering the line "I've got a bad feeling about this," and droids rattling off odds of different outcomes. Always bad outcomes - seriously, why don't the droids ever feel the need to say, "The odds are 50 to 1 that everything is going to turn out okay," or "There are puppies ahead; 200 to 1 odds of many puppy snuggles"?

But I digress. Because what I really want to talk about are those odds, and why they tell us something about the droids. True, they're sprinkled into the movies mainly as jokes. We don't really need to pay attention to the odds, other than to be impressed when the bad thing the droid was going on about doesn't end up happening. For instance, from The Empire Strikes Back:

Or this one, from Rogue One:

The information from the droid isn't actually that important. The point is that the line should make you laugh. But I was thinking about how this information is used in the Star Wars universe, and more importantly, where it could be derived from. And I came to an important realization:

These droids must be using Bayesian inference.

It's incredibly unlikely that these probabilities are empirically derived (BTW, this approach of using completely empirical data to derive probabilities is called Frequentism). C-3PO, for instance, says the odds of successfully navigating an asteroid field are 3,720 to 1. What that means is he has to have data on at least 3,721 attempts at navigating an asteroid field. And of course, you'd want more data than that. Just because 1 attempt out of the 3,721 was successful doesn't mean those are the true odds. It's possible the odds are actually 10,000 to 1. You need a lot of data to empirically derive the probability of something.
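To put rough numbers on that, here is a minimal Python sketch. It assumes "3,720 to 1" means odds against success (so a success probability of 1/3,721) and that each attempt is independent; the function names are just for illustration. It asks how likely it is to see exactly 1 success in 3,721 attempts under C-3PO's odds versus under 10,000 to 1.

```python
from math import comb

def odds_against_to_probability(odds_against):
    """Convert 'N to 1 against' into a success probability, 1 / (N + 1)."""
    return 1 / (odds_against + 1)

def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` independent attempts."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

trials = 3721
p_c3po = odds_against_to_probability(3720)    # C-3PO's stated odds
p_worse = odds_against_to_probability(10000)  # the alternative mentioned above

# Chance of seeing exactly one successful run in 3,721 attempts under each assumption.
chance_if_c3po_right = binomial_pmf(1, trials, p_c3po)
chance_if_odds_worse = binomial_pmf(1, trials, p_worse)

print(f"If the odds are 3,720 to 1:  {chance_if_c3po_right:.3f}")
print(f"If the odds are 10,000 to 1: {chance_if_odds_worse:.3f}")
```

Both values come out fairly close to each other (roughly a third versus roughly a quarter), so a single success in 3,721 tries can't really tell the two sets of odds apart.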

And what about K-2SO simply saying the probability that Jyn will use the weapon against Cassian is "very high"? It doesn't actually matter what the probability is, but where does that value come from? Sure, it's possible that K-2SO is simply using the probability that an escaped convict would use a weapon on another person, but still, it doesn't seem like there would be a lot of data just lying around. And if K-2SO prefers to use data specific to the situation, he'd need data on the outcome of a very specific situation, one that has likely never happened.

But it isn't unusual for people/droids/whatever to want to know the odds of something that might never have happened before - an event so rare it's impossible to observe naturally, but one you need to be prepared for in the unlikely event that it happens. Insurance companies need to know the potential risks of taking on a new account. Governments need to prepare for potential wars. And scientists need to be able to make causal inferences from their data, sometimes data that weren't collected in a way that supports inferring cause. To a classical statistician, those puzzles would be difficult, maybe impossible. But to a Bayesian, it is completely possible to generate odds on a thing that has never happened before.
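One common way to do this (not necessarily what a droid would do) is to put a Beta prior on the probability of the event and update it with whatever observations exist. The sketch below assumes a made-up prior worth "1 event in 100 imagined cases" and 50 real observations with no events at all; every number here is hypothetical.

```python
from fractions import Fraction

def posterior_mean(prior_events, prior_non_events, observed_events, observed_trials):
    """Posterior mean of an event probability under a Beta prior and binomial data.

    A Beta(prior_events, prior_non_events) prior combined with the observed
    counts gives a Beta posterior whose mean is events / (events + non-events).
    """
    events = prior_events + observed_events
    non_events = prior_non_events + (observed_trials - observed_events)
    return Fraction(events, events + non_events)

# Hypothetical numbers: prior belief of about 1 in 100, then 50 observations, zero events.
estimate = posterior_mean(1, 99, 0, 50)
print(estimate)         # 1/150
print(float(estimate))  # small, but not zero
```

A purely empirical estimate from those 50 observations would be 0 out of 50, or "it never happens." The prior is what lets the estimate stay above zero for an event that hasn't been seen yet.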

(If you need to refresh your memory on Bayes' Theorem, check out posts here, here, here, and here. And as soon as I learn how to invent more free time, I'm going to sit down and learn Bayesian statistics so I can stop Dunning-Kruger-ing my way through it.)

What K-2SO and C-3PO are generating are conditional probabilities - the probability of something happening given known probabilities about the present situation. These known probabilities are called "priors," and the droid could draw on whatever priors make sense. So C-3PO might be drawing on data about the maneuverability of the Millennium Falcon, the probability of crashes while being pursued, size and motion of the asteroids, and even observations about Han Solo himself. Using those conditions, C-3PO can calculate the probability that they'll make it out of the asteroid field alive.
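As a rough illustration of that calculation, here is Bayes' Theorem as a small Python function. The numbers are invented for the example: a 1-in-100 starting belief that a ship survives an asteroid field, and a piece of evidence (say, "the pilot is unusually skilled") that is assumed to show up in 90% of survivals and 10% of crashes.

```python
from fractions import Fraction

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' Theorem."""
    numerator = p_evidence_if_true * prior
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence

# All three inputs are hypothetical, chosen only to show the mechanics.
prior_survival = Fraction(1, 100)
p_skilled_if_survive = Fraction(9, 10)
p_skilled_if_crash = Fraction(1, 10)

posterior_survival = bayes_update(prior_survival, p_skilled_if_survive, p_skilled_if_crash)
print(posterior_survival)  # 1/12
```

With these made-up inputs, knowing about the pilot moves the survival estimate from 1 in 100 to 1 in 12. No record of thousands of asteroid-field attempts is needed, only a starting belief and a sense of how strongly the evidence goes with each outcome.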

(Side note: Successfully navigating an asteroid field actually wouldn't be that difficult. Check out this post from The Math Dude at Quick and Dirty Tips.)

And just as with the asteroids, K-2SO doesn't need to have the empirical odds that Jyn will use her "found" blaster on Cassian. Instead, he could use known information on Jyn's proclivity toward violence, rates at which convicted criminals use guns, and even probability of a weapon being fired in emotional situations or probability that Cassian will piss Jyn off somehow. K-2SO could use whatever priors make sense, and use that information to derive this "very high" probability.
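When there are several pieces of information like this, one simple (and admittedly naive) way to combine them is to work in odds: multiply the starting odds by a likelihood ratio for each piece of evidence. The sketch below assumes the pieces of evidence are independent of one another given the outcome, which is a strong assumption, and every number in it is hypothetical.

```python
from fractions import Fraction
from math import prod

def update_with_likelihood_ratios(prior_probability, likelihood_ratios):
    """Combine a prior with several likelihood ratios, treating them as independent."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)

# Hypothetical inputs: a 1-in-5 starting belief that the blaster gets used,
# then three pieces of evidence that each make it 2 to 3 times more likely.
prior = Fraction(1, 5)
ratios = [3, 2, 2]  # e.g. proclivity toward violence, convict base rate, emotional situation

probability_blaster_used = update_with_likelihood_ratios(prior, ratios)
print(probability_blaster_used)  # 3/4
```

A modest starting belief plus a few moderately informative facts is enough to land on something a droid might reasonably call "very high."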

Hopefully you're as excited as I am about seeing The Last Jedi!

May the Force be with you.
