T-tests? That's for amateurs! In this post we're going to build a Voight-Kampff Test! If you haven't watched Blade Runner or read Do Androids Dream of Electric Sheep? you may be unfamiliar with this famous test. The purpose of the Voight-Kampff test is to identify replicants. In the film, replicants are androids that are nearly indistinguishable from humans. The only way to detect a replicant is by testing their reaction to a series of questions intended to provoke an emotional response. When humans are asked these questions they uncontrollably respond with "capillary dilation or the so called 'blush response', fluctuation of the pupil, involuntary dilation of the iris." Using the Voight-Kampff machine, an interviewer can measure these responses.
Voight-Kampff in action
Why do we need Bayesian Analysis?
The machine pictured above is only useful for measuring response. It requires the skill of the interrogator to make the call. Each question gives the interrogator a bit of evidence, but this is not an easy task. The purpose of the test is to identify rogue replicants, and the consequence of being identified is execution. We want to be very certain of the test's results. Deckard, the film's protagonist, mentions that he has never mistakenly terminated a human. Deckard also reveals that for most cases 20-30 questions are required to come to a conclusion, but in special cases it can take more than 100! This means we gather evidence until we have enough to establish a firm belief.
You aren't going to use a p-value for that are you?
Introducing Bayes Factor
A regular t-test isn't going to solve the problem we have. For starters we don't want to simply "reject a null hypothesis". With a t-test we can only say something like "There is a 19/20 chance that the subject is not your average human". That's a pretty lame belief if it means your next step is going to be putting a bullet in someone's head. What we want to be able to say is "The subject is a replicant beyond any doubt". Then there's the issue that probabilities are rather hard to reason about as they grow close to 1 or 0. A 0.999 chance that someone is a replicant and a 0.9999 chance seem roughly the same in our heads, but the chance of being wrong differs by an order of magnitude! Additionally, 0.9999 still leaves too much room for error if the result is going to be "elimination". We also want to ask a series of questions and stop when we have enough evidence to decide, which would violate proper experiment design for traditional null hypothesis testing.
To solve this problem we're going to use a method of gathering evidence that was discussed by ET Jaynes in his amazing Probability: The Logic of Science, commonly referred to as "Bayes Factor". Bayes Factor will convert our probabilities into evidence. Before we can calculate our evidence, we first need to calculate the odds for our hypothesis. Stating the odds of an event is a very common way of expressing probability. When someone says "There's a 50/50 chance" they are expressing probability in terms of odds. Formally, given that we have prior information (X), data (D) and a binary hypothesis (H) (i.e. replicant or not replicant), the odds (O) can be expressed as the ratio of two probabilities:$$O = \frac{P(H|DX)}{P(\bar{H}|DX)}$$
It is important to realize that in this context we're referring to (D) and (X) as simply the general concepts of "data" and "prior"; clearly the data and prior belief for the hypothesis and its complement are not going to be the same. To make this clearer, we'll calculate the odds for a subject being a replicant given some data.
To start, we want to find evidence that our subject is a replicant. This is our (H), so (\bar{H}) must be "is human". Suppose that for the question we ask, 80% of humans show an involuntary response and only 1% of replicants do. This, plus the answer to our question, is our (D). We're also going to say that in general there is a 10% chance our subject is a replicant, because otherwise we wouldn't be interviewing them in the first place. Now we have (X), our prior.
If we ask our question and the subject shows an involuntary response we get:
(P(H|DX) = 0.01\cdot 0.1 = 0.001) since we're weighting the likelihood that a replicant would show a response by the prior belief that they are a replicant (remember (H) here stands for 'hypothesis', and our hypothesis is that the subject is a replicant).
(P(\bar{H}|DX) = 0.8\cdot 0.9 = 0.72) Likelihood of a human showing the response, weighted by the prior that they are human. Strictly speaking, both products leave out the shared normalizing factor (P(D|X)), but it cancels as soon as we take the ratio.
(O = \frac{P(H|DX)}{P(\bar{H}|DX)}= \frac{0.001}{0.72} = 0.0013889) Our odds are less than 1, so they are not in favor of our hypothesis. This makes sense: the subject is more likely to be a human than a replicant in the first place, and they answered the question the way most humans do and not the way most replicants do.
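The odds calculation above is small enough to check by hand, but a short Python sketch makes each step explicit. The variable names are illustrative choices rather than anything from the film or from Jaynes, and the 0.8 human response rate is the figure used in the calculation above.

```python
# Response rates for this question (values from the worked example above).
p_response_given_replicant = 0.01
p_response_given_human = 0.8

# Prior belief, before asking anything, that the subject is a replicant.
prior_replicant = 0.1
prior_human = 1 - prior_replicant

# Weight each likelihood by its prior, then take the ratio to get the odds.
weight_replicant = p_response_given_replicant * prior_replicant  # about 0.001
weight_human = p_response_given_human * prior_human              # about 0.72
odds = weight_replicant / weight_human

print(f"odds = {odds:.7f}")  # odds = 0.0013889
```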
There's one simple step we need to transform our odds into evidence. We're going to simply take the (\log_{10}) of the odds and then multiply it by 10. This ends up giving us a measurement very similar to decibels in sound. Even better, our intuitions about decibels in sound carry over: 1 means very little evidence in favor of our hypothesis, and 100 or more means our evidence is extremely 'loud'. Our final formula for evidence is:$$e = 10\cdot \log_{10}\big[ O\big] = 10\cdot \log_{10} \big[\frac{P(H|DX)}{P(\bar{H}|DX)}\big]$$For our example we get:$$e = 10\cdot \log_{10} \big[\frac{0.001}{0.72}\big] = -28.57$$
This intuitively means that we have nearly 29 points against our hypothesis. Right away we can see why evidence in the form of a Bayes Factor is nice: we have a very natural understanding of evidence that scales really well even for extreme values.
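As a rough sketch of the transformation, here is the evidence formula in Python, along with a helper that turns a plain probability into decibels. The function names `evidence_db` and `probability_to_db` are made up for this illustration.

```python
import math

def evidence_db(odds):
    """Convert odds into evidence: ten times the base-10 log of the odds."""
    return 10 * math.log10(odds)

def probability_to_db(p):
    """Evidence for a hypothesis held with probability p (0 < p < 1)."""
    return evidence_db(p / (1 - p))

print(round(evidence_db(0.001 / 0.72), 2))   # -28.57
print(round(probability_to_db(0.999), 1))    # 30.0
print(round(probability_to_db(0.9999), 1))   # 40.0
```

On this scale the probabilities 0.999 and 0.9999, which look nearly identical written as decimals, sit about 10 decibels apart.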
If our data is independent there is another huge win for Bayes Factor. In this case, independent means that knowing the response to one question doesn't tell us anything about the response to another. It also means it doesn't make sense to use our response from question 1 to update our prior for question 2. If it is the case that our questions are independent, then because of the logarithmic transformation we can just sum up the evidence for each question to get our total belief! That means if Q1 gives us 10 decibels of evidence and Q2 gives us -4, then at that point we have 6 decibels of evidence. Now let's get some data, and put all our tools together to build our test!
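The reason summing works is just the rule that the log of a product is the sum of the logs. Here is a minimal Python check of that, assuming two made-up factors that each multiply the odds (the numbers are hypothetical and chosen only to land near the 10 and -4 decibel example).

```python
import math

def evidence_db(odds):
    """Ten times the base-10 log of the odds."""
    return 10 * math.log10(odds)

# Two hypothetical, independent factors that each multiply the odds.
factor_q1 = 10.0   # contributes exactly 10 decibels
factor_q2 = 0.4    # contributes roughly -4 decibels

# Multiplying the factors first gives the same evidence as adding decibels.
combined = evidence_db(factor_q1 * factor_q2)
summed = evidence_db(factor_q1) + evidence_db(factor_q2)
assert math.isclose(combined, summed)

# The example from the text: Q1 gives 10 decibels, Q2 gives -4.
total = sum([10, -4])
print(total)  # 6
```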
Putting it all together
Below we have a table of questions that we're going to ask (simply labeled Q1, Q2, etc). The column labeled "human" gives the proportion of humans who have an involuntary response registered by our Voight-Kampff machine, and the column labeled "replicant" gives the proportion of replicants that have a response. The next two columns, labeled "response" and "no response", are the evidence we gain depending on the response of the subject (we'll round to the nearest whole number). Finally, we have our prior belief that the subject is a replicant, which we'll keep at 0.10.
Now that we have our data, let the test begin!
Inference
Before we ask any questions we need to establish our initial belief about whether or not the subject is a replicant. Our prior belief, as stated before, is that there is a 1/10 probability the subject is a replicant. The initial evidence before any questions are asked is then (about -9.5 before rounding):$$e = 10\cdot \log_{10}\big[\frac{P(H|X)}{P(\bar{H}|X)} \big]=10\cdot \log_{10}\big[ \frac{1/10}{9/10}\big] \approx -10$$
Our starting evidence is -10, which means that we're starting from the initial belief that our subject is not a replicant. In our example, we just made the prior up, but in a more realistic case it would likely have been established from data as well.
We ask our first question, Q1, and we observe no response. In the table above we have already precalculated all of the evidence, but to make sure our reasoning is clear let's calculate where that is coming from (and remember we're rounding everything to whole numbers).
For (P(H|D,X)) given no response to Q1, we take the probability of a replicant not responding, which is just 1 minus the probability of a response, and weight it by our prior belief that the subject is a replicant.
$$P(H|D,X) = (1-0.01)\cdot 0.1 = 0.099$$
And (P(\bar{H}|D,X)) is the exact same reasoning only for a human:
$$P(\bar{H}|D,X) = (1-0.95)\cdot 0.9 = 0.045$$
putting it all together we get:
$$e_{q1} = 10\cdot \log_{10}\big[\frac{P(H|D,X)}{P(\bar{H}|D,X)} \big] = 10\cdot \log_{10}\big[\frac{0.099}{0.045} \big] \approx 3$$
Because of the log transformation we can simply add our initial evidence (-10) to our Q1 evidence (3) to see what we currently believe, which is -7. Even though not showing a response is more likely for a replicant than for a human, it is not so unlikely for a human that it moves our belief that the subject is a replicant all that much from our prior. At the same time, -7 isn't a particularly strong belief that the subject is not a replicant. Let's ask the rest of our questions:
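To retrace the Q1 arithmetic, here is a small Python sketch that follows the calculation exactly as laid out above, using the 0.95 and 0.01 response rates from that calculation. The names are illustrative only.

```python
import math

def evidence_db(odds):
    """Ten times the base-10 log of the odds."""
    return 10 * math.log10(odds)

prior_replicant = 0.1
prior_human = 0.9

# Q1 response rates as used in the calculation above.
p_response_replicant = 0.01
p_response_human = 0.95

# Evidence before any question is asked, rounded to a whole number.
prior_evidence = round(evidence_db(prior_replicant / prior_human))

# The subject showed no response to Q1, so use 1 minus each response rate.
weight_replicant = (1 - p_response_replicant) * prior_replicant  # about 0.099
weight_human = (1 - p_response_human) * prior_human              # about 0.045
q1_evidence = round(evidence_db(weight_replicant / weight_human))

print(prior_evidence, q1_evidence, prior_evidence + q1_evidence)  # -10 3 -7
```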
Our final belief is -119 which means we are essentially certain that the subject is human, even though they answered a few questions like a replicant. Let's look at another subject that is more suspect:
Our final belief is 30. Are we ready to 'retire' the subject? No! We have a pretty strong belief the subject is a replicant, but as stated previously, we ideally want a belief of 100 or more before we decide. We don't want our beliefs to be summarized as "meh, I'm pretty sure", but rather "you're damn right that's a replicant!"
We can now see why Deckard needs 20-30 questions for most cases. It also explains why the enhancements in the Nexus 6 model replicant raise that number to around 100: just a few 'very human' responses set our beliefs way back.
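The "keep asking until we're sure" procedure can be sketched as a simple loop. This is only an illustration: the function name `run_test`, the symmetric threshold for declaring a subject human, and the per-question evidence values are all assumptions made for the example, not numbers from the tables in this post.

```python
def run_test(prior_evidence, question_evidence, threshold=100):
    """Add up evidence question by question until the belief is decisive.

    Returns (total evidence, questions asked, verdict). Using the same
    threshold for 'human' as for 'replicant' is an assumption of this sketch.
    """
    total = prior_evidence
    asked = 0
    for e in question_evidence:
        total += e
        asked += 1
        if total >= threshold:
            return total, asked, "replicant"
        if total <= -threshold:
            return total, asked, "human"
    # Ran out of questions before reaching either threshold.
    return total, asked, "undecided"

# Hypothetical per-question evidence, not taken from the post's tables.
suspect = [13, 13, -9, 13, 10]
print(run_test(-10, suspect))  # (30, 5, 'undecided')
```

With a belief of 30 the loop reports "undecided", which matches the situation above: strong suspicion, but not enough to act on.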
Now the only question left is... have you ever taken the test?
If you enjoyed this post please subscribe to keep up to date and follow @willkurt!