Archive for December, 2007

Voice Risk Analysis Software criticised in national media

December 29, 2007

Regular readers will remember my article about the voice risk analysis software employed by Harrow Council, in particular to detect benefits claimants who may be lying. At the time I pointed out that there was no evidence, other than anecdotal, that the product worked as claimed, and that:
“Harrow Council, and soon the Department for Work and Pensions, could well be detecting and prosecuting fraudsters using software whose only claim to be efficacious is that the company that makes it says it is.”

Today I find that this is not just my opinion, but also that of Paul Lewis, presenter of Radio 4’s Moneybox programme. He popped up on Breakfast on BBC1 this morning to put the arguments against the credibility of these devices much more succinctly than I have done.

In brief, Voice Risk Analysis (VRA) software claims to detect stress in a person’s voice, not lying specifically. In day to day use, the software flags up calls from benefit claimants whose voices betray signs of stress and the call can then be followed up with further checks.

The presenters told Lewis that Harrow Council had saved £110,000 by the use of Voice Risk Analysis. Lewis pointed out that the problem was that they had nothing to compare the method to: how did Harrow Council know, for instance, that picking people at random and following up their claims would not generate an equivalent saving? Also, simply knowing that Harrow Council was piloting such a scheme might be enough to put off fraudulent claimants (as well as discouraging genuine claimants wary of harassment).

Superbly put. Excellent piece. Also in Lewis’s favour was his mention of Niels Bohr in the context of making predictions about the future. Anyone who can mention a famous physicist at 8:45 in the morning on national television gets my vote.

A piece on VRA also appeared later in the day in the 29 December edition of Moneybox. Lewis was presenting and went into a lot more detail.

It turns out that the supplier of the software is Digilog, as predicted on this blog back in September. Digilog, it seems, will not tell anyone how it is able to detect stress in voices, nor why it assumes that stress and lying are linked. This should start alarm bells ringing in any reasonable person’s head. Contrary to what your intuition might tell you, systems whose workings are secret are far more vulnerable to exploitation than those which are not: the more people who know how a system works, the more chance there is of someone spotting errors or other problems in the software. It’s obviously very important that systems designed to catch people lying work properly: no one wants criminals getting away with it on the one hand, or innocent people being harassed on the other.

Unbelievably, James Plaskitt, the Parliamentary Under-Secretary with responsibility for housing and council tax benefit in the Department for Work and Pensions, chose to beg the question when interviewed for Moneybox. What follows is a paraphrase of the exchange with the interviewer:

Interviewer: What scientific research is there to show that this technology works?
Minister: That’s why we’re running the pilot schemes, to see if it works.
Interviewer: But that isn’t scientific research, is it?
Minister: No, but it’s not up to us to review the science.
Interviewer: Surely it’s important before you implement this system that you know it works. How will you know if you don’t look at the science?
Minister: Well that’s why we’re running the pilot schemes – to test this thing out. What’s more, the operators I’ve talked to who use this system are convinced it’s solid!

With the minister responsible for national rollout talking in circles and presenting anecdote as evidence, it’s no wonder that a device that has been shown to be no more accurate than flipping a coin can gain such a grip on the minds of politicians and civil servants, all of whom are seeing it as a magic wand to cut fraud.

How can intelligent people not spot that there’s no evidence for something, or think that a pilot scheme will establish its efficacy without anything to compare it with? I hope the answer isn’t “because they don’t understand how science works”.

Remember the recent biometrics scandal, where a person wearing a gelatin overlay over their fingers was able to fool a commercial fingerprint detector 80% of the time? The UK government is committed to launching a national ID card scheme based on just such biometric equipment starting in 2009, with about as much evidence that it will work as Digilog have for their VRA software.

The fact is we don’t know that VRA technology works, and have no right to believe that it will work based on the evidence seen so far. The government should not be implementing it.

A Secret Santa Game

December 27, 2007


(Note: originally, this article was written using Mathematica, a computer algebra package. Unfortunately, the conversion from Mathematica’s native display format to HTML did not work properly, so the text had to be transferred by copy and paste, and the calculation inputs and outputs had to be done “by hand” where possible, and by conversion to bitmap where not. Apologies for the resulting confused appearance.)

A group of friends take part in a game of Secret Santa at Christmas. There are n people taking part.
What is the probability that m pairs of people will have each other as their Secret Santa?

The way to tackle this question is to find the probability that m particular pairs of people will have each other as their secret Santa, then multiply that probability by the number of possible groups of m pairs. A pair of people that have each other as their Secret Santa is called a reciprocating pair.

First we need to find the probability of finding one matching pair of people in a group of n friends. Let’s start by imagining that there are 5 friends taking part: Dave, John, James, Jane and Erica. What’s the probability of finding a reciprocating pair?

Well, we need one person to pick someone, who goes on to pick him or her in return:
This is 1/4 x 1/4 = 1/16.

This works because there is a 1 in 4 chance of say Dave picking John (he can’t pick himself), and a corresponding 1 in 4 chance of John picking Dave.

In general, for n people, the probability is 1 in (one less than the number of people taking part), multiplied by itself.

This is 1/(n-1)^2

The probability that a further pair will have each other must now be calculated from the remaining people. In our example, we are down to James, Jane and Erica, which means that the number of possible ordered pairs is 3 x 2 = 6. In general though, we now have two fewer people than we did before to choose from. That’s n – 2 people. Using the rule that a particular person can’t pick him or herself, we can calculate the probability of a second match as

1/((n-2) – 1)^2 or 1/(n – 3)^2

The probability of a third match is calculated using the fact that we now have two fewer people to choose from again, or n – 4 people. This gives us 1/((n-4) – 1)^2 or 1/(n – 5)^2.

The probability of an mth match can be found by glancing at the numbers subtracted from n in the denominator of each fraction. You can see that they form a sequence 1, 3, 5, … The position-to-term rule for a sequence of this type is 2m – 1, where m is the number of the match. The probability of the mth match is therefore 1/(n – (2m – 1))^2.

To find the probability of a particular three pairs matching in Secret Santa, you just multiply the probabilities of a first, a second and a third match together.

For a game with 11 people (i.e. n = 11) this works out to 1/230400.

This low figure means that for an eleven person game, the probability of a particular three pairs matching, say Amy – Bill, Charles – Derren, and Emily – Fred is vanishingly small.

What about the probability of a particular m pairs matching? To calculate this, we take the probability of the mth pair matching, which we already know is 1/(n – (2m – 1))^2, and multiply it by the (m – 1)th probability, and the (m – 2)th probability, and so on until we come to the first.

To do this, we use the product function, which multiplies its terms together in a way analogous to the sum function ∑. Here, the range variable is i.

∏ 1/(n – (2i -1))^2 (evaluated from i = 1 to m)
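As a quick check of the product above, here is a minimal Python sketch. The function name `particular_pairs_probability` is made up for illustration and is not part of the original Mathematica notebook. It simply multiplies the terms as written and uses exact fractions, so the 1/230400 figure for n = 11, m = 3 can be confirmed.

```python
from fractions import Fraction


def particular_pairs_probability(n: int, m: int) -> Fraction:
    """Product of 1/(n - (2i - 1))^2 for i = 1..m, as written in the text.

    The name and signature are illustrative assumptions, not the author's code.
    """
    result = Fraction(1)
    for i in range(1, m + 1):
        result *= Fraction(1, (n - (2 * i - 1)) ** 2)
    return result


# n = 11, m = 3: 1/10^2 x 1/8^2 x 1/6^2
print(particular_pairs_probability(11, 3))  # 1/230400
```

With n = 5 and m = 1 the same function returns 1/16, matching the Dave and John example.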

We are half-way to a general formula. We now need to know the number of possible groups of m particular reciprocating pairs chosen from n people. This is easier than it seems. The way to count the number of pairs in n people is n(n-1). Look at the possible pairs for our original 5 people: Dave, John, James, Jane, and Erica.

Dave-John
Dave-James
Dave-Jane
Dave-Erica
John-Dave
John-James
John-Jane
John-Erica
James-Dave
James-John
James-Jane
James-Erica
Jane-Dave
Jane-John
Jane-James
Jane-Erica
Erica-Dave
Erica-John
Erica-James
Erica-Jane

Looking in the left column, you can see that each of the five names appears, and is matched with everyone else except themselves. This means each person is matched with four others. That is why the formula for possible pairs is the number of names multiplied by one less than this number: n x (n – 1). In our particular case of 5 names, there are 5 x 4 = 20 possible pairs. Of course, to get a reciprocating pair Jane-John and John-Jane must be combined – Jane buys for John and John buys for Jane. This means the number of reciprocating pairs is half the number of possible pairs. The formula now becomes (1/2)n(n – 1).

Having matched up two people, they are removed from the group, which now has n – 2 members. We then count the possible pairs again. The formula for the next set of pairs is therefore (1/2)(n – 2)(n – 3).

After a second pair is removed we have n – 4 players left, so the formula for the third set of pairs is (1/2)(n – 4)(n – 5), and so on. To find the total number of groups consisting of three reciprocating pairs out of n people, counting the pairs in the order they are chosen, we simply multiply our three formulae together:
(1/2) x n x (n – 1) x (1/2) x (n – 2) x (n – 3) x (1/2) x (n – 4) x (n – 5). For 11 people this is
(1/2) x 11 x 10 x (1/2) x 9 x 8 x (1/2) x 7 x 6 = 41580.

For n people, we use another product formula:

∏ (1/2) x (n – 2i) x (n – (2i + 1)) (evaluated from i = 0 to m – 1)
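The counting product can be checked the same way. This is a sketch only: `pair_group_count` is a hypothetical name, and the function follows the article's own counting, in which the pairs are picked one after another.

```python
def pair_group_count(n: int, m: int) -> int:
    """Product of (1/2)(n - 2i)(n - (2i + 1)) for i = 0..m-1, as in the text.

    The name is an illustrative assumption. Each factor is half the product
    of two consecutive integers, so the integer division is exact.
    """
    count = 1
    for i in range(m):
        count *= (n - 2 * i) * (n - (2 * i + 1)) // 2
    return count


# n = 11, m = 3: 55 x 36 x 21
print(pair_group_count(11, 3))  # 41580
```

For the five-person example, `pair_group_count(5, 1)` gives 10, which is half of the 20 ordered pairs listed earlier.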

Using Mathematica, we can now combine the formula for m particular matches with the formula for m possible groups out of n people:

This gives us

The Pochhammer function takes inputs n and m and evaluates according to the rule

Pochhammer[n, m] = n x (n + 1) x (n + 2) x … x (n + m – 1)

It is related to the factorial function.
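Because of that link with factorials, the combined product can also be written with ordinary factorials. What follows is a reconstruction from the two product formulas given earlier, not necessarily the form Mathematica printed. The symbol P(n, m) is introduced here purely for convenience.

```latex
\[
P(n,m)
  = \prod_{i=1}^{m} \frac{1}{\bigl(n-(2i-1)\bigr)^{2}}
    \times
    \prod_{i=0}^{m-1} \tfrac{1}{2}\,(n-2i)\bigl(n-(2i+1)\bigr)
  = \frac{n!}{2^{m}\,(n-2m)!\,\prod_{i=1}^{m}\bigl(n-2i+1\bigr)^{2}}
\]
```

The second product collapses because its factors run through n, n – 1, …, n – 2m + 1, which is n!/(n – 2m)!, with one factor of 1/2 for each of the m pairs.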

Finally, we define a function whimsically called SecretSanta, which takes inputs n and m and gives us the probability of finding m reciprocating pairs in n people.

For n = 11 and m = 3 the output is given below.

SecretSanta[11, 3]

0.180469

That is, about an 18% chance that there will be three reciprocating pairs.
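For readers without Mathematica, the calculation can be sketched in Python. This is not the original SecretSanta definition, which did not survive the conversion to HTML. It is an assumed equivalent (`secret_santa` is a made-up name) that multiplies the two products described in the text, term by term.

```python
from fractions import Fraction


def secret_santa(n: int, m: int) -> Fraction:
    """Multiply the article's two products together, term by term.

    Hypothetical stand-in for the Mathematica function SecretSanta[n, m];
    the name and structure are assumptions made for illustration.
    """
    result = Fraction(1)
    for i in range(m):
        remaining = n - 2 * i                   # people still unpaired
        ways = remaining * (remaining - 1) // 2  # (1/2)(n - 2i)(n - (2i + 1))
        chance = Fraction(1, (remaining - 1) ** 2)  # 1/(n - (2i + 1))^2
        result *= ways * chance
    return result


p = secret_santa(11, 3)
print(p)                    # 231/1280
print(round(float(p), 6))   # 0.180469
```

The exact fraction 231/1280 is the 41580/230400 from the worked example in lowest terms, and it rounds to the 0.180469 shown above.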

This 3D visualisation shows how the probability of any number of matches falls off quite dramatically with rising numbers of people taking part.