Barnard’s unconditional exact test on 2×2 matrix

VN:F [1.8.1_1037]
Rating: +1 (from 1 vote)
VN:F [1.8.1_1037]
Rating: 8.0/10 (1 vote cast)

Barnard’s unconditional exact test on 2×2 matrix

There are two fundamentally different exact tests for comparing the equality of two binomial probabilities – Fisher’s exact test (Fisher, 1925), and Barnard’s exact test (Barnard, 1945). Fisher’s exact test (Fisher, 1925) is the more popular of the two. In fact, Fisher was bitterly critical of Barnard’s proposal for esoteric reasons that we will not go into here. For 2 × 2 tables, Barnard’s test is more powerful than Fisher’s, as Barnard noted in his 1945 paper, much to Fisher’s chagrin. Anyway, perhaps due to its computational difficulty the Barnard’s is not widely used.
Consider a general 2×2 matrix

Category 1 Category 2
Row sum
Condition 1
A B RS1
Condition 2
C D RS2
Column sum CS1 CS2 total = N

Suppose that the common probability of infection under the null hypothesis is πA = πC = π. Then the probability of observing any generic table X is a product of two binomials:

Barnard1

The exact p-value is then the sum of such probabilities for all tables, X, that could have been observed which are at least as extreme as the observed table, X0 , under the null hypothesis. Specifically, suppose T (X) is a “discrepancy measure” or test statistic that measures how discrepant any table X is relative to the type of table one would expect under the null hypothesis. Then, for any given π, the exact p-value of the observed table X0 is:

Barnard2Now there exist many different test statistics for determining if a table X is more or less extreme than the observed table X0 under the null hypothesis: we will use the popular Wald statistic:

Barnard3

Although the Wald test statistic T (X) is well defined, equation (2) is not of much practical use for computing an exact p-value because it depends on knowing the value of π, the common probability under the null hypothesis. One could of course use the data to estimate π and then substitute this estimate into equation (2). But then the p-value would no longer be exact. The main difference between the Fisher’s and Barnard’s tests is the manner in which they eliminate this nuisance parameter from (2) without sacrificing the exactness.

Fisher’s exact test

Fisher’s exact test gets rid of π by restricting attention only to generic 2 × 2 tables X in which A + B = Rs1, the sum of responses that were actually observed in the data. Accordingly define:

Barnard4Fisher noted that, under the null hypothesis, provided we confine our attention only to tables X ∈ Γ(Rs1), each table has a hypergeometric probability that no longer depends on the nuisance parameter π (see my post: Fisher’s conditional exact test on 2×2 matrix).

Barnard’s exact test

Barnard’s exact test is an unconditional test. It generates the exact distribution of T(X) by considering all the tables X ∈ Γ, and rather than just the tables X ∈ Γ(Rs1). What shall we do with the unknown nuisance parameter π in the above p-value (equation 1)? Barnard suggested that we calculate p(π) for all possible values of π ∈ (0, 1) and choose the value, π , say, that maximizes p(π). And this is computationally very very difficult….

The algorithm

Now, let we see the algorithm in deep.

Step 1: The Wald’s Statistic

The first step is to compute the Wald’s statistic for each possible value of A and B (from now on I and J).

I∈[0:1:Cs1] and J∈[0:1:Cs2]. One could write:

1
2
3
4
5
6
Cs=sum(x); N=sum(Cs) %Columns sum and total observed elements
for I=0:1:Cs(1)
    for J=0:1:Cs(2)
         TX(I+1,J+1)=(I/Cs(1)-J/Cs(2))/sqrt(((I+J)/N)*(1-(I+J)/N)*sum(1./Cs))
    end
end

Notice that for (I=0 & J=0) or (I=Cs(1) & J=Cs(2)) T(I,J)=0/0=NaN. So there are two points where T(I,J) is not defined and NaN should be replaced with 0.
Two nested for…end loops are time consuming and are useless considering MatLab functions…

A more efficient code should be:

1
2
3
4
5
6
Cs=sum(x); N=sum(Cs) %Columns sum and total observed elements
[I,J]=ndgrid(0:1:Cs(1) 0:1:Cs(2));
warning('OFF','MATLAB:divideByZero') %disable the warning
TX=(I./Cs(1)-J./Cs(2))./realsqrt(((I+J)./N).*(1-((I+J)./N)).*sum(1./Cs)); %computes the Wald's statistics
warning('ON','MATLAB:divideByZero') %renable the warning
TX([1 end])=0; %resolve the singularities

The ndgrid function creates two square matrix:

Barnard5

So TX will be a Cs(1)+1 x Cs(2)+1 matrix. Using .* and ./ all elements of the matrix will be correctly multiplied or divided.

To set up the observed matrix (TXo) and the index of all TX≥TXo

1
2
TX0=abs(TX(x(1)+1,x(3)+1)); %catch the observed Table (TXo), taking the 0 in account
idx=TX>=TX0; %set the index of all T(X)>=TXo

Step 2: Compute P(X|π)

Now, we must compute P(X|π) for each I, J and π∈(0, 1) . I and J were just defined by ndgrid function. Expanding the binomials we obtain:

Barnard6

As I explained in my previous post Fisher’s conditional exact test on 2×2 matrix, it is useful to use the logarithms of P(X|π): the factorials can be obtained using the MatLab Gammaln function; the multiplications became sums; the divisions became subtractions and the power became multiplications.

Barnard7

CF is a matrix with the same dimensions of TX because it is a linear combination of I and J.

Barnard8The S matrix is a 3D matrix: two dimensions are Cs1+1 and Cs2+1; the third dimension is set by the user. In fact in the arguments of this function, the user can choose to set the length (the Tbx variable) of the vector π (by default it is 100). Let we see the code:

1
2
3
4
5
6
7
8
9
10
%Set the basic parameters...
A=[1 1 Tbx];
B=Cs+1;
npa = linspace(0.0001,0.9999,Tbx);
LP=log(npa);
ALP=log(1-npa);
E=repmat(I+J,A); F=N-E;
%Generate a 3D matrix (Cs1+1 x Cs2+1 x Tbx): this is a "box" where each of Tbx 2D
%slice is the Cs1+1 x Cs2+1 matrix CF
CF=repmat(sum(gammaln(B))-(gammaln(I+1)+gammaln(J+1)+gammaln(B(1)-I)+gammaln(B(2)-J)),A);

Now we have two 1xTbx arrays (LP and ALP) and three Cs1+1 x Cs2+1 x Tbx “boxes” (E, F and CF).
As Peter J. Acklam wrote in his “Matlab array manipulation tips and tricks” , there is a no for…loop solution to multiply each 2D slice with the corresponding
element of a vector. He wrote:
Assume X is an m-by-n-by-p array and v is a row vector with length p.
How does one write:

Y = zeros(m, n, p);
for i = 1:p
Y(:,:,i) = X(:,:,i) * v(i);
end

with no for-loop? One way is to use:
Y = X .* repmat(reshape(v, [1 1 p]), [m n]);


So we can construct the two others 3D boxes:
Box1=CF
Box2=E.*repmat(reshape(LP,[1 1 Tbx]),[Cs1+1 Cs2+1])=E.*repmat(reshape(LP,A),B)
Box3=F.*repmat(reshape(ALP,[1 1 Tbx]),[Cs1+1 Cs2+1])=F.*repmat(reshape(ALP,A),B)
Finally, we can obtain the S box that is the sum of these three boxes (remember that we are using logarithms, so we must convert them using the exp function)

1
S=exp(CF+E.*repmat(reshape(LP,A),B)+F.*repmat(reshape(ALP,A),B));

Now we are at the last point: for each 2D slice of the S box we must sum the values corresponding to TX>=TXo, obtaining a new vector P.
To use the logical indexing tecnique, we must replicate the idx matrix:
S(repmat(idx,A)) is a column vector that has L*Tbx row;
L=sum(idx(idx==1)) is the number of 1 in the idx matrix.
Using the reshape function we can obtain a new LxTbx matrix: in each column we’ll have the P(X|π & TX>=TXo) and using the function sum we’ll, finally,
obtain the P vector.

1
P=sum(reshape(S(repmat(idx, A)),sum(idx(idx==1)),Tbx));

Now we can find the p-value and the nuisance parameter:

1
2
PV=max(P); %The p-value is tha max value of the P vector;
np=npa(P==PV); %The nuisance parameter is the value of the npa array coinciding with PMax

Display the results…

An example

Supposing to have this matrix

Vaccine Placebo
Row sum
Flu infection
7 12 19
Healthy
8 3 11
Column sum 15 15 30

Using the Fisher’s conditional exact test the p-value (1-tailed) is 0.6407: you will conclude that the vaccine is not effective.
Using the Barnard’s unconditional exact test the results will be:

2×2 matrix Barnard’s exact test: 100 16×16 tables were evaluated
——————————————————————————–
Wald statistic = 1.8943 – Nuisance parameter = 0.6666
p-values 1-tailed = 0.034074 2-tailed = 0.068148
——————————————————————————–

Barnard9

So with the Barnard’s test you will conclude that the vaccine is effective.
Anyway, using the Fisher’s conditional exact test with the Lancaster’s correction (mid p) you will obtain a 2-tailed p-value=0.04774 ….

If any problems occurs in execution, or if you found a bug, have a suggestion or question just contact me at: giuseppe dot cardillo-edta at poste dot it

You can visit my homepage http://home.tele2.it/cardillo

My profile on XING http://www.xing.com/go/invita/13675097

My profile on LinkedIN http://it.linkedin.com/in/giuseppecardillo

Download Now

VN:F [1.8.1_1037]
Rating: 8.0/10 (1 vote cast)
VN:F [1.8.1_1037]
Rating: +1 (from 1 vote)

Popularity: 2% [?]

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • Blogplay
  • Live
  • PDF
  • Technorati
  • Twitter
  • Yahoo! Bookmarks
  • Add to favorites
  • email
  • MySpace
  • RSS

Fisher’s conditional exact test on 2×2 matrix

October 21, 2009 by Giuseppe Cardillo · Leave a Comment
Filed under: Algorithms, Probability, Statistics 
VN:F [1.8.1_1037]
Rating: +1 (from 1 vote)
VN:F [1.8.1_1037]
Rating: 8.0/10 (1 vote cast)

Fisher’s conditional exact test on 2×2 matrix

Fisher’s exact test is a statistical significance test used in the analysis of contingency tables where sample sizes are small. It is named after its inventor, R. A. Fisher, and is one of a class of exact tests, so called because the significance of the deviation from a null hypothesis can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity, as with many statistical tests. Fisher is said to have devised the test following a comment from Muriel Bristol, who claimed to be able to detect whether the tea or the milk was added first to her cup.

The test is useful for categorical data that result from classifying objects in two different ways; it is used to examine the significance of the association (contingency) between the two kinds of classification. So in Fisher’s original example, one criterion of classification could be whether milk or tea was put in the cup first; the other could be whether Ms Bristol thinks that the milk or tea was put in first. We want to know whether these two classifications are associated – that is, whether Ms Bristol really can tell whether milk or tea was poured in first. Most uses of the Fisher test involve, like this example, a 2 x 2 contingency table. The p-value from the test is computed as if the margins of the table are fixed, i.e. as if, in the tea-tasting example, Ms. Bristol knows the number of cups with each treatment (milk or tea first) and will therefore provide guesses with the correct number in each category. As pointed out by Fisher, this leads under a null hypothesis of independence to a hypergeometric distribution of the numbers in the cells of the table.

Consider a general 2×2 matrix

Category 1 Category 2
Row sum
Condition 1
A B RS1
Condition 2
C D RS2
Column sum CS1 CS2 total = N

What does the Fisher’s algorithm plan?

  1. Enumerate all possible tables, given RS1 RS2 CS1 CS2;
  2. For each table compute p-value using hypergeometric distribution;
  3. Sum all p-values≤p-value of the observed table

Now we’ll analyze the algorithm step-by-step, observing the optimizations that can be implemented using MatLab.

Step 1: Enumerate all possible tables.

A 2×2 matrix has only one degree of freedom. What does this mean? This means that if you change the cell A all the other cells are determined: B=RS1-A; C=CS1-A; D=RS2-CS1+A.

Question: Which is the range of variation for A?
Answer: 0≤A≤min(RS1,CS1)

First optimization: rearrange the matrix (if needed) so as to obtain: RS1≤RS2 and CS1≤CS2

1
2
3
4
5
6
7
8
9
10
11
12
13
Rs=sum(x,2); %rows sum
Cs=sum(x); %columns sum
N=sum(Rs); %sum of all elements
 
%Rearrange the matrix if necessary.
if ~issorted(Rs)
    x=flipud(x);
    Rs=sort(Rs);
end
if ~issorted(Cs)
    x=fliplr(x);
    Cs=sort(Cs);
end

Now you can enumerate all the possible tables:

1
2
A=0:1:min(Rs(1),Cs(1)); %all possible values for the first cell
z=[A;Rs(2)-Cs(1)+A;Rs(1)-A;Cs(1)-A;]; %all possible tables

Note that in the z matrix the cells are disposed in this order [A D B C] (in the next step you will understand the reason…).

Step 2: For each table compute the p-value using the hypergeometric distribution

First of all, now we know how many tables we wrote down and so we can use the preallocation (see http://www.advancedmcode.org/writing-fast-matlab-code-3-preallocation.html).

1
np=zeros(1,length(A)); %p-values vector preallocation

In this step we must compute the p-value for each table using the hypergeometric distribution. In general, the p-value is computed as:

fishergeneral

Considering that we are using a 2×2 matrix:

fisher22

Remember that: factorial1 and x is a natural number.

As you can see, the factorial (!) growth very quickly. Moreover the computation of a factorial is very time expensive. Luckly, there is a very important relation between the factorial and another function: the Gamma function.

gammafunction

Matlab can compute the Gamma function using a matrix as input. Anyway, we didn’t yet solve the factorial growing (20!=2.4329e+18). The management of so big number is very troubling. The solution is to use the logarithm function: in fact, log(20!)=42.3356 a more tractable number.

factorial4

Matlab, of course, can compute the gammaln function. Now, we are computing log(p-value), so when we’ll finish we must convert into p-value using exp function.

Remember two log properties (they will be useful later): LOG(A*B)=LOG(A)+LOG(B)   and   LOG(A/B)=LOG(A)-LOG(B).

Now, it is the time to observe another useful property of 2×2 matrix.
When you go from the Table Ti to the Table Ti+1:

factorial2Substituting you will find a recursive relation:

factorial3

factorial5

factorial6

And, more in general:

factorial8

At this point, the only problem is to compute the p-value of the first table. Consider the first Table:

Category 1 Category 2
Row sum
Condition 1
0 RS1 RS1
Condition 2
CS1 D=CS2-RS1 RS2
Column sum CS1 CS2 total = N

And so:

fisher22_p0Finally, the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
np=zeros(1,length(A)); %p-values vector preallocation
lz=log(z);
%LOG(X!)=GAMMALN(x+1)
np(1)=sum(gammaln([Rs(2)+1 Cs(2)+1])-gammaln([N+1 z(2)+1]));
%remember that
%np(i+1)=np(i)*[B(i)*C(i)]/(A(i+1)*D(i+1)]=np(i)*f(i)
%This formula is vectorizable and is recursive!
%using Logarithm to improve computation
%log(np(i+1))=log(np(i))+[log(B(i))+log(C(i))]-[log(A(i+1))+log(D(i+1))]=log(np(i))+f(i)
%J>1
%log(np(J))=log(np(1))+sum(f(1:J-1))=log(np(1))+cumsum(f)
f=sum(lz(3:4,1:end-1))-sum(lz(1:2,2:end));
np(2:end)=np(1)+cumsum(f);
np=exp(np);

Step 3: Sum all p-values≤p-value of the observed table

Now, to obtain the 2-tail p-value we must sum all p-values≤p-value of the observed table. In the z matrix, the observed table will be in the A+1 row (because you must also consider the 0) and, of course, in the np vector the p-value of the observed table will be in A+1 row.
On the contrary, if you desire the 1-tail p-value you must sum all A+1 p-values.
The code:

1
2
3
4
5
6
W=x(1)+1;
if x(1)<=round(A(end)/2) %choose direction
     P=[sum(np(1:W)) sum(np(W:end)) sum(np(np<=np(W)))];
else
     P=[sum(np(W:end)) sum(np(1:W)) sum(np(np<=np(W)))];
end

Finally, there is another “little” question. The Fisher’s exact test is a “conditional” method and is too conservative in comparison to an “unconditional” method like the Barnard’s test; in other words is less powerful than Barnard’s test. On the contrary, the Barnard’s test is so difficult to implement that none uses it (but me… see mi post on Barnard’s unconditional exact test).
The Lancaster’s correction (mid-p) to Fisher’s exact test makes it less conservative and more powerful and the results are similar to Barnard’s Test.

The Lancaster’s correction envisages that the p-values of all tables more extreme than the observed are summed to one half of the observed p-value. In code

1
P-mid=0.5*np(W)+sum(np(np<np(W)));

Don’t forget to display the results!

Step 4: Compute the power

The routine computes the Power and, if necessary, the sample sizes needed to achieve a power=0.80 using a modified asymptotic normal method with continuity correction as described by Hardeo Sahai and Anwer Khurshid in Statistics in Medicine, 1996, Vol. 15, Issue 1: 1-21. Because there aren’t interesting tricks, if you are interested you can directly read the code.

If any problems occurs in execution, or if you found a bug, have a suggestion or question just contact me at: giuseppe dot cardillo-edta at poste dot it

You can visit my homepage http://home.tele2.it/cardillo

My profile on XING http://www.xing.com/go/invita/13675097

My profile on LinkedIN http://it.linkedin.com/in/giuseppecardillo

Download Now

VN:F [1.8.1_1037]
Rating: 8.0/10 (1 vote cast)
VN:F [1.8.1_1037]
Rating: +1 (from 1 vote)

Popularity: 1% [?]

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • Blogplay
  • Live
  • PDF
  • Technorati
  • Twitter
  • Yahoo! Bookmarks
  • Add to favorites
  • email
  • MySpace
  • RSS

Coins

October 14, 2009 by Giuseppe Cardillo · Leave a Comment
Filed under: General, Utility 
VN:F [1.8.1_1037]
Rating: +1 (from 1 vote)
VN:F [1.8.1_1037]
Rating: 9.3/10 (3 votes cast)

Coins

How do you choose the coins in your pocket to pay something? There is an algorithm to use the max numbers of coins? Of course there is, as shown in the book “L’algoritmo del parcheggio” (The parking algorithm) by Prof. Furio Honsell. This GUI help you to choose the exact combination of coins that you have in your pocket to pay the bill.

These are the coins in your pocketFor example: Suppose you are in Naples (Italy, my city) and have in your pocket this situation. Now you are in “Piazza del Plebiscito” and you want to drink a coffee in the famous “Gambrinus” Bar. You go to the cash, order the coffee and the beautiful cashier give you the sales cash asking 0.80 euro (I wish…). All these coins in your pocket are very heavy to carry, and so you want to give as many as possible coins to the cashier (that will be happiest to have all these coins in the cash to give them as change). The situation can be resumed into a mathematical model:

Value

(values)

Quantity

(coinsin)

Total

(coinval)

Cumulative sum (V)

0.01€

6

0.06€

0.06€

0.02€

5

0.10€

0.16€

0.05€

5

0.25€

0.41€

0.10€

5

0.50€

0.91€

0.20€

2

0.40€

1.31€

0.50€

1

0.50€

1.81€

1€

1

1€

2.81€

2€

1

2€

4.81€

TOTAL

26

4.81€

4.81€

The bill to pay is stored into the T variable. As you can see, it is possible to us all coins less than 0.20€ to pay the bill. But what is the most efficient methods?

1
2
3
4
5
6
7
8
while T>0 %saddle the cashier with coins
    V0=find(V>T,1,'first'); %find the first value in the V array greater than T
    coinsout(V0)=coinsout(V0)+1; %give the coin to the cashier
    coinsin(V0)=coinsin(V0)-1;%this coin is no more in your pocket
    coinval(V0)=coinval(V0)-values(V0); %update the array
    V=cumsum(coinval); %update the array
    T=T-values(V0); %update the bill to pay
end

But is it truly so simple? No, of course. The floating point representation of cents sometimes creates troubles: for example: 1.0000-1 could not be 0 but 1.45782e-17 (that is >0…. ). To fix this problem, multiply (and after divide before printing results) by 100: values, coinval and T.

Finally, this is the result:

coins GUI

If any problems occurs in execution, or if you found a bug, have a suggestion or question just contact me at: giuseppe dot cardillo-edta at poste dot it

You can visit my homepage http://home.tele2.it/cardillo

My profile on XING http://www.xing.com/go/invita/13675097

My profile on LinkedIN http://it.linkedin.com/in/giuseppecardillo

Download Now

VN:F [1.8.1_1037]
Rating: 9.3/10 (3 votes cast)
VN:F [1.8.1_1037]
Rating: +1 (from 1 vote)

Popularity: 1% [?]

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • Blogplay
  • Live
  • PDF
  • Technorati
  • Twitter
  • Yahoo! Bookmarks
  • Add to favorites
  • email
  • MySpace
  • RSS

Random Fibonacci sequence

October 13, 2009 by Giuseppe Cardillo · Leave a Comment
Filed under: Mathematics 
VN:F [1.8.1_1037]
Rating: 0 (from 0 votes)
VN:F [1.8.1_1037]
Rating: 0.0/10 (0 votes cast)

Random Fibonacci sequence

The Fibonacci sequence is one of my favourite argument.
In the Fibonacci sequence of numbers, each number is the sum of the previous two numbers, starting with 0 and 1. Thus the sequence begins 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610 etc.
The higher up in the sequence, the closer two consecutive “Fibonacci numbers” of the sequence divided by each other will approach the golden ratio (φ = approximately 1.6180339887…) as you can see in this figure. The golden ratio is the more irrational number that exist and has a lot of property.

The Fibonacci Sequence
Figure 1. The Standard Fibonacci Sequence (Click to enlarge)

This post wants to highlight the property of a “Random” Fibonacci sequence. In 1999, Divakar Viswanath wondered what would happen to the Fibonacci sequence if he introduced an element of randomness.

Here’s one way to proceed: Start with the numbers 1 and 1, as in the original Fibonacci sequence. To get the next term, flip a coin to decide whether to add the last two terms or subtract the last term from the previous term. Suppose that heads means add and tails means subtract. Tossing heads would result in adding 1 to 1 to get 2, and tossing tails would lead to subtracting 1 from 1 to get 0. According to this scheme, the successive coin tosses H H T T T H, for example, would generate the sequence 1, 1, 2, 3, –1, 4, –5, –1. It’s easy to write a short computer program to generate these random Fibonacci sequences. Looking for patterns and trends among such sequences of numbers can be a fascinating pastime, he says. Indeed, infinitely many sequences follow Viswanath’s rule. A few have special characteristics. If the coin always comes up heads, for instance, the result is the original Fibonacci sequence. Other strings of coin tosses can produce a repeating pattern, such as 1, 1, 0, 1, 1, 0, 1, 1, 0, and so on. Nonetheless, such special cases are sufficiently rare among all possible sequences that mathematicians ignore them. The standard Fibonacci sequence has an intriguing property (as shown in Figure 1). The hundredth Fibonacci number, for example, is roughly equal to the hundredth power of the golden ratio. Random Fibonacci sequence despite significant fluctuations, the absolute values of the first 1,000 terms of a typical, computer-generated random Fibonacci sequence show a clear trend to larger values, fitting a pattern of exponential growth (Figure 2).

A Random Fibonacci Sequence
A Random Fibonacci Sequence (Click to enlarge)

By examining typical random Fibonacci sequences based on coin tosses, Viswanath uncovered a similar pattern. He ignored the minus signs, thereby taking the absolute value of the terms. He found that the hundredth term in such a sequence, for example, is approximately equal to the hundredth power of the number 1.13198824…. In fact, the higher the term, the closer its value gets to the appropriate power of 1.13198824…. Despite the element of chance and the resulting large fluctuations in value that characterize a random Fibonacci sequence, the absolute values of the numbers, on average, increase at a well-defined exponential rate. It’s not obvious that this should happen, Viswanath observed. Random Fibonacci sequences might have leveled off to a constant absolute value because of the subtractions, for example, but they actually escalate exponentially. Providing a rigorous proof of the result was a tricky business. To get the answer he required, Viswanath had to delve into several different areas of mathematics, including the intricacies of geometric forms known as fractals, and finish with a computer calculation. Viswanath’s achievement “showed persistence and imagination of a very high order,” Trefethen remarks. Now, Devlin adds, “mathematics has a new constant.” No one has yet identified any link between this particular number and other known constants, such as the golden ratio. — Surprisingly, Viswanath’s constant provides one answer to a mathematical puzzle that arose several decades ago from the work of Hillel Furstenberg, now at Hebrew University in Jerusalem, and Harry Kesten of Cornell University. In a different mathematical context involving so-called random matrix multiplication, Furstenberg and Kesten had proved that in number sequences generated by certain types of processes having an element of randomness, the value of the nth term of the sequence gets closer to the nth power of some fixed number. However, they provided no hint of what that fixed number might be for any particular sequence. Because random Fibonacci sequences fit into this category of sequences, Viswanath’s new constant represents an accessible example of these fixed numbers. “It is a beautiful result with a variety of interesting aspects,” Trefethen says. It’s a nice illustration, for example, of how a random process can lead to a deterministic result when the numbers involved get very large. Moreover, although Viswanath’s result by itself has no obvious applications, it serves as an introduction to the sophisticated type of mathematics developed by Furstenberg, Kesten, and others. That mathematical machinery has proved valuable in accounting for properties of disordered materials, particularly how repeated random movements can lead to orderly behavior, Devlin says. Such mathematics underlies explanations of why glass is transparent and how an electric current can still pass in an orderly fashion through a semiconductor laced with randomly positioned impurities. Viswanath’s original work was done at Cornell University, under Trefethen’s supervision. Trefethen and Oxford’s Mark Embree have recently studied slightly modified versions of the random Fibonacci sequence. If, for example, one combines the last known term with half the previous term, adding or subtracting according to the toss of a coin, the sequence’s numbers decrease at a certain rate, displaying exponential decay. By using fractions other than one-half, it’s possible to find fractions for which one gets exponential decay, exponential growth, or merely equilibrium.

If any problems occurs in execution, or if you found a bug, have a suggestion or question just contact me at:
giuseppe dot cardillo-edta at poste dot it

You can visit my homepage http://home.tele2.it/cardillo

My profile on XING http://www.xing.com/go/invita/13675097

My profile on LinkedIN http://it.linkedin.com/in/giuseppecardillo

Download Now

VN:F [1.8.1_1037]
Rating: 0.0/10 (0 votes cast)
VN:F [1.8.1_1037]
Rating: 0 (from 0 votes)

Popularity: 1% [?]

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • Blogplay
  • Live
  • PDF
  • Technorati
  • Twitter
  • Yahoo! Bookmarks
  • Add to favorites
  • email
  • MySpace
  • RSS

Clinical test performance

October 7, 2009 by Giuseppe Cardillo · Leave a Comment
Filed under: Biotecnology, Genetic 
VN:F [1.8.1_1037]
Rating: 0 (from 0 votes)
VN:F [1.8.1_1037]
Rating: 9.0/10 (2 votes cast)

Clinical test performance

This function was written to compute some benchmarks for a clinical test, but it can be used in all dichotomous test.

It was selected for the publication into two books:

S. Harisha – Biotechnology procedures and experiments handbook – Infinity Science Press – Hingham, MA, USA – ISBN 978-1-934015-11-7

A.J. Nair – Introduction to Biotechnology and genetic engineering – Infinity Science Press – Hingham, MA, USA – ISBN 978-1-934015-16-2

For a clinical test, you have two conditions (healthy and sick) and a dichotomous test (positive or negative). For an ideal, perfect test, it is positive for all sick and for none healthy subjects and negative for all healthy and none sick. But we live in a real world and so, the situation can be summarized by a 2×2 matrix (call it X)

Sick (D+) Healthy (D-)
Row sum
Test Positive (T+)
True
Positive
False
Positive
R1
Test Negative (T-)
False
Negative
True
Negative
R2
Column sum C1 C2 total = N

We define:

1
2
3
4
cs=sum(X);
rs=sum(X,2);
N=sum(X(:));
d=diag(X);

Prevalence of disease

The prevalence of disease is defined: D+ over N = (True Positive + False Negative)/N = C1/N

1
pr=cs(1)/N;

Sensitivity and Specificity

The Sensitivity is the probability that the test is positive on sick subjects: P(T+|D+) = True Positive over C1 = TP/C1
The Specificity is the probability that the test is negative on healthy subjects: P(T-|D-) = True Negative over C2 = TN/C2
In Matlab both parameters are obtained with only one instruction:

1
SS=d./cs';

Of course, the false proportions will be

1
fp=(1-SS);

The 95% confidence interval critical values for Sensitivity and Specificity are computed in this way:

1
acv=1.96*sqrt(SS.*fp./cs');

Of course, the critical interval lower bound cannot be less than 0 and the upper bound cant be greater than 1 and so:

CI=max(0,sensivity-aci) – min(1,sensivity+aci)

Youden’s index (J)

Youden’s J statistics (also called Youden’s index) is a single statistic that captures the performance of a diagnostic test.
The use of such a single index is “not generally to be recommended”. It is equal to the risk difference for a dichotomous test and it defined as:
J = Sensitivity + Specificity – 1

1
J=sum(SS)-1;

A perfect test has J=1

Accuracy and Mis-classification rate

The Accuracy (or Power) is the probability that the test correctly classifies the subjects and is defined: Acc=(TP+TN)/N

1
acc=trace(X)/N;

The Mis-classification rate is the complement to 1:

1
mcr=1-acc;

Positive and Negative predictivity

The Positive predictivity is the probability that the subjects is sick when the test is positive: P(D+|T+) = True Positive over R1 = TP/R1
The Negative predictivity is the probability that the subjects is Healthy when the test is negative: P(D-|T-) = True Negative over R2 = TN/R2
In Matlab both parameters are obtained with only one instruction:

1
PNp=d./rs;

The 95% confidence interval critical values for positive and negative predictivity are computed in this way:

1
PNpcv=1.96*sqrt(PNp.*(1-PNp)./rs);

As for sensitivity and specificity, the critical interval lower bound cannot be less than 0 and the upper bound cant be greater than 1.

Positive and Negative Likelihood Ratio

When we decide to order a diagnostic test, we want to know which test (or tests) will best help us rule-in or rule-out disease in our patient. In the language of clinical epidemiology, we take our initial assessment of the likelihood of disease (“pre-test probability”), do a test to help us shift our suspicion one way or the other, and then determine a final assessment of the likelihood of disease (“post-test probability”).

Likelihood ratios tell us how much we should shift our suspicion for a particular test result. Because tests can be positive or negative, there are at least two likelihood ratios for each test. The “positive likelihood ratio” (LR+) tells us how much to increase the probability of disease if the test is positive, while the “negative likelihood ratio” (LR-) tells us how much to decrease it if the test is negative.

The LR+ and LR- are defined in terms of sensitivity and specificity: LR+ = sensitivity / (1-specificity); LR- = (1-sensitivity) / specificity.
Considering the previous defined arrays SS and fp

1
2
PLR=SS(1)/fp(2);
NLR=fp(1)/SS(2);

Error and Diagnostic Odds Ratio

Error Odds Ratio indicates if the probability of being wrongly classified is highest in the diseased or in the non-diseased group.
If the error odds is higher than one the probability is highest in the diseased group (and the specificity of the test is better than the sensitivity),
if the value is lower than one the probability of an incorrect classification is highest in the non-diseased group (and the sensitivity of the test is better than the specificity).
EOR is defined as (Sensitivity/(1-Sensitivity))/(Specificity/(1-Specificity));

Diagnostic Odds Ratio is often used as a measure of the discriminative power of the test. Has the value one if the test does not discriminate between diseased and not diseased.
Very high values above one means that a test discriminates well. Values lower than one mean that there is something wrong in the application of the test.
DOR is defined as
(Sensitivity/(1-Sensitivity))/((1-Specificity)/Specificity);

1
2
EOR=(SS(1)/fp(1))/(SS(2)/fp(2));
DOR=(SS(1)/fp(1))/(fp(2)/SS(2));

Discriminant Power

The discriminant power for a test, also termed the test effectiveness, is a measure of how well a test distinguishes between affected and unaffected persons. It is the sum of logs of Sensivity and Specificity over own false proportion, scaled by the standard deviation of the logistic normal distribution curve (square root of 3 divided by π). Test effectiveness is interpreted as the standardized distance between the means for both populations.

1
dpwr=(realsqrt(3)/pi)*sum(log(SS./fp));

A test with a discriminant value of 1 is not effective in discriminating between affected and unaffected individuals.
A test with a discriminant value of 3 is effective in discriminating between affected and unaffected individuals.

Test Bias

A test which shows provable and systematic differences in the results of people based on group membership.
For example, a test might be considered biased if members of one particular gender or race consistently and systematic have statistically different results from the rest of the testing population.
It is defined as (T+)/(D+)=(TP+FP)/(TP+FN)

A perfect test has a TB=1;
if TB<1 the test underestimates the disease because there are more affected than positive test;
if TB>1 the test overestimates the disease because there are more positive test than affected

1
TB=rs(1)/cs(1);

If any problems occurs in execution, or if you found a bug, have a suggestion or question just contact me at:
giuseppe dot cardillo-edta at poste dot it

You can visit my homepage http://home.tele2.it/cardillo

My profile on XING http://www.xing.com/go/invita/13675097

My profile on LinkedIN http://it.linkedin.com/in/giuseppecardillo

Download Now

VN:F [1.8.1_1037]
Rating: 9.0/10 (2 votes cast)
VN:F [1.8.1_1037]
Rating: 0 (from 0 votes)

Popularity: 1% [?]

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • Blogplay
  • Live
  • PDF
  • Technorati
  • Twitter
  • Yahoo! Bookmarks
  • Add to favorites
  • email
  • MySpace
  • RSS

Next Page »