Barnard’s unconditional exact test on 2×2 matrix

VN:F [1.8.8_1072]
Rating: +1 (from 1 vote)
VN:F [1.8.8_1072]
Rating: 8.0/10 (1 vote cast)

Barnard’s unconditional exact test on 2×2 matrix

There are two fundamentally different exact tests for comparing the equality of two binomial probabilities – Fisher’s exact test (Fisher, 1925), and Barnard’s exact test (Barnard, 1945). Fisher’s exact test (Fisher, 1925) is the more popular of the two. In fact, Fisher was bitterly critical of Barnard’s proposal for esoteric reasons that we will not go into here. For 2 × 2 tables, Barnard’s test is more powerful than Fisher’s, as Barnard noted in his 1945 paper, much to Fisher’s chagrin. Anyway, perhaps due to its computational difficulty the Barnard’s is not widely used.
Consider a general 2×2 matrix

Category 1 Category 2
Row sum
Condition 1
A B RS1
Condition 2
C D RS2
Column sum CS1 CS2 total = N

Suppose that the common probability of infection under the null hypothesis is πA = πC = π. Then the probability of observing any generic table X is a product of two binomials:

Barnard1

The exact p-value is then the sum of such probabilities for all tables, X, that could have been observed which are at least as extreme as the observed table, X0 , under the null hypothesis. Specifically, suppose T (X) is a “discrepancy measure” or test statistic that measures how discrepant any table X is relative to the type of table one would expect under the null hypothesis. Then, for any given π, the exact p-value of the observed table X0 is:

Barnard2Now there exist many different test statistics for determining if a table X is more or less extreme than the observed table X0 under the null hypothesis: we will use the popular Wald statistic:

Barnard3

Although the Wald test statistic T (X) is well defined, equation (2) is not of much practical use for computing an exact p-value because it depends on knowing the value of π, the common probability under the null hypothesis. One could of course use the data to estimate π and then substitute this estimate into equation (2). But then the p-value would no longer be exact. The main difference between the Fisher’s and Barnard’s tests is the manner in which they eliminate this nuisance parameter from (2) without sacrificing the exactness.

Fisher’s exact test

Fisher’s exact test gets rid of π by restricting attention only to generic 2 × 2 tables X in which A + B = Rs1, the sum of responses that were actually observed in the data. Accordingly define:

Barnard4Fisher noted that, under the null hypothesis, provided we confine our attention only to tables X ∈ Γ(Rs1), each table has a hypergeometric probability that no longer depends on the nuisance parameter π (see my post: Fisher’s conditional exact test on 2×2 matrix).

Barnard’s exact test

Barnard’s exact test is an unconditional test. It generates the exact distribution of T(X) by considering all the tables X ∈ Γ, and rather than just the tables X ∈ Γ(Rs1). What shall we do with the unknown nuisance parameter π in the above p-value (equation 1)? Barnard suggested that we calculate p(π) for all possible values of π ∈ (0, 1) and choose the value, π , say, that maximizes p(π). And this is computationally very very difficult….

The algorithm

Now, let we see the algorithm in deep.

Step 1: The Wald’s Statistic

The first step is to compute the Wald’s statistic for each possible value of A and B (from now on I and J).

I∈[0:1:Cs1] and J∈[0:1:Cs2]. One could write:

1
2
3
4
5
6
Cs=sum(x); N=sum(Cs) %Columns sum and total observed elements
for I=0:1:Cs(1)
    for J=0:1:Cs(2)
         TX(I+1,J+1)=(I/Cs(1)-J/Cs(2))/sqrt(((I+J)/N)*(1-(I+J)/N)*sum(1./Cs))
    end
end

Notice that for (I=0 & J=0) or (I=Cs(1) & J=Cs(2)) T(I,J)=0/0=NaN. So there are two points where T(I,J) is not defined and NaN should be replaced with 0.
Two nested for…end loops are time consuming and are useless considering MatLab functions…

A more efficient code should be:

1
2
3
4
5
6
Cs=sum(x); N=sum(Cs) %Columns sum and total observed elements
[I,J]=ndgrid(0:1:Cs(1) 0:1:Cs(2));
warning('OFF','MATLAB:divideByZero') %disable the warning
TX=(I./Cs(1)-J./Cs(2))./realsqrt(((I+J)./N).*(1-((I+J)./N)).*sum(1./Cs)); %computes the Wald's statistics
warning('ON','MATLAB:divideByZero') %renable the warning
TX([1 end])=0; %resolve the singularities

The ndgrid function creates two square matrix:

Barnard5

So TX will be a Cs(1)+1 x Cs(2)+1 matrix. Using .* and ./ all elements of the matrix will be correctly multiplied or divided.

To set up the observed matrix (TXo) and the index of all TX≥TXo

1
2
TX0=abs(TX(x(1)+1,x(3)+1)); %catch the observed Table (TXo), taking the 0 in account
idx=TX>=TX0; %set the index of all T(X)>=TXo

Step 2: Compute P(X|π)

Now, we must compute P(X|π) for each I, J and π∈(0, 1) . I and J were just defined by ndgrid function. Expanding the binomials we obtain:

Barnard6

As I explained in my previous post Fisher’s conditional exact test on 2×2 matrix, it is useful to use the logarithms of P(X|π): the factorials can be obtained using the MatLab Gammaln function; the multiplications became sums; the divisions became subtractions and the power became multiplications.

Barnard7

CF is a matrix with the same dimensions of TX because it is a linear combination of I and J.

Barnard8The S matrix is a 3D matrix: two dimensions are Cs1+1 and Cs2+1; the third dimension is set by the user. In fact in the arguments of this function, the user can choose to set the length (the Tbx variable) of the vector π (by default it is 100). Let we see the code:

1
2
3
4
5
6
7
8
9
10
%Set the basic parameters...
A=[1 1 Tbx];
B=Cs+1;
npa = linspace(0.0001,0.9999,Tbx);
LP=log(npa);
ALP=log(1-npa);
E=repmat(I+J,A); F=N-E;
%Generate a 3D matrix (Cs1+1 x Cs2+1 x Tbx): this is a "box" where each of Tbx 2D
%slice is the Cs1+1 x Cs2+1 matrix CF
CF=repmat(sum(gammaln(B))-(gammaln(I+1)+gammaln(J+1)+gammaln(B(1)-I)+gammaln(B(2)-J)),A);

Now we have two 1xTbx arrays (LP and ALP) and three Cs1+1 x Cs2+1 x Tbx “boxes” (E, F and CF).
As Peter J. Acklam wrote in his “Matlab array manipulation tips and tricks” , there is a no for…loop solution to multiply each 2D slice with the corresponding
element of a vector. He wrote:
Assume X is an m-by-n-by-p array and v is a row vector with length p.
How does one write:

Y = zeros(m, n, p);
for i = 1:p
Y(:,:,i) = X(:,:,i) * v(i);
end

with no for-loop? One way is to use:
Y = X .* repmat(reshape(v, [1 1 p]), [m n]);


So we can construct the two others 3D boxes:
Box1=CF
Box2=E.*repmat(reshape(LP,[1 1 Tbx]),[Cs1+1 Cs2+1])=E.*repmat(reshape(LP,A),B)
Box3=F.*repmat(reshape(ALP,[1 1 Tbx]),[Cs1+1 Cs2+1])=F.*repmat(reshape(ALP,A),B)
Finally, we can obtain the S box that is the sum of these three boxes (remember that we are using logarithms, so we must convert them using the exp function)

1
S=exp(CF+E.*repmat(reshape(LP,A),B)+F.*repmat(reshape(ALP,A),B));

Now we are at the last point: for each 2D slice of the S box we must sum the values corresponding to TX>=TXo, obtaining a new vector P.
To use the logical indexing tecnique, we must replicate the idx matrix:
S(repmat(idx,A)) is a column vector that has L*Tbx row;
L=sum(idx(idx==1)) is the number of 1 in the idx matrix.
Using the reshape function we can obtain a new LxTbx matrix: in each column we’ll have the P(X|π & TX>=TXo) and using the function sum we’ll, finally,
obtain the P vector.

1
P=sum(reshape(S(repmat(idx, A)),sum(idx(idx==1)),Tbx));

Now we can find the p-value and the nuisance parameter:

1
2
PV=max(P); %The p-value is tha max value of the P vector;
np=npa(P==PV); %The nuisance parameter is the value of the npa array coinciding with PMax

Display the results…

An example

Supposing to have this matrix

Vaccine Placebo
Row sum
Flu infection
7 12 19
Healthy
8 3 11
Column sum 15 15 30

Using the Fisher’s conditional exact test the p-value (1-tailed) is 0.6407: you will conclude that the vaccine is not effective.
Using the Barnard’s unconditional exact test the results will be:

2×2 matrix Barnard’s exact test: 100 16×16 tables were evaluated
——————————————————————————–
Wald statistic = 1.8943 – Nuisance parameter = 0.6666
p-values 1-tailed = 0.034074 2-tailed = 0.068148
——————————————————————————–

Barnard9

So with the Barnard’s test you will conclude that the vaccine is effective.
Anyway, using the Fisher’s conditional exact test with the Lancaster’s correction (mid p) you will obtain a 2-tailed p-value=0.04774 ….

If any problems occurs in execution, or if you found a bug, have a suggestion or question just contact me at: giuseppe dot cardillo-edta at poste dot it

You can visit my homepage http://home.tele2.it/cardillo

My profile on XING http://www.xing.com/go/invita/13675097

My profile on LinkedIN http://it.linkedin.com/in/giuseppecardillo

Download Now

VN:F [1.8.8_1072]
Rating: 8.0/10 (1 vote cast)
VN:F [1.8.8_1072]
Rating: +1 (from 1 vote)
Barnard's unconditional exact test on 2x2 matrix8.0101

Popularity: 1% [?]

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • Blogplay
  • Live
  • PDF
  • Technorati
  • Twitter
  • Yahoo! Bookmarks
  • Add to favorites
  • email
  • MySpace
  • RSS

Related Posts

  • No Related Post

Comments

Tell me what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!


Include MATLAB code in your comment by doing the following:

<pre lang="MATLAB">

%insert code here

</pre>