R Chi Square Test Example


The chisq.test() function performs chi-squared contingency table tests and goodness-of-fit tests.

chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)

• x: a numeric vector or matrix.
• y: a numeric vector; ignored if x is a matrix (leave it as NULL). If x is a factor, y should be a factor of the same length.
• correct: a logical indicating whether to apply a continuity correction when computing the test statistic for 2 by 2 tables: one half is subtracted from all |O - E| differences, but the correction is never larger than the differences themselves. No correction is done if simulate.p.value = TRUE.
• p: a vector of probabilities of the same length as x. An error is given if any entry of p is negative.
• rescale.p: a logical scalar; if TRUE then p is rescaled (if necessary) to sum to 1. If rescale.p is FALSE, and p does not sum to 1, an error is given.
• simulate.p.value: a logical indicating whether to compute p-values by Monte Carlo simulation.
• B: an integer specifying the number of replicates used in the Monte Carlo test.
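To see how p, rescale.p, simulate.p.value and B work together, here is a minimal goodness-of-fit sketch. The counts in obs and the 1:2:1 ratio are made-up illustration values, not data from this example.

```r
# Hypothetical observed counts for three categories
obs <- c(30, 50, 20)

# With rescale.p = TRUE, p = c(1, 2, 1) is rescaled to c(0.25, 0.5, 0.25)
gof <- chisq.test(obs, p = c(1, 2, 1), rescale.p = TRUE)
gof$expected    # 25 50 25
gof$statistic   # X-squared = 2, with df = length(obs) - 1 = 2

# With rescale.p = FALSE (the default) the same p is rejected with an error
bad <- tryCatch(chisq.test(obs, p = c(1, 2, 1)), error = function(e) NULL)
is.null(bad)    # TRUE

# Monte Carlo p-value from B = 2000 simulated samples; df is reported as NA
set.seed(1)
sim <- chisq.test(obs, p = c(1, 2, 1), rescale.p = TRUE,
                  simulate.p.value = TRUE, B = 2000)
sim$p.value
```

The simulated p-value changes slightly with the random seed and with B; a larger B gives a more stable estimate at the cost of run time.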


For example, there are 205 mutations in the p53 gene across 514 tumors, while the 96 stage IV tumors carry 86 mutations. If stage IV tumors followed the overall pattern, we would expect 96 x 205 / 514 ≈ 38 mutations among them, but we observed 86. Is that difference significant?
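The expected count is the overall mutation rate applied to the 96 stage IV tumors. The sketch below reproduces that arithmetic and then shows one way to pose the question as a true goodness-of-fit test. It assumes each tumor either carries a p53 mutation or not (so 86 of the 96 stage IV tumors are mutated and 10 are not), which the text does not state explicitly; its statistic therefore differs from the matrix-based test used in this example.

```r
# Overall rate: 205 mutations in 514 tumors
rate <- 205 / 514

# Expected mutations among the 96 stage IV tumors
expected_iv <- 96 * rate
round(expected_iv)   # 38

# Goodness of fit, ASSUMING at most one mutation per tumor:
# 86 mutated vs 10 non-mutated stage IV tumors, tested against the overall rate
gof <- chisq.test(c(86, 96 - 86), p = c(205, 514 - 205) / 514)
gof$expected         # about 38.3 and 57.7
gof$p.value
```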


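The R code below arranges the counts as a 2 x 2 matrix. Because x is a matrix, chisq.test() treats it as a contingency table and runs Pearson's chi-squared test (with Yates' continuity correction, since the table is 2 x 2) rather than a goodness-of-fit test: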

> sam <- matrix(c(86,96,38,96),nrow=2,ncol=2)
> sam

     [,1] [,2]
[1,]   86   38
[2,]   96   96


> chisq.test(sam)

Pearson's Chi-squared test with Yates' continuity correction
data: sam
X-squared = 10.7773, df = 1, p-value = 0.001028


> chisq.test(sam)$p.value

[1] 0.001027552
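The reported statistic can be checked by hand. This sketch computes the expected counts under independence and applies the Yates correction directly; it subtracts a flat 0.5, which matches chisq.test() here because every |O - E| in this table is about 14.6 (R caps the correction at the size of the differences).

```r
sam <- matrix(c(86, 96, 38, 96), nrow = 2, ncol = 2)

# Expected counts under independence: row total x column total / grand total
E <- outer(rowSums(sam), colSums(sam)) / sum(sam)

# Yates' correction: subtract 0.5 from every |O - E| before squaring
x2_yates <- sum((abs(sam - E) - 0.5)^2 / E)
p_yates  <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

# Without the correction (what chisq.test(sam, correct = FALSE) computes)
x2_plain <- sum((sam - E)^2 / E)

x2_yates   # about 10.777
p_yates    # about 0.001028
```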



The next example reads a CSV file, chisq.csv, in which each line holds a gene name followed by four counts.


The following R code runs a chi-square test on every line of the example file and writes the results to out_chisq.txt:

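# Read chisq.csv: column 1 is the gene name, columns 2-5 are the four counts
x <- read.csv("chisq.csv", header = TRUE, sep = ",", dec = ".")
zz <- file("out_chisq.txt", "w")

# Header line: the five input column names plus a p-value column
title <- names(x)
writeLines(paste(c(title[1:5], "Chisq P Value"), collapse = ","), con = zz)

sam <- matrix(NA_real_, nrow = 2, ncol = 2)
for (i in seq_len(nrow(x))) {
  # Row 1: unique observed/expected; row 2: duplicated observed/expected
  sam[1, ] <- c(x[i, 2], x[i, 3])
  sam[2, ] <- c(x[i, 4], x[i, 5])
  pv <- chisq.test(sam)$p.value
  # One output line per gene: the input values followed by the p-value
  writeLines(paste(x[i, 1], x[i, 2], x[i, 3], x[i, 4], x[i, 5], pv, sep = ","),
             con = zz)
}
close(zz)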
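The loop can also be written without explicit indexing. The sketch below is an alternative, not the original script: it runs the same per-line test with apply() and writes the table with write.csv(), so the last column is named Chisq.P.Value rather than Chisq P Value. So that it runs without chisq.csv, the data frame reuses two lines from the output listed in this example, and the output path is a hypothetical temporary file.

```r
# Two lines from the example output stand in for chisq.csv here;
# in practice, use x <- read.csv("chisq.csv") instead
x <- data.frame(
  Gene                = c("TTN", "GATA3"),
  Unique.observed     = c(27, 38),
  Unique.expected     = c(33, 20),
  duplicated.observed = c(60, 17),
  duplicate.expected  = c(54, 35)
)

# Same 2 x 2 layout as the loop: columns 2-3 on top, columns 4-5 below
row_p <- function(counts) {
  chisq.test(matrix(counts, nrow = 2, byrow = TRUE))$p.value
}

# apply() turns columns 2-5 into a numeric matrix and calls row_p on each row
x$Chisq.P.Value <- apply(x[, 2:5], 1, row_p)

# Hypothetical output location; write.csv() writes the header line itself
out_file <- file.path(tempdir(), "out_chisq.csv")
write.csv(x, out_file, row.names = FALSE, quote = FALSE)
```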



The content of the output file is:

Gene,Unique.observed,Unique.expected,duplicated.observed,duplicate.expected,Chisq P Value
TTN,27,33,60,54,0.425175749168081
GATA3,38,20,17,35,0.00116789922038592
HLA-DRB6,18,15,24,27,0.655008761576397
MUC16,13,15,28,26,0.815855072976336
NR1H2,11,15,29,25,0.473920420172139
GPRIN2,12,14,27,25,0.810181236410474
MAP3K1,15,14,24,25,1
GPRIN1,13,14,25,24,1
MLL3,12,14,26,24,0.808944275014528
MAP3K4,8,14,29,23,0.203492032204285
CDH1,17,12,17,22,0.326688384050414
ENSG00000245549,15,12,18,21,0.616574005797083
ZNF384,12,12,20,20,0.796253414737639
FRG1B,11,11,20,20,0.790676108831151
AKD1,9,11,21,19,0.784191229401619
OBSCN,12,11,17,18,1
NCOA3,8,10,20,18,0.77477725929156
USH2A,8,10,20,18,0.77477725929156
ENSG00000198786,12,10,15,17,0.781814003488769
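Several lines (MAP3K1, GPRIN1, OBSCN) report a p-value of exactly 1. This sketch uses the MAP3K1 counts to show why: every |O - E| in that table is exactly 0.5, so the Yates correction reduces the statistic to 0. Turning the correction off gives a p-value below 1.

```r
# Counts from the MAP3K1 line, arranged as in the loop
map3k1 <- matrix(c(15, 14, 24, 25), nrow = 2, byrow = TRUE)

chisq.test(map3k1)$expected                   # 14.5 14.5 / 24.5 24.5
chisq.test(map3k1)$p.value                    # 1: the correction cancels the whole difference
chisq.test(map3k1, correct = FALSE)$p.value   # below 1 without the correction
```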


