calculus for statistics
Posted by admin on Friday Aug 27, 2010 Under Statistics

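To eliminate rows that meet a condition
# eliminate rows where Age is empty (""); rows with NA Age are kept
# note: dat[-which(...), ] would drop ALL rows if no Age were empty
dat <- dat[!(dat$Age %in% ""), ]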
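Below is a minimal sketch on a hypothetical toy data frame. The column names and values are made up for illustration. It shows the logical-index idiom and why the negative which() form is fragile when nothing matches.

```r
# Hypothetical toy data: Age stored as character, one entry is the empty string
dat <- data.frame(ID = 1:4, Age = c("25", "", "31", "40"),
                  stringsAsFactors = FALSE)

# Keep every row whose Age is not exactly "" (NA values, if any, are kept)
dat_clean <- dat[!(dat$Age %in% ""), ]
print(dat_clean)

# Pitfall of dat[-which(cond), ]: when no row matches, which() returns
# integer(0), and indexing with -integer(0) selects zero rows
no_empty <- data.frame(ID = 1:2, Age = c("25", "31"),
                       stringsAsFactors = FALSE)
print(nrow(no_empty[-which(no_empty$Age == ""), ]))  # 0 rows, not 2
print(nrow(no_empty[!(no_empty$Age %in% ""), ]))     # 2 rows, as intended
```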
R Error: FEXACT error 7
With small sample sizes (small expected cell counts), Fisher’s exact test is preferable to the chi-square test.
fisher.test(counts)
If the table has many rows or columns, you may get an error like this:
FEXACT error 7.
LDSTP is too small for this problem.
Try increasing the size of the workspace.
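You can still run the test by adding “simulate.p.value=TRUE”. This computes the p-value by Monte Carlo simulation (B = 2000 replicates by default) instead of exactly: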
fisher.test(counts, simulate.p.value=TRUE)
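The sketch below uses a made-up 3 x 4 table. It first tries the exact test with a larger workspace, catching FEXACT errors instead of stopping. It then falls back on the simulated p-value. The counts, the workspace size, and B = 10000 are illustrative assumptions, not values from this post.

```r
# Hypothetical 3 x 4 table of counts (made up for illustration)
counts <- matrix(c(12,  5,  9,  3,
                    7, 14,  6, 10,
                    4,  8, 15, 11),
                 nrow = 3, byrow = TRUE)

# Try the exact test first; for large tables this is where
# FEXACT error 7 can appear, so catch the error instead of stopping.
# A larger 'workspace' (default 200000) is another way around it.
exact_p <- tryCatch(
  fisher.test(counts, workspace = 2e6)$p.value,
  error = function(e) {
    message("Exact test failed: ", conditionMessage(e))
    NA_real_
  }
)

# Monte Carlo p-value from B simulated tables with the same margins
set.seed(1)  # makes the simulated p-value reproducible
sim <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)

print(c(exact = exact_p, simulated = sim$p.value))
```

Fixing the seed matters because the simulated p-value changes slightly from run to run. A larger B reduces that Monte Carlo noise at the cost of run time.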
How to create a contingency table from categorical data in R
Example:
There are three categorical variables x1, x2, x3 measured from wild cats where
x1 = gender (male, female)
x2 = age (young, kitten, adult)
x3 = test result (positive = 1, negative = 0).
R's table() will generate two 2-by-3 tables (gender by age), one for each value of x3 (0 and 1).
# R code
table(x1, x2, x3)
As shown below, the R output has two parts, one for x3 = 0 and one for x3 = 1.
Rows represent gender (x1) and columns represent age (x2): A = adult, K = kitten, Y = young.
The numbers are counts of cats that fall into each combination of categories.
, , = 0

     A  K  Y
  F 14 84  2
  M  8 97  2

, , = 1

     A  K  Y
  F  1 12  0
  M  1 36  0
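The sketch below rebuilds one record per cat from the counts printed above, so that table() has raw categorical vectors to work on. This is an illustration, not the post's original data file. The sketch also shows xtabs(), which builds the same table directly from a data frame of counts, and ftable(), which prints a flat layout.

```r
# Rebuild one record per cat from the counts in the output above
# (illustration only; expand.grid varies x1 fastest, then x2, then x3)
grid <- expand.grid(x1 = c("F", "M"),
                    x2 = c("A", "K", "Y"),
                    x3 = c(0, 1))
grid$n <- c(14, 8, 84, 97, 2, 2,   # x3 = 0
             1, 1, 12, 36, 0, 0)   # x3 = 1
cats <- grid[rep(seq_len(nrow(grid)), grid$n), c("x1", "x2", "x3")]

# Three-way contingency table: one gender-by-age slice per value of x3
tab <- with(cats, table(x1, x2, x3))
print(tab)

# Same table straight from the counts, without expanding to records
tab2 <- xtabs(n ~ x1 + x2 + x3, data = grid)

# Flat layout and a collapsed gender-by-result margin
print(ftable(tab))
print(margin.table(tab, c(1, 3)))
```

Because x1 and x2 are factors, the Y column still appears in the x3 = 1 slice even though its counts are zero.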
Critical t values
t critical value = qt(1 - alpha/2, n - 1)   # two-sided; qt(alpha/2, n - 1) gives the negative of this value
# example: alpha = 0.05, n = 9 (df = 8)
> qt(0.975, 8)
[1] 2.306004
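A common use of this critical value is a t-based confidence interval for a mean. The sketch below assumes a made-up sample of n = 9 values, so df = 8 as in the example. It checks the hand-built interval against t.test().

```r
# Made-up sample of n = 9 observations, so df = n - 1 = 8 as in the example
x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 4.9, 5.2)
n <- length(x)
alpha <- 0.05

t_crit <- qt(1 - alpha / 2, df = n - 1)  # upper critical value, 2.306004
se <- sd(x) / sqrt(n)                    # standard error of the mean
ci <- mean(x) + c(-1, 1) * t_crit * se   # two-sided 95% confidence interval
print(ci)

# t.test() builds the same interval internally
print(t.test(x, conf.level = 1 - alpha)$conf.int)
```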
Univariate and multivariate normality diagnostics
Univariate diagnostics (histogram and QQ plot)
Plot a histogram
hist(mydata.st, main="Histogram", xlab="X values")
Plot a normal QQ plot
## pch=16 is the symbol for a filled circle
qqnorm(mydata.st, main="QQ plot", pch=16, col="navy")
qqline(mydata.st, col="red")   # reference line; points close to it suggest normality
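The post does not show how mydata.st is created. The sketch below assumes it is a numeric vector of standardized values and uses simulated normal data as a stand-in. It also adds a numeric companion to the QQ plot: the correlation between the theoretical and sample quantiles. Values close to 1 mean the plot is nearly a straight line.

```r
set.seed(42)
# Stand-in for mydata.st: 100 simulated values, standardized with scale()
mydata.st <- as.numeric(scale(rnorm(100, mean = 10, sd = 2)))

hist(mydata.st, main = "Histogram", xlab = "X values")
qqnorm(mydata.st, main = "QQ plot", pch = 16, col = "navy")
qqline(mydata.st, col = "red")

# Quantile pairs without drawing, then their correlation
q <- qqnorm(mydata.st, plot.it = FALSE)
r_qq <- cor(q$x, q$y)
print(r_qq)
```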
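Multivariate diagnostics
Chi-square plot
We plot the ordered squared Mahalanobis distances of the observations from the sample mean against the corresponding chi-square quantiles.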
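# function to compute the squared Mahalanobis distance between x and x.bar
# arguments: x (one observation), x.bar (mean vector), S.inv (inverse covariance matrix)
f.dist <- function(x, x.bar, S.inv){
  return(as.numeric(t(x - x.bar) %*% S.inv %*% (x - x.bar)))
}
# same result as mahalanobis(mydat.dat, colMeans(mydat.dat), cov(mydat.dat))
dist_ <- apply(mydat.dat, 1, f.dist, x.bar=apply(mydat.dat, 2, mean), S.inv=solve(cov(mydat.dat)))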
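The post does not say which data set mydat.dat is. The hard-coded 150 and 4 below fit R's built-in iris measurements, so this sketch assumes iris[, 1:4] as stand-in data. It checks f.dist against the built-in stats::mahalanobis(). It also checks an exact identity: with the sample covariance, the squared distances average (n - 1)p/n.

```r
# iris[, 1:4] used as stand-in data: 150 rows, 4 numeric columns
mydat.dat <- as.matrix(iris[, 1:4])

# Squared Mahalanobis distance of one row x from the mean vector
f.dist <- function(x, x.bar, S.inv) {
  d <- x - x.bar
  as.numeric(t(d) %*% S.inv %*% d)
}

x.bar <- colMeans(mydat.dat)
S     <- cov(mydat.dat)
dist_ <- apply(mydat.dat, 1, f.dist, x.bar = x.bar, S.inv = solve(S))

# Built-in equivalent from the stats package
dist_builtin <- mahalanobis(mydat.dat, center = x.bar, cov = S)

# With the sample covariance, the distances average (n - 1) p / n exactly
n <- nrow(mydat.dat); p <- ncol(mydat.dat)
print(c(mean = mean(dist_), expected = (n - 1) * p / n))
```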
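# Compute u's: chi-square quantiles with p degrees of freedom for n observations
n <- nrow(mydat.dat); p <- ncol(mydat.dat)   # here n = 150, p = 4
u.cs <- qchisq(((1:n) - 0.5)/n, df=p)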
Make a plot, with a reference line of slope 1 and intercept 0
plot(u.cs, sort(dist_), pch=16, col="navy", xlab="Theoretical Quantiles", ylab="Sample Quantiles", main="Chi-Square Plot")
abline(0, 1, col="red")
If the points lie roughly along a straight line with slope 1 and intercept 0, it is reasonable to assume the data are multivariate normal.
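The steps above can be wrapped in one reusable function. In this sketch, the name chisq_plot is hypothetical and it uses mahalanobis() in place of f.dist. It is tried on simulated multivariate normal data (p = 3), generated by multiplying standard normal draws by a Cholesky factor, as a stand-in for a real data set. The correlation between the plotted coordinates gives a rough numeric summary of how straight the plot is.

```r
# Chi-square plot for an n x p numeric matrix X; returns the plotted
# coordinates invisibly so they can be inspected
chisq_plot <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  d2 <- sort(mahalanobis(X, colMeans(X), cov(X)))  # ordered squared distances
  u  <- qchisq(((1:n) - 0.5) / n, df = p)          # chi-square quantiles
  plot(u, d2, pch = 16, col = "navy",
       xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
       main = "Chi-Square Plot")
  abline(0, 1, col = "red")                        # slope 1, intercept 0
  invisible(list(u = u, d2 = d2))
}

# Simulated multivariate normal data via a Cholesky factor (stand-in data)
set.seed(7)
Sigma <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.3,
                  0.2, 0.3, 1.0), nrow = 3)
Z <- matrix(rnorm(200 * 3), ncol = 3)
X <- Z %*% chol(Sigma)        # rows ~ N(0, Sigma)

res <- chisq_plot(X)
print(cor(res$u, res$d2))     # near 1 when the points hug a straight line
```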