
calculus for statistics

Posted by admin on Friday Aug 27, 2010 Under Statistics


Manipulating data frame/table

Posted by admin on Friday Aug 20, 2010 Under Statistics

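To eliminate rows that meet a condition

# drop rows where Age is an empty string
# (test the same data frame you subset; logical indexing also avoids the
#  -which() pitfall, which drops every row when nothing matches)
dat <- dat[!(dat$Age %in% ""), ]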
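A minimal sketch of this row filtering on a made-up data frame (the column values are hypothetical). It also shows why negative indexing with which() is risky: when no row matches the condition, every row is dropped.

```r
# Hypothetical data for illustration only
dat <- data.frame(ID = 1:5,
                  Age = c("23", "", "41", NA, ""),
                  stringsAsFactors = FALSE)

# Keep rows whose Age is not an empty string; rows with NA are kept
kept <- dat[!(dat$Age %in% ""), ]

# Pitfall: which() returns integer(0) when nothing matches,
# and dat[-integer(0), ] selects zero rows
none <- dat[-which(dat$Age == "no such value"), ]

nrow(kept)
nrow(none)
```

If rows with a missing Age should go as well, dat[which(dat$Age != ""), ] removes both the empty strings and the NA values.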


r error: FEXACT error 7

Posted by admin on Friday Aug 20, 2010 Under Statistics

R Error: FEXACT error 7

With a small sample size, Fisher's exact test is preferable to the chi-square test.
fisher.test(counts)

If you have too many rows or columns, you may get an error saying,
FEXACT error 7.
LDSTP is too small for this problem.
Try increasing the size of the workspace.

You can still run the test by adding simulate.p.value=TRUE, which estimates the p-value by Monte Carlo simulation instead of computing it exactly.
fisher.test(counts, simulate.p.value=TRUE)
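As the error message suggests, the other option is to give the exact algorithm more memory through the workspace argument of fisher.test. The sketch below tries that first and falls back to simulation only if the exact computation fails. The table is made up for illustration, and the workspace size and number of replicates B are arbitrary choices, not recommendations.

```r
# Hypothetical 3 x 4 table of counts, for illustration only
counts <- matrix(c(2, 3, 1, 4,
                   5, 1, 3, 2,
                   1, 4, 6, 2), nrow = 3, byrow = TRUE)

set.seed(1)  # makes the simulated p-value reproducible if the fallback runs

res <- tryCatch(
  # exact test with a larger workspace than the default of 200000
  fisher.test(counts, workspace = 2e6),
  # on failure (for example FEXACT error 7), estimate the p-value by simulation
  error = function(e) fisher.test(counts, simulate.p.value = TRUE, B = 10000)
)

res$p.value
```

A larger B gives a more stable simulated p-value at the cost of run time.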


Contingency Table for Categorical data and R

Posted by admin on Wednesday Aug 11, 2010 Under Statistics

How to create a contingency table from categorical data in R.

Example:
There are three categorical variables x1, x2, x3 measured on wild cats, where
x1 = gender (male, female)
x2 = age (young, kitten, adult)
x3 = test result (positive = 1, negative = 0).

table() will generate two 2-by-3 tables (gender by age), one for x3=0 and one for x3=1.

# r code
table(x1, x2, x3)

As shown below, the R output has two parts, one for x3=0 and one for x3=1.
Rows represent gender (x1) and columns represent age (x2).
The numbers are counts of cats that fall into the corresponding categories.

, ,  = 0

     A  K  Y
  F 14 84  2
  M  8 97  2

, ,  = 1

     A  K  Y
  F  1 12  0
  M  1 36  0
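A small self-contained sketch of the same idea, using made-up vectors rather than the wild-cat data. Naming the arguments of table() (the names gender, age and result are my own choice) labels each dimension in the output, and ftable() flattens the three-way table into a single block.

```r
# Hypothetical data for illustration; not the data from the post
x1 <- c("F", "M", "F", "M", "F", "M", "F", "M")   # gender
x2 <- c("K", "K", "A", "Y", "K", "A", "K", "K")   # age
x3 <- c(0, 1, 0, 0, 1, 0, 0, 1)                   # test result

# Three-way contingency table with labelled dimensions
tab <- table(gender = x1, age = x2, result = x3)
tab

# Flattened view: one row per gender/age combination, one column per result
ftable(tab)

# Count for a single cell: female kittens with a negative result
tab["F", "K", "0"]
```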


T value, F value, Z value in R

Posted by admin on Tuesday Aug 3, 2010 Under Statistics

T values

t value = qt(1 - alpha/2, n-1)

# example: alpha = 0.05, n = 9
> qt(0.975, 8)
[1] 2.306004
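The post title also mentions Z and F values. A minimal sketch of the analogous quantile calls, assuming alpha = 0.05 and, for the F value, hypothetical degrees of freedom of 2 and 10:

```r
alpha <- 0.05
n <- 9

# two-sided critical t value with n - 1 degrees of freedom
t_crit <- qt(1 - alpha/2, df = n - 1)

# two-sided critical z value from the standard normal distribution
z_crit <- qnorm(1 - alpha/2)

# upper-tail critical F value; df1 and df2 are assumed values for illustration
f_crit <- qf(1 - alpha, df1 = 2, df2 = 10)

c(t = t_crit, z = z_crit, f = f_crit)
```

Note that qt(alpha/2, n-1) returns the lower-tail quantile, which is the same number with a negative sign.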


Validating assumption of multivariate normal data

Posted by admin on Monday Aug 2, 2010 Under Statistics

Univariate and Multivariate diagnostics

Univariate diagnostic (Histogram and QQ plot)

Plot a histogram

hist(mydata.st, main="Histogram", xlab="X values")

Plot QQ plot

## pch =16 (16 is a symbol for a filled circle)
qqnorm(mydata.st, main="QQ plot", pch=16, col="navy")
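A self-contained sketch of the two univariate plots. The data here are simulated, and treating mydata.st as a standardized numeric vector is an assumption on my part. qqline() adds a reference line, which makes departures from normality easier to see.

```r
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)   # hypothetical sample
mydata.st <- as.numeric(scale(x))    # standardize: mean 0, sd 1 (assumed meaning of ".st")

hist(mydata.st, main = "Histogram", xlab = "X values")

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(mydata.st, main = "QQ plot", pch = 16, col = "navy")
qqline(mydata.st, lty = 2)           # reference line through the quartiles
```

Points that follow the reference line closely suggest the variable is roughly normal.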


Multivariate diagnostics

Chi-square plot
We will plot the ordered squared distances against the corresponding chi-square quantiles.

# function to compute the squared (Mahalanobis) distance between x and x.bar
# arguments are x, x.bar and S.inv (the inverse of the sample covariance matrix)
f.dist <- function(x,x.bar,S.inv){
return(t(x-x.bar)%*%S.inv%*%(x-x.bar))
}

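# squared distance of each row from the sample mean vector
dist_ <- apply(mydat.dat, 1, f.dist, x.bar=colMeans(mydat.dat), S.inv=solve(cov(mydat.dat)))

# Compute u's from chi-square
# n = number of observations, p = number of variables (150 and 4 in this example)
n <- nrow(mydat.dat)
p <- ncol(mydat.dat)
u.cs <- qchisq(((1:n) - .5)/n, p)

# Make a plot
plot(u.cs, sort(dist_), pch=16, col="navy", xlab="Theoretical Quantiles", ylab="Sample Quantiles", main="Chi-Square Plot")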

If the points in the chi-square plot fall close to a straight line with slope 1 and intercept 0, the data can be treated as approximately multivariate normal.
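A runnable sketch of the whole chi-square plot. It assumes the four numeric columns of the built-in iris data as a stand-in for mydat.dat (150 rows and 4 variables, but not necessarily the data used in the post). It uses the built-in mahalanobis() function, which returns the same squared distances as f.dist, and adds the reference line with slope 1 and intercept 0.

```r
# Assumption: iris measurements stand in for mydat.dat
X <- as.matrix(iris[, 1:4])
n <- nrow(X)
p <- ncol(X)

# squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# theoretical chi-square quantiles with p degrees of freedom
u <- qchisq(((1:n) - 0.5) / n, df = p)

plot(u, sort(d2), pch = 16, col = "navy",
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Chi-Square Plot")
abline(0, 1, lty = 2)   # reference line: slope 1, intercept 0
```

Points that bend away from the dashed line, especially in the upper tail, point to outliers or a departure from multivariate normality.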
