Characteristics of Distributions

Posted by admin on Friday Apr 1, 2011 Under Statistics

Poisson Distribution
The sum of independent Poisson random variables is also Poisson, with mean equal to the sum of the means of the individual random variables.
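
A quick numerical check of this property, as a minimal sketch: the two means (lambda1 and lambda2) are made-up values, and the probability of each total is computed by convolving the two Poisson probability functions and compared with a single Poisson whose mean is the sum.

```r
lambda1 <- 2     # hypothetical mean of X
lambda2 <- 3.5   # hypothetical mean of Y
k <- 0:20

# P(X + Y = k), summing over all ways to split k between X and Y
conv <- sapply(k, function(kk) sum(dpois(0:kk, lambda1) * dpois(kk:0, lambda2)))

# Poisson probabilities with mean lambda1 + lambda2
direct <- dpois(k, lambda1 + lambda2)

max(abs(conv - direct))   # differs only by floating-point rounding
```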


Logistic Regression and How to interpret it

Posted by admin on Friday Jan 21, 2011 Under Statistics

When to use Simple Logistic Regression

Logistic regression is used when the response variable Yi is binary (0 or 1).

Meaning of the response function for a binary response variable
Yi = beta_0 + beta_1*Xi + ei

Treating Yi as a Bernoulli random variable,
P(Yi = 1) = pi ** probability of success
P(Yi = 0) = 1 - pi ** probability of failure
E(Yi) = 1(pi) + 0(1-pi) = pi, which equals P(Yi = 1)
Therefore, the expected value of Yi is the same as the probability that Yi equals 1 (the probability of success).

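Problems with fitting this linear model to a binary response variable:
1. for a given Xi, the error term can take only two values, so it is not normally distributed
2. the error variance depends on Xi, so it is not constant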

How to run logistic regression in R

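# load the data (tab-separated file with a header row)
dat1 <- read.table("dat1.txt", sep="\t", header=TRUE)
# fit a logistic regression of result on gender
test.logr <- glm(result ~ gender, family=binomial(link="logit"), data=dat1)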

Let Yi = 1 denote success and Yi = 0 denote failure, and
let the probability of success, P(Yi = 1), be 0.2 and the probability of failure, P(Yi = 0), be 0.8.
The odds of success are p/(1-p) = 0.2/0.8 = 0.25, that is, 1 to 4.

The logit is simply the log of the odds:
log(p/(1-p)). It is a monotonic transformation, and it eases the problem of restricted range: p lies between 0 and 1, while log(p/(1-p)) can take any real value.

So what does the logit look like?
logit(p) = log(p/(1-p)) = b0 + b1X1 + ... + bkXk
p = exp(b0 + b1X1 + ... + bkXk)/(1 + exp(b0 + b1X1 + ... + bkXk))
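
The odds, the logit, and its inverse can be checked numerically. This is a small sketch using the p = 0.2 example from above; qlogis and plogis are base R's logit and inverse-logit functions.

```r
p <- 0.2
odds <- p / (1 - p)                             # 0.25, i.e. 1 to 4
log_odds <- log(odds)                           # the logit of p
p_back <- exp(log_odds) / (1 + exp(log_odds))   # inverse logit recovers p

# Base R equivalents of the two transformations
all.equal(log_odds, qlogis(p))
all.equal(p_back, plogis(log_odds))
```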

How to Interpret coefficients?

logit(p) = b0 + b1(school),

where school is coded public = 1 and private = 0

success = 1, failure = 0
private = 0, public = 1

b0 is the log odds of success for private schools, since private is coded 0 (baseline)
b1 = log(1.325) is the log of the odds ratio of public to private

Let the coefficients be b1 = 0.2814 (= log(1.325)) and b0 = -1.23
How to interpret the coefficients?

Exponentiating b1 gives the odds ratio, exp(0.2814) = 1.325, which can be interpreted as:

The odds of success for a public school are about 33% higher than the odds for a private school.

To check, compute the odds of success for public and private schools separately, take their ratio (1.325), and take its log; the result is the value of b1.
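
The check can be sketched in R. The counts below are hypothetical, chosen only so that the odds ratio comes out to 1.325, and the coding assumed is private = 0 (baseline), public = 1, so exp(b1) is the odds ratio of public relative to private.

```r
# Hypothetical counts, chosen only so that the odds ratio is 1.325
school  <- c(0, 1)        # private = 0 (baseline), public = 1
success <- c(40, 53)
failure <- c(100, 100)

odds <- success / failure          # odds of success: private, then public
odds_ratio <- odds[2] / odds[1]    # public relative to private

# Logistic regression on the grouped counts
fit <- glm(cbind(success, failure) ~ school, family = binomial(link = "logit"))
b <- coef(fit)

# b[1] is the log odds for the baseline group; exp(b[2]) is the odds ratio
c(log(odds[1]), b[1])
c(odds_ratio, exp(b[2]))
```

With a single binary predictor the model reproduces the observed odds, which is why the hand computation and the fitted coefficients agree.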

Multiple Logistic Regression Model

It is interpreted just like a simple logistic regression, except that each coefficient is interpreted with all other predictor variables held constant.

From the coefficients you can compute odds ratios, which can be worded as follows:
the odds of a student being successful increase by xx percent with each additional year of tutoring (X1), for a given socioeconomic status and location.
the odds of a student being successful in area 1 are at most 7 times as great as for a student
in area 2, where area 1 is coded as 1 and area 2 as 0.
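
The "xx percent" figure comes from exponentiating the coefficient. A minimal sketch, where the coefficient value is hypothetical and not taken from any fitted model:

```r
# Hypothetical coefficient for years of tutoring (X1); not from a fitted model
b1 <- 0.2

odds_ratio <- exp(b1)                  # multiplicative change in odds per extra year
pct_change <- 100 * (odds_ratio - 1)   # percent change in odds per extra year
round(pct_change, 1)
```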

http://division.aomonline.org/rm/1997_forum_regression_models.html


My Fav Pizza dough recipe

Posted by admin on Tuesday Dec 28, 2010 Under Other Interests

Thin Pizza Dough (Wolfgang Puck style)

(source: http://www.grouprecipes.com/46552/italian-thin-crust-pizza-dough.html)

Ingredients

1 package active dry yeast
1 teaspoon honey
1 cup warm water (105 to 115 F)
3 cups of all-purpose flour
1 teaspoon salt
1 tablespoon extra-virgin olive oil

How to cook it

Dissolve the yeast and honey in 1/4 cup warm water.

Combine the flour and the salt. Add the oil, the yeast mixture, and the remaining 3/4 cup of water.

Mix until the entire mixture forms a ball.

Turn the dough out onto a lightly floured surface.
Knead by hand 2 or 3 minutes. The dough should be smooth and firm.

Cover the dough with a clean, damp towel and let it rise in a cool spot for about 2 hours. (When ready, the dough will stretch as it is lightly pulled).

Divide the dough into 2 balls. *Alternatively you could divide into 4 balls to make into 4 pizzas, about 6 ounces each, to make 8 inch pizzas.

Work each ball by pulling down the sides and tucking under the bottom of the ball. Repeat 4 or 5 times. Then on a smooth, unfloured surface, roll the ball under the palm of your hand until the top of the dough is smooth and firm, about 1 minute. Cover the dough with a damp towel and let rest 1 hour. *At this point, the balls can be wrapped in plastic and refrigerated for up to 2 days.

Preheat oven to 500 F or highest temp. Lightly oil cookie sheet with extra-virgin olive oil. Roll out dough ball, on a lightly floured surface, to the shape of your cookie sheet. Carefully transfer dough to cookie sheet, lightly press and stretch out to the edges of sheet.

Add sauce (not too much) and toppings. Start with sauce, then cheese, veggies and meat.

Cook for 10 – 12 minutes, more depending on the thickness of crust due to size of pan you used.


Plotting density in R

Posted by admin on Wednesday Oct 13, 2010 Under Statistics

How to plot density
plot(density(DATA))

Rainbow color in R
To give a plot a rainbow color range, use the rainbow function:

# one color per plotted point
rcol <- rainbow(length(DATAX))
plot(DATAX, DATAY, type="l")
points(DATAX, DATAY, pch=16, col=rcol)

Simple Plot

How to change the size of text in a plot?
Use the cex.[attribute] arguments; examples are below:

main titles by cex.main
sub titles by cex.sub
axis annotation by cex.axis
xlab and ylab by cex.lab
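
A small sketch putting the four arguments together on a density plot. The data are simulated for illustration, and the scaling values are arbitrary; each cex value is a multiplier relative to the default text size.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(100)

plot(density(x),
     main = "Density of x", sub = "simulated data",
     xlab = "x", ylab = "density",
     cex.main = 1.5,   # main title, 1.5 times the default size
     cex.sub  = 0.8,   # subtitle, smaller than default
     cex.axis = 0.9,   # axis tick labels
     cex.lab  = 1.2)   # x and y axis labels
```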

Legend
legend(x, y = NULL, legend, fill = NULL, col = par("col"),
border="black", lty, lwd, pch,
angle = 45, density = NULL, bty = "o", bg = par("bg"),
box.lwd = par("lwd"), box.lty = par("lty"), box.col = par("fg"),
pt.bg = NA, cex = 1, pt.cex = cex, pt.lwd = lwd,
xjust = 0, yjust = 1, x.intersp = 1, y.intersp = 1,
adj = c(0, 0.5), text.width = NULL, text.col = par("col"),
merge = do.lines && has.pch, trace = FALSE,
plot = TRUE, ncol = 1, horiz = FALSE, title = NULL,
inset = 0, xpd, title.col = text.col, title.adj = 0.5)

symbols for R


Useful R syntax

Posted by admin on Wednesday Oct 13, 2010 Under Statistics

Reading a table from a file selected in a file browser

read.table(file.choose())

nrow(dat) # number of rows
head(dat) # shows column names and the first few rows of dat
paste("hello", "world", sep="-") # hello-world

source("mylibrary.R") # runs the code in mylibrary.R

rep(NA, 5) # NA NA NA NA NA
rep(1:4, 2) # 1 2 3 4 1 2 3 4
rep(1:4, each=2) # 1 1 2 2 3 3 4 4

Validating the assumption of multivariate normality

Univariate diagnostic plots : Histogram and QQ plot

Standardize the data and plot a histogram
mydata.st<-scale(mydata.dat)
hist(mydata.st, main="Histogram", xlab="X values")

#qq plot

## pch =16 (16 is a symbol for a filled circle)
qqnorm(mydata.st, main="QQ plot", pch=16, col="navy")

Chi-square plot
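
One common way to draw a chi-square plot, sketched under assumptions: the matrix X below is simulated and only stands in for real data, and the plot compares the ordered squared Mahalanobis distances with chi-square quantiles whose degrees of freedom equal the number of variables.

```r
# Simulated stand-in for a real data matrix (hypothetical data)
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, colMeans(X), cov(X))

# Chi-square quantiles with df = number of variables
n <- nrow(X)
p <- ncol(X)
q <- qchisq(ppoints(n), df = p)

plot(q, sort(d2), pch = 16, col = "navy",
     xlab = "Chi-square quantiles", ylab = "Ordered squared distances",
     main = "Chi-square plot")
abline(0, 1)   # points near this line are consistent with multivariate normality
```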

==========
Output multiple plots in one screen (page)

## c(2,3) determines no of rows and columns
## no of row = 2
## no of columns = 3
par(mfrow=c(2,3))

Parameters for graphs
pch: plotting character, i.e., the symbol to use
integer codes 0 to 25 select the built-in symbols.

============
Random variable generator in R
# Standard normal
# n: number of values you want to generate
rnorm(n)

# Chi-square
# n: no of values, df: degrees of freedom
rchisq(n, df)

# Cauchy
# n: no of values
rcauchy(n)

Create a Matrix in R

       yes no maybe
apple    1  4     7
orange   2  5     8
banana   3  6     9

Evac <- matrix(c(1,2,3,4,5,6,7,8,9), 3, 3, dimnames=list(fruit=c("apple", "orange", "banana"), answer=c("yes", "no", "maybe")))

Perform Fisher's Exact Test in R
fisher.test(Evac)

Manipulating data frame and data
When reading a large data set, it is better to scan it than to load the whole data set at once.

Calling Linux commands from R is a good way to save processing time.

grep
string function that returns the indices of the elements that match a pattern
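
A short illustration of grep on a character vector. The vector is made up for the example; grepl is the companion function that returns TRUE/FALSE instead of indices.

```r
fruit <- c("apple", "orange", "banana")

idx <- grep("an", fruit)   # indices of the elements containing "an"
fruit[idx]                 # the matching elements themselves
grepl("an", fruit)         # logical vector, one entry per element
```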

#print working directory path
getwd()

#set working directory path
setwd("C://...")

# installing packages
install.packages("package_name")

# print files and dir in the working dir
list.files()

# Lower and Uppercase
toupper(x) # convert x to uppercase
tolower(x) # convert x to lowercase


calculus for statistics

Posted by admin on Friday Aug 27, 2010 Under Statistics


Manipulating data frame/table

Posted by admin on Friday Aug 20, 2010 Under Statistics

To eliminate rows that meet a condition

# eliminate rows where Age is empty
dat <- dat[!(dat$Age %in% ""), ]
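
A small worked example on a made-up data frame. It uses a logical index rather than a negative which(), because when no row matches, -which() produces an empty index and the subset drops every row.

```r
# Hypothetical data frame for illustration
dat <- data.frame(ID = 1:4, Age = c("12", "", "7", ""), stringsAsFactors = FALSE)

keep <- !(dat$Age %in% "")   # TRUE for rows whose Age is not the empty string
dat_clean <- dat[keep, ]

nrow(dat_clean)              # number of rows remaining
dat_clean
```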


r error: FEXACT error 7

Posted by admin on Friday Aug 20, 2010 Under Statistics


With a small sample size, Fisher's exact test is preferable to the chi-square test.
fisher.test(counts)

If you have too many rows or columns, you may get an error saying,
FEXACT error 7.
LDSTP is too small for this problem.
Try increasing the size of the workspace.

You can still do the test by adding simulate.p.value=TRUE, which computes the p-value by Monte Carlo simulation:
fisher.test(counts, simulate.p.value=TRUE)
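
Both remedies can be sketched on a small table. The counts are hypothetical and far too small to trigger the error; they only show the two calls. The workspace and B arguments are part of fisher.test, and the values passed here are arbitrary choices.

```r
# Hypothetical 3 x 4 table of counts, for illustration only
counts <- matrix(c(3, 1, 4, 2,
                   2, 5, 1, 3,
                   4, 2, 3, 1), nrow = 3, byrow = TRUE)

# Option 1: give the exact algorithm a larger workspace (the default is 200000)
exact <- fisher.test(counts, workspace = 2e6)

# Option 2: Monte Carlo p-value; B is the number of simulated tables
set.seed(1)
sim <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)

c(exact = exact$p.value, simulated = sim$p.value)
```

The simulated p-value varies slightly from run to run unless the seed is fixed; a larger B reduces that variation.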


Contingency Table for Categorical data and R

Posted by admin on Wednesday Aug 11, 2010 Under Statistics

How to create a contingency table from categorical data in R.

Example:
There are three categorical variables x1, x2, x3 measured from wild cats where
x1 = gender (male, female)
x2 = age (young, kitten, adult)
x3 = test result ( positive = 1, negative =0).

R's table function will generate two tables: a 2-by-3 table (gender by age) for each of x3 = 0 and x3 = 1.

# r code
table(x1, x2, x3)

As shown below, the R output has two parts, one for x3 = 0 and one for x3 = 1.
Rows represent gender (x1) and columns represent age (x2).
The numbers are counts of cats that fall into the corresponding categories.

, , x3 = 0

   x2
x1   A  K  Y
  F 14 84  2
  M  8 97  2

, , x3 = 1

   x2
x1   A  K  Y
  F  1 12  0
  M  1 36  0
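
A self-contained sketch of the same idea on six made-up observations (these are not the data behind the output above). It shows how to check the table's dimensions, pull out a single cell, and flatten the result with ftable.

```r
# Hypothetical observations on six cats
x1 <- factor(c("F", "M", "M", "F", "M", "F"))   # gender
x2 <- factor(c("K", "K", "A", "Y", "K", "A"))   # age: A adult, K kitten, Y young
x3 <- c(0, 1, 0, 0, 1, 0)                       # test result

tab <- table(x1, x2, x3)

dim(tab)             # gender x age x result
tab["M", "K", "1"]   # count of male kittens that tested positive
ftable(tab)          # the same counts flattened into one two-dimensional layout
```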


T value, F value, Z value in R

Posted by admin on Tuesday Aug 3, 2010 Under Statistics

T values

two-sided critical t value = qt(1 - alpha/2, n-1)

#example: alpha = 0.05, n = 9
> qt(0.975, 8 )
[1] 2.306004
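
Z and F critical values follow the same pattern with qnorm and qf. A brief sketch, where alpha and the degrees of freedom are example values only:

```r
alpha <- 0.05

z_crit <- qnorm(1 - alpha / 2)              # two-sided critical value, standard normal
t_crit <- qt(1 - alpha / 2, df = 8)         # two-sided critical value, t with 8 df
f_crit <- qf(1 - alpha, df1 = 2, df2 = 8)   # upper critical value; df1, df2 are example values

# With 1 numerator degree of freedom, the F quantile equals the squared t quantile
all.equal(qf(1 - alpha, df1 = 1, df2 = 8), t_crit^2)
```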
