Characteristics of Distributions
Posted by admin on Friday Apr 1, 2011 Under Statistics
Poisson Distribution
The sum of independent Poisson random variables is also Poisson, with mean equal to the sum of the means of the individual random variables.
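A quick numerical check of this property, as a minimal sketch in R. The rates lambda1 and lambda2 are arbitrary illustrative values (assumptions, not from any data set). The code convolves the two Poisson probability mass functions and compares the result with the Poisson pmf whose mean is the sum of the two means.

```r
# Illustrative rates (assumed values)
lambda1 <- 2
lambda2 <- 3.5

# P(X1 + X2 = s) by direct convolution of the two pmfs, for s = 0..10
s_values <- 0:10
conv <- sapply(s_values, function(s) {
  sum(dpois(0:s, lambda1) * dpois(s:0, lambda2))
})

# Poisson pmf with mean lambda1 + lambda2
direct <- dpois(s_values, lambda1 + lambda2)

# TRUE if the two agree up to floating point tolerance
print(all.equal(conv, direct))
```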
When to use Simple Logistic Regression
Logistic regression is used when the response variable Yi is binary (0 or 1).
Meaning of the response function for a binary response variable
Yi = beta_0 + beta_1* Xi + ei
Considering Yi as a Bernoulli random variable,
P(Yi = 1) = pi (probability of success)
P(Yi = 0) = 1 - pi (probability of failure)
E(Yi) = 1(pi) + 0(1 - pi) = pi, which equals P(Yi = 1)
Therefore, the expected value of Yi is the same as the probability that Yi equals 1 (the probability of success).
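Problems with fitting this linear model to a binary response variable are:
1. The error term can take only two values for a given Xi, so it cannot be normally distributed.
2. The variance of the error term depends on Xi, so it is not constant.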
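Both problems can be seen with a few numbers. This is a minimal sketch in R; the three success probabilities are illustrative assumptions standing in for the value of pi at three different Xi.

```r
# Illustrative success probabilities at three hypothetical X values (assumed)
p <- c(0.2, 0.5, 0.8)

# With E(Yi) = pi, the error ei = Yi - pi has only two possible values
err_if_success <- 1 - p   # ei when Yi = 1
err_if_failure <- -p      # ei when Yi = 0

# Bernoulli variance pi * (1 - pi): it changes with pi, and therefore with Xi
err_var <- p * (1 - p)

print(data.frame(p, err_if_success, err_if_failure, err_var))
```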
How to run logistic regression in R
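# load data (the file name must be quoted)
dat1 <- read.table("dat1.txt", sep="\t", header=TRUE)
# fit the model; data=dat1 tells glm where to find result and gender
test.logr <- glm(result ~ gender, data=dat1, family=binomial(link="logit"))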
Let Yi = 1 denote success and Yi = 0 denote failure, and
let the probability of success, P(Yi = 1), be 0.2 and the probability of failure, P(Yi = 0), be 0.8.
The odds of success are p/(1-p) = 0.2/0.8 = 0.25, that is, 1 to 4.
The logit is simply the log of the odds:
log(p/(1-p)). It is a monotonic transformation, and it eases the problem of restricted range: p is confined to (0, 1), while the logit can take any real value.
So what does the logit look like?
logit(p) = log(p/(1-p)) = b0 + b1X1 + ... + bkXk
p = exp(b0 + b1X1 + ... + bkXk) / (1 + exp(b0 + b1X1 + ... + bkXk))
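The odds and logit calculations above can be reproduced in R. This is a small sketch using p = 0.2 from the example; qlogis() and plogis() are base R's logit and inverse logit functions.

```r
p <- 0.2

odds <- p / (1 - p)      # 0.25, i.e. 1 to 4
logit_p <- log(odds)     # log odds

# qlogis() computes the logit directly; plogis() maps it back to a probability
print(c(odds = odds, logit = logit_p, qlogis = qlogis(p)))
print(plogis(logit_p))   # recovers p
```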
How to interpret coefficients?
logit(p) = b0 + b1(school),
where school is coded public = 1 and private = 0,
and the response is coded success = 1, failure = 0.
b0 is the log odds of success for private schools, since we coded private = 0 (baseline).
b1 = log(1.325) is the log of the odds ratio of public to private.
Let the coefficients be b1 = log(1.325) = 0.2814 and b0 = -1.23.
By exponentiating b1, the odds ratio exp(b1) = 1.325 is obtained, and it can be interpreted as:
The odds of success for a public school are about 33% higher than the odds for a private school.
To check, compute the odds for private schools, exp(b0), and for public schools, exp(b0 + b1); the log of their ratio, log(1.325), gives back the value of b1.
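The check described above can be sketched in R. The values b0 = -1.23 and an odds ratio of 1.325 come from the example; the coding school = 1 for public and 0 for private is taken from the text, and the variable names are only illustrative.

```r
b0 <- -1.23
b1 <- log(1.325)

# Odds of success for each group (school = 0 is the baseline)
odds_private <- exp(b0)        # school = 0
odds_public  <- exp(b0 + b1)   # school = 1

# The ratio of the two odds is exp(b1), and its log gives b1 back
odds_ratio <- odds_public / odds_private
print(odds_ratio)
print(log(odds_ratio))

# Percent change in odds when moving from school = 0 to school = 1
print(100 * (odds_ratio - 1))
```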
Multiple Logistic Regression Model
It can be interpreted just like a simple logistic regression, except that each coefficient is interpreted assuming that all other predictor variables are held constant.
From the coefficients, you can compute odds ratios, which can be worded as follows:
the odds of a student being successful increase by xx percent with each additional year of tutoring (X1), for a given socioeconomic status and location.
the odds of a student being successful in area 1 are at most 7 times as great as for a student in area 2, where area 1 is coded as 1 and area 2 as 0.
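As a rough illustration of how such statements are obtained, here is a sketch in R on simulated data. Everything in it is an assumption made for the example: the variable names (tutoring, ses, area), the sample size, and the true coefficients used to simulate the response.

```r
set.seed(1)
n <- 500

# Simulated predictors (hypothetical)
tutoring <- rpois(n, 2)          # years of tutoring
ses      <- rnorm(n)             # socioeconomic status score
area     <- rbinom(n, 1, 0.5)    # 1 = area 1, 0 = area 2

# Simulated binary response from an assumed logistic model
eta     <- -1 + 0.4 * tutoring + 0.3 * ses + 0.8 * area
success <- rbinom(n, 1, plogis(eta))

fit <- glm(success ~ tutoring + ses + area, family = binomial(link = "logit"))

# Odds ratios: each one holds the other predictors constant
odds_ratios <- exp(coef(fit))
print(odds_ratios)

# Percent change in odds per additional year of tutoring
print(round(100 * (odds_ratios["tutoring"] - 1), 1))
```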
http://division.aomonline.org/rm/1997_forum_regression_models.html
(source: http://www.grouprecipes.com/46552/italian-thin-crust-pizza-dough.html)
1 package active dry yeast
1 teaspoon honey
1 cup warm water (105 to 115 F)
3 cups of all-purpose flour
1 teaspoon salt
1 tablespoon extra-virgin olive oil
Dissolve the yeast and honey in 1/4 cup warm water.
Combine the flour and the salt. Add the oil, the yeast mixture, and the remaining 3/4 cup of water.
Mix until the entire mixture forms a ball.
Turn the dough out onto a lightly floured surface.
Knead by hand 2 or 3 minutes. The dough should be smooth and firm.
Cover the dough with a clean, damp towel and let it rise in a cool spot for about 2 hours. (When ready, the dough will stretch as it is lightly pulled).
Divide the dough into 2 balls. *Alternatively you could divide into 4 balls to make into 4 pizzas, about 6 ounces each, to make 8 inch pizzas.
Work each ball by pulling down the sides and tucking under the bottom of the ball. Repeat 4 or 5 times. Then on a smooth, unfloured surface, roll the ball under the palm of your hand until the top of the dough is smooth and firm, about 1 minute. Cover the dough with a damp towel and let rest 1 hour. *At this point, the balls can be wrapped in plastic and refrigerated for up to 2 days.
Preheat oven to 500 F or highest temp. Lightly oil cookie sheet with extra-virgin olive oil. Roll out dough ball, on a lightly floured surface, to the shape of your cookie sheet. Carefully transfer dough to cookie sheet, lightly press and stretch out to the edges of sheet.
Add sauce (not too much) and toppings. Start with sauce, then cheese, veggies and meat.
Cook for 10 – 12 minutes, more depending on the thickness of crust due to size of pan you used.
How to plot density
plot(density(DATA))
Rainbow color in R
To give a plot a rainbow color range, use the rainbow function:
rcol=rainbow(length(YOURDATA))
plot(DATAX, DATAY, type="l")
points(DATAX, DATAY, pch=16, col=rcol)
Simple Plot
How to change the size of text in a plot?
Use the cex.[attribute] arguments; examples are below:
main titles by cex.main
sub titles by cex.sub
axis annotation by cex.axis
xlab and ylab by cex.lab
Legend
legend(x, y = NULL, legend, fill = NULL, col = par("col"),
border = "black", lty, lwd, pch,
angle = 45, density = NULL, bty = "o", bg = par("bg"),
box.lwd = par("lwd"), box.lty = par("lty"), box.col = par("fg"),
pt.bg = NA, cex = 1, pt.cex = cex, pt.lwd = lwd,
xjust = 0, yjust = 1, x.intersp = 1, y.intersp = 1,
adj = c(0, 0.5), text.width = NULL, text.col = par("col"),
merge = do.lines && has.pch, trace = FALSE,
plot = TRUE, ncol = 1, horiz = FALSE, title = NULL,
inset = 0, xpd, title.col = text.col, title.adj = 0.5)
dat <- read.table(file.choose()) # pick a file interactively and read it into dat
nrow(dat) # number of rows
head(dat) # shows names and first few rows of dat
paste("hello", "world", sep="-") # "hello-world"
source("mylibrary.R") # runs the code in mylibrary.R (the file name must be quoted)
rep(NA, 5) # NA NA NA NA NA
rep(1:4, 2) # 1 2 3 4 1 2 3 4
rep(1:4, each=2) # 1 1 2 2 3 3 4 4
Validating assumption of multivariate normal data
Univariate diagnostic plots : Histogram and QQ plot
Standardize the data and plot a histogram
mydata.st<-scale(mydata.dat)
hist(mydata.st, main="Histogram", xlab="X values")
#qq plot
## pch =16 (16 is a symbol for a filled circle)
qqnorm(mydata.st, main="QQ plot", pch=16, col="navy")
Chi-square plot
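A chi-square plot compares the ordered squared Mahalanobis distances of the observations with chi-square quantiles on p degrees of freedom, where p is the number of variables; points close to a straight line through the origin with slope 1 are consistent with multivariate normality. A minimal sketch in R, assuming simulated data in place of mydata.dat (the matrix X and its dimensions are hypothetical):

```r
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)   # hypothetical data: 200 rows, 3 variables

n <- nrow(X)
p <- ncol(X)

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Chi-square quantiles with p degrees of freedom
chi_q <- qchisq(ppoints(n), df = p)

plot(chi_q, sort(d2), pch = 16, col = "navy",
     main = "Chi-square plot",
     xlab = "Chi-square quantiles", ylab = "Ordered squared distances")
abline(0, 1)   # reference line
```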
==========
Output multiple plots in one screen (page)
## c(2,3) determines no of rows and columns
## no of row = 2
## no of columns = 3
par(mfrow=c(2,3))
Parameters for graphs
pch: plotting character, i.e., the symbol to use.
Values 0 to 25 select the built-in symbols (for example, 16 is a filled circle).
============
Random variable generator in R
# Standard normal
# n: number of values you want to generate
rnorm(n)
# Chi-square
# n: no of values, df: degrees of freedom
rchisq(n, df)
# Cauchy
# n: no of values
rcauchy(n)
Create a Matrix in R
        yes no maybe
apple     1  4     7
orange    2  5     8
banana    3  6     9
Evac <- matrix(c(1,2,3,4,5,6,7,8,9), 3, 3, dimnames=list(fruit=c("apple", "orange", "banana"), answer=c("yes", "no", "maybe")))
Perform Fisher's Exact Test in R
fisher.test(Evac)
Manipulating data frame and data
When reading a large data set, it is better to scan it (for example with scan()) than to load the whole data set.
grep
String function that returns the indices of the elements that match a pattern.
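A short illustration of grep, as a sketch in R; the vector fruits is a made-up example.

```r
fruits <- c("apple", "banana", "orange")   # hypothetical input

# Indices of the elements that contain the pattern "an"
idx <- grep("an", fruits)
print(idx)            # 2 3

# Use the indices to pull out the matching elements
print(fruits[idx])    # "banana" "orange"

# grepl() returns TRUE/FALSE for every element instead of indices
print(grepl("an", fruits))
```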
#print working directory path
getwd()
#set working directory path
setwd("C://...")
# installing packages
install.packages("package_name")
# print files and dir in the working dir
list.files()
# Lower and Uppercase
toupper(x) # to uppercase
tolower(x) # to lowercase

To eliminate rows that meet a condition
# eliminate rows where Age is empty
# (a logical condition is safer than -which(), which returns no rows when nothing matches)
dat <- dat[dat$Age != "", ]
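The difference between the two ways of dropping rows shows up when no row matches the condition. A small sketch in R with made-up data frames (dat and full are hypothetical):

```r
# One row has an empty Age
dat <- data.frame(Name = c("a", "b", "c"),
                  Age  = c("12", "", "30"),
                  stringsAsFactors = FALSE)
clean <- dat[dat$Age != "", ]
print(clean)          # rows 1 and 3 remain

# No row has an empty Age
full <- data.frame(Name = c("a", "b"),
                   Age  = c("12", "30"),
                   stringsAsFactors = FALSE)

# which() finds nothing, so the negative index is empty and every row is dropped
bad <- full[-which(full$Age == ""), ]
print(nrow(bad))      # 0

# The logical condition keeps all rows, as intended
good <- full[full$Age != "", ]
print(nrow(good))     # 2
```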
R Error: FEXACT error 7
With a small sample size, Fisher's exact test is preferable to the chi-square test.
fisher.test(counts)
If the table has too many rows or columns, you may get an error saying:
FEXACT error 7.
LDSTP is too small for this problem.
Try increasing the size of the workspace.
You can still do the test by adding "simulate.p.value=TRUE", which computes the p-value by Monte Carlo simulation:
fisher.test(counts, simulate.p.value=TRUE)
How to create a contingency table from categorical data in R
Example:
There are three categorical variables x1, x2, x3 measured from wild cats where
x1 = gender (male, female)
x2 = age (young, kitten, adult)
x3 = test result ( positive = 1, negative =0).
The R table function will generate two tables: a 2-by-3 table (gender by age) for each of x3=0 and x3=1.
# r code
table(x1, x2, x3)
As shown below, the R output has two parts, one for x3=0 and one for x3=1.
Rows represent gender (x1) and columns represent age (x2).
The numbers are counts of cats that fall into the corresponding categories.
, , = 0

   A  K Y
F 14 84 2
M  8 97 2

, , = 1

   A  K Y
F  1 12 0
M  1 36 0
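The same kind of table can be built from raw vectors. This is a minimal sketch in R; the six records are made up for illustration and are not the wild cat data above.

```r
# Hypothetical records for six cats (made-up values)
gender <- c("F", "M", "F", "M", "F", "M")
age    <- c("K", "K", "A", "Y", "K", "A")
result <- c(0, 1, 0, 0, 1, 0)

# Three-way table: gender by age, one slice per test result
tab <- table(gender, age, result)
print(dim(tab))        # 2 3 2

# The gender-by-age slice for positive results
print(tab[, , "1"])

# ftable() prints the same counts as one flat table
print(ftable(tab))
```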
T values
t value = qt(1 - alpha/2, n-1)
#example
> qt(0.975, 8 )
[1] 2.306004
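The example can be written in terms of alpha and n. The values alpha = 0.05 and n = 9 are inferred from the 0.975 and 8 used in the call, so treat them as assumptions.

```r
alpha <- 0.05   # two-sided significance level (assumed)
n <- 9          # sample size (assumed), giving n - 1 = 8 degrees of freedom

# Upper critical value of the t distribution
t_crit <- qt(1 - alpha / 2, df = n - 1)
print(t_crit)   # 2.306004

# The lower critical value is its mirror image
print(qt(alpha / 2, df = n - 1))
```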