<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SOISOS</title>
	<atom:link href="http://www.soisos.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.soisos.com</link>
	<description>The Essential Ingredient I cannot live without ...</description>
	<lastBuildDate>Fri, 27 Aug 2010 20:07:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>calculus for statistics</title>
		<link>http://www.soisos.com/archives/125</link>
		<comments>http://www.soisos.com/archives/125#comments</comments>
		<pubDate>Fri, 27 Aug 2010 20:01:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=125</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.soisos.com/wp-content/uploads/2010/08/midterm_eq4.gif" alt="" title="midterm_eq" width="159" height="68" class="alignnone size-full wp-image-134" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/125/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Manipulating data frame/table</title>
		<link>http://www.soisos.com/archives/114</link>
		<comments>http://www.soisos.com/archives/114#comments</comments>
		<pubDate>Fri, 20 Aug 2010 20:14:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[eliminate]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[r code]]></category>
		<category><![CDATA[stat]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=114</guid>
		<description><![CDATA[To eliminate rows with condition # eliminate rows that Age is empty dat]]></description>
			<content:encoded><![CDATA[<p>To eliminate rows that meet a condition<br />
<code><br />
# eliminate rows where Age is empty<br />
# (unlike -which(), this also works when no row matches)<br />
 dat<-dat[!(dat$Age %in% ""),]</code></p>
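<p>As a small self-contained sketch of the same idea, the toy data frame below (its column names and values are made up for illustration) is filtered in two ways: with a logical index and with subset().</p>
<p><code># toy data frame; the values are made up for illustration<br />
dat &lt;- data.frame(Name = c("a", "b", "c", "d"), Age = c("21", "", "35", ""), stringsAsFactors = FALSE)<br />
# keep the rows whose Age is not empty (logical index)<br />
kept &lt;- dat[dat$Age != "", ]<br />
# the same rows with subset()<br />
kept2 &lt;- subset(dat, Age != "")<br />
nrow(kept)   # 2</code></p>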
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/114/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>r error:  FEXACT error 7</title>
		<link>http://www.soisos.com/archives/107</link>
		<comments>http://www.soisos.com/archives/107#comments</comments>
		<pubDate>Fri, 20 Aug 2010 19:31:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[chi-square]]></category>
		<category><![CDATA[contingency]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[FEXACT error 7]]></category>
		<category><![CDATA[fisher]]></category>
		<category><![CDATA[fisher's exact test]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[too small]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=107</guid>
		<description><![CDATA[R Error: FEXACT error 7 When the sample size is small, Fisher&#8217;s exact test is preferable to the chi-square test. fisher.test(counts) If the table has too many rows or columns, you may get this error: FEXACT error 7. LDSTP is too small for this problem. Try increasing the size of [...]]]></description>
			<content:encoded><![CDATA[<p>R Error: FEXACT error 7</p>
<p>When the sample size is small, Fisher&#8217;s exact test is preferable to the chi-square test.<br />
<code>fisher.test(counts)</code></p>
<p>If the table has too many rows or columns, you may get this error:<br />
<code>FEXACT error 7.<br />
LDSTP is too small for this problem.<br />
Try increasing the size of the workspace.</code></p>
<p>You can still run the test by adding &#8220;simulate.p.value=TRUE&#8221;, which estimates the p-value by Monte Carlo simulation instead of computing it exactly.<br />
<code>fisher.test(counts, simulate.p.value=TRUE)</code></p>
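<p>The error message itself suggests a larger workspace, and fisher.test() has a workspace argument for that. The sketch below shows both options on a made-up 3 x 4 table; the counts, the workspace size and the number of replicates B are example values, and a larger workspace may or may not be enough for a given table.</p>
<p><code># made-up 3 x 4 table of counts, for illustration only<br />
counts &lt;- matrix(c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8), nrow = 3)<br />
# option 1: exact test with a larger workspace (the default is 200000)<br />
res.exact &lt;- fisher.test(counts, workspace = 2e6)<br />
# option 2: Monte Carlo p-value based on B simulated tables<br />
set.seed(1)<br />
res.sim &lt;- fisher.test(counts, simulate.p.value = TRUE, B = 10000)<br />
res.exact$p.value<br />
res.sim$p.value</code></p>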
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/107/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Contingency Table for Categorical data and R</title>
		<link>http://www.soisos.com/archives/101</link>
		<comments>http://www.soisos.com/archives/101#comments</comments>
		<pubDate>Wed, 11 Aug 2010 02:03:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[anova]]></category>
		<category><![CDATA[categorical data]]></category>
		<category><![CDATA[contingency]]></category>
		<category><![CDATA[contingency table]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[r code]]></category>
		<category><![CDATA[stat]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[table]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=101</guid>
		<description><![CDATA[How to create a contingency table from categorical data in R. Example: There are three categorical variables x1, x2, x3 measured from wild cats, where x1 = gender (male, female) x2 = age (young, kitten, adult) x3 = test result (positive = 1, negative = 0). R&#8217;s table() will generate two tables: one 2-by-3 table for each [...]]]></description>
			<content:encoded><![CDATA[<p>How to create a contingency table from categorical data in R.</p>
<p>Example:<br />
There are three categorical variables x1, x2, x3 measured from wild cats, where<br />
x1 = gender (male, female)<br />
x2 = age (young, kitten, adult)<br />
x3 = test result (positive = 1, negative = 0).</p>
<p>R&#8217;s table() will generate two tables: one 2-by-3 table (gender by age) for each of x3=0 and x3=1.</p>
<p><code># r code<br />
table(x1, x2, x3)</code></p>
<p>As shown below, the R output has two parts, one for x3=0 and one for x3=1.<br />
Rows represent gender (x1) and columns represent age (x2).<br />
The numbers are counts of cats that fall into the corresponding categories.</p>
<p><code>, ,  = 0</code></p>
<p><code><br />
     A  K  Y<br />
  F 14 84  2<br />
  M  8 97  2</code></p>
<p><code>, ,  = 1</code></p>
<p><code><br />
     A  K  Y<br />
  F  1 12  0<br />
  M  1 36  0</code></p>
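<p>To make the layout easier to reproduce, here is a minimal sketch on made-up data (the values below are hypothetical, not the wild cat data). It also shows ftable(), which flattens the three-way table into a single two-dimensional display.</p>
<p><code># made-up observations, for illustration only<br />
x1 &lt;- c("F", "F", "M", "M", "F", "M", "M", "F")   # gender<br />
x2 &lt;- c("A", "K", "K", "Y", "K", "A", "K", "K")   # age<br />
x3 &lt;- c(0, 0, 1, 0, 1, 0, 0, 0)                   # test result<br />
tab &lt;- table(x1, x2, x3)<br />
dim(tab)        # 2 3 2: gender x age x test result<br />
tab[, , "0"]    # the gender-by-age table for x3 = 0<br />
ftable(tab)     # all counts in one flat table</code></p>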
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/101/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>T value, F value, Z value in R</title>
		<link>http://www.soisos.com/archives/70</link>
		<comments>http://www.soisos.com/archives/70#comments</comments>
		<pubDate>Tue, 03 Aug 2010 20:20:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=70</guid>
		<description><![CDATA[T values t value = qt(1 - alpha/2, n-1) #example > qt(0.975, 8 ) [1] 2.306004]]></description>
			<content:encoded><![CDATA[<p>T values </p>
<p><code>t value = qt(1 - alpha/2, n-1) </code></p>
<p><code>#example: alpha = 0.05, n = 9 </code><br />
<code>> qt(0.975, 8 )</code><br />
<code>[1] 2.306004</code></p>
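<p>The title also mentions F and Z values. A sketch of the analogous calls with qf() and qnorm() follows; the alpha level and the degrees of freedom are example values chosen for illustration.</p>
<p><code>alpha &lt;- 0.05<br />
# two-sided z critical value<br />
z.crit &lt;- qnorm(1 - alpha/2)<br />
# upper-tail F critical value with 3 and 20 degrees of freedom (example values)<br />
f.crit &lt;- qf(1 - alpha, 3, 20)<br />
# two-sided t critical value with n - 1 = 8 degrees of freedom<br />
t.crit &lt;- qt(1 - alpha/2, 8)<br />
round(z.crit, 6)   # 1.959964<br />
round(t.crit, 6)   # 2.306004</code></p>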
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/70/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Validating assumption of multivariate normal data</title>
		<link>http://www.soisos.com/archives/63</link>
		<comments>http://www.soisos.com/archives/63#comments</comments>
		<pubDate>Mon, 02 Aug 2010 23:09:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[chi-square]]></category>
		<category><![CDATA[hist]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[normal]]></category>
		<category><![CDATA[normality]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[qq plot]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[univariate]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=63</guid>
		<description><![CDATA[Univariate and Multivariate diagnostics Univariate diagnostic (Histogram and QQ plot) Plot a histogram hist(mydata.st, main="Histogram", xlab="X values") Plot a QQ plot ## pch=16 draws a filled circle qqnorm(mydata.st, main="QQ plot", pch=16, col="navy") Multivariate diagnostics Chi-square plot We will plot the ordered squared distances against the chi-square quantiles. # function to compute the squared distance between x and x.bar [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Univariate and Multivariate diagnostics</strong></p>
<p><em>Univariate diagnostic (Histogram and QQ plot)<br />
</em><br />
Plot a histogram<br />
<code><br />
hist(mydata.st, main="Histogram", xlab="X values")<br />
</code></p>
<p>Plot a QQ plot</p>
<p><code>## pch=16 draws a filled circle<br />
qqnorm(mydata.st, main="QQ plot", pch=16, col="navy")</code></p>
<p><em><br />
Multivariate diagnostics</em></p>
<p>Chi-square plot<br />
We will plot the ordered squared distances against the chi-square quantiles.</p>
<p><code># function to compute the squared (Mahalanobis) distance between x and x.bar<br />
# arguments: x (one observation), x.bar (mean vector), S.inv (inverse covariance matrix)<br />
f.dist <- function(x,x.bar,S.inv){<br />
	return(t(x-x.bar)%*%S.inv%*%(x-x.bar))<br />
	}</code></p>
<p><code>dist_<-apply(mydat.dat, 1, f.dist, x.bar=apply(mydat.dat, 2, mean), S.inv= solve(cov(mydat.dat)))<br />
</code></p>
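<p><code># Compute the theoretical chi-square quantiles (the u's)<br />
# here 150 is the number of observations and 4 is the number of variables (degrees of freedom)<br />
u.cs <- qchisq((1:150-.5)/150, 4)</code></p>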
<p>Make a plot<br />
<code>plot(u.cs, sort(dist_), pch=16, col="navy", xlab="Theoretical Quantiles", ylab="Sample Quantiles", main="Chi-Square Plot")<br />
</code></p>
<p>If the points in the chi-square plot fall close to a straight line with slope 1 and intercept 0, the assumption of multivariate normality is reasonable.</p>
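<p>R also has a built-in mahalanobis() function that returns the same squared distances. The sketch below uses the built-in iris measurements (150 rows, 4 variables) as a stand-in data set, which is an assumption since the post does not say which data were used. It also draws the reference line with slope 1 and intercept 0.</p>
<p><code># stand-in data: the four numeric columns of the built-in iris data (an assumption)<br />
X &lt;- as.matrix(iris[, 1:4])<br />
n &lt;- nrow(X)<br />
p &lt;- ncol(X)<br />
# squared Mahalanobis distance of each row from the mean vector<br />
d2 &lt;- mahalanobis(X, center = colMeans(X), cov = cov(X))<br />
# theoretical chi-square quantiles with p degrees of freedom<br />
u.cs &lt;- qchisq((1:n - 0.5)/n, df = p)<br />
plot(u.cs, sort(d2), pch=16, col="navy", xlab="Theoretical Quantiles", ylab="Sample Quantiles", main="Chi-Square Plot")<br />
# reference line: intercept 0, slope 1<br />
abline(0, 1)</code></p>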
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/63/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>k nearest neighbors classification (knn)</title>
		<link>http://www.soisos.com/archives/49</link>
		<comments>http://www.soisos.com/archives/49#comments</comments>
		<pubDate>Fri, 16 Jul 2010 10:17:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[knn]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[nonparametric]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[r code]]></category>
		<category><![CDATA[stat]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=49</guid>
		<description><![CDATA[Nonparametric classification method The idea behind knn is to measure the distance between a new value (x0) and each of the training points, take the k points with the shortest distances, and classify the new value into the group that holds the majority among those k neighbors. Steps: 1. Choose k as an odd integer 2. Measure the distance between x0 [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Nonparametric classification method</strong><br />
The idea behind knn is to measure the distance between a new value (x0) and each of the training points, take the k points with the shortest distances, and classify the new value into the group that holds the majority among those k neighbors.</p>
<p><strong>Steps:</strong><br />
1.  Choose k as an odd integer (an odd k avoids ties between two groups)<br />
2.  Measure the distance between x0 and each of the training data points<br />
3.  Order the distances<br />
4.  Select the k shortest distances<br />
5.  Assign the new value to the group that holds the majority among those k points</p>
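<p>The steps above can be sketched directly in a few lines of R. This is an illustration under assumptions, not the knn() function from the class package used below: the helper name knn.one and the toy training data are made up, and the distance is assumed to be Euclidean.</p>
<p><code># hypothetical helper: classify one new point x0 by majority vote among its k nearest neighbors<br />
knn.one &lt;- function(train, cl, x0, k = 3) {<br />
# step 2: Euclidean distance between x0 and every training point<br />
d &lt;- sqrt(rowSums(sweep(as.matrix(train), 2, x0)^2))<br />
# steps 3 and 4: indices of the k shortest distances<br />
nearest &lt;- order(d)[1:k]<br />
# step 5: majority vote among those k neighbors<br />
votes &lt;- table(cl[nearest])<br />
names(votes)[which.max(votes)]<br />
}<br />
# made-up training data: three points in group A, three in group B<br />
train &lt;- rbind(c(1, 1), c(1, 2), c(2, 1), c(5, 5), c(5, 6), c(6, 5))<br />
cl &lt;- factor(c("A", "A", "A", "B", "B", "B"))<br />
knn.one(train, cl, c(1.5, 1.5), k = 3)   # "A"<br />
knn.one(train, cl, c(5.5, 5.5), k = 3)   # "B"</code></p>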
<p><strong>R Code for KNN</strong></p>
<p><code># importing data into a data frame<br />
iris.dat <- read.table("iris_short.txt", header=TRUE)<br />
<br />
# plot the data<br />
# the first 50 rows are Setosa and the next 50 are Versicolor<br />
plot(iris.dat[1:50,2:3], xlim=c(4.3, 7.0),ylim=c(1.0, 5.1), pch=1, col="red", main="Iris Data")<br />
points(iris.dat[51:100, 2:3], pch=1, col="darkgreen")<br />
<br />
# legend(x coordinate, y coordinate, col=point colors, pch=point symbols, text.col=text colors,<br />
# legend=labels to display)<br />
legend(4.5, 4.9, col=c("red", "darkgreen"), pch=c(1,1), text.col=c("red", "darkgreen"), legend=c("Setosa", "Versicolor"))</code></p>
<p><code># library class is required for knn<br />
library(class)<br />
<br />
# generating a 100 x 100 grid of points over the ranges (4.3, 7.0) and (1.0, 5.1)<br />
x1<-seq(4.3, 7.0, length.out=100)<br />
x2<-seq(1.0, 5.1, length.out=100)<br />
x1.new<-rep(x1, 100)<br />
x2.new<-rep(x2, each=100)<br />
# note: k=2 is even, so ties between the two groups are broken at random<br />
iris.knn.2<-knn(iris.dat[,2:3], cl=iris.dat$species, test=cbind(x1.new, x2.new), k=2)</code></p>
<p><code>## plotting k nearest neighbors<br />
# pt.col is the color assigned to each grid point based on the group knn predicts<br />
# iris.knn.2 is a factor with two levels, so as.integer() gives 1 or 2<br />
# if the level is 2, use darkgreen; otherwise red<br />
pt.col<- ifelse(as.integer(iris.knn.2) > 1, "darkgreen", "red")<br />
<br />
# drawing points on the plot<br />
points(x1.new, x2.new, col=pt.col, pch=20, cex=.1)<br />
<br />
# drawing the boundary between the two groups (level 1.5 lies between codes 1 and 2)<br />
contour(x1,x2,matrix(as.integer(iris.knn.2),ncol=100),add=TRUE,levels=1.5,lty=1,drawlabels=FALSE)<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/49/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Factor Analysis (FA)</title>
		<link>http://www.soisos.com/archives/31</link>
		<comments>http://www.soisos.com/archives/31#comments</comments>
		<pubDate>Tue, 06 Jul 2010 04:56:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[fa]]></category>
		<category><![CDATA[factor analysis]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[pca]]></category>
		<category><![CDATA[r code]]></category>
		<category><![CDATA[stat]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=31</guid>
		<description><![CDATA[Preparation and EDA Data should be standardized in factor analysis scale(crime.dat) #standardize data crime.dat.sd= scale(crime.dat) To obtain number of factors to use for the factor analysis, PCA can be used #PCA for EDA crime.pca&#60;-princomp(crime.dat.sd) Bartlett scores crime.fa.s]]></description>
			<content:encoded><![CDATA[<p><span style="color: #ffff00;">Preparation and EDA</span><br />
Data should be standardized before factor analysis.</p>
<p><code>#standardize data<br />
crime.dat.sd= scale(crime.dat)</code></p>
<p>To choose the number of factors for the factor analysis, PCA can be used<br />
<code>#PCA for EDA<br />
crime.pca&lt;-princomp(crime.dat.sd)</code></p>
<p>Bartlett scores<br />
<code>crime.fa.s<- factanal(crime.dat.sd, 3, scores="Bartlett", rotation="varimax")<br />
crime.fa.s$scores</code></p>
<p><span style="color: #ffff00;">Factor Analysis</span></p>
<p><code># arguments: standardized data, number of factors, rotation<br />
factanal(crime.dat.sd, 3, rotation="none")</code></p>
<p><span style="color: #ffff00;">Hypothesis Testing</span><br />
H<sub>0</sub>:  The number of factors is sufficient<br />
H<sub>a</sub>:  The number of factors is not sufficient</p>
<p>Decision Rule<br />
Reject H<sub>0</sub> if the test statistic &gt; qchisq(1 - alpha, df), or equivalently if the p-value is less than alpha<br />
[Note: the test statistic and p-value can be obtained from the R output of factanal]</p>
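<p>As a sketch of how the decision rule can be applied in code, the example below fits a two-factor model to the built-in ability.cov covariance matrix (a stand-in, since the crime data are not available here) and reads the test statistic, degrees of freedom and p-value from the fitted object. The choice of two factors and alpha = 0.05 are example values.</p>
<p><code># ability.cov is a built-in covariance matrix (6 ability tests, 112 people)<br />
fit &lt;- factanal(factors = 2, covmat = ability.cov)<br />
alpha &lt;- 0.05<br />
stat &lt;- fit$STATISTIC            # chi-square test statistic<br />
df &lt;- fit$dof                    # degrees of freedom<br />
crit &lt;- qchisq(1 - alpha, df)    # critical value<br />
stat > crit                      # TRUE means reject H0<br />
fit$PVAL < alpha                 # the same decision from the p-value</code></p>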
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/31/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Principal Component Analysis (PCA)</title>
		<link>http://www.soisos.com/archives/16</link>
		<comments>http://www.soisos.com/archives/16#comments</comments>
		<pubDate>Sun, 04 Jul 2010 21:42:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Numbers]]></category>
		<category><![CDATA[data reduction]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[multivariate analysis]]></category>
		<category><![CDATA[pca]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[r code]]></category>
		<category><![CDATA[stat]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.soisos.com/?p=16</guid>
		<description><![CDATA[Perform a PCA after standardizing the variables, and obtain estimates of the principal components for the standardized variables. Reading in the athletes&#8217; data ath.dat &#60;- read.table("athelete.txt") Standardizing the data ath.dat.std &#60;- scale(ath.dat) Correlation matrix (since covariance of standardized data is correlation) R = cov(ath.dat.std) Eigenvalues lambda = eigen(R)$values The eigenvalues show how much variance each [...]]]></description>
			<content:encoded><![CDATA[<p><br/><br />
Perform a PCA after standardizing the variables, and obtain estimates of the principal components for the standardized variables.</p>
<p>Reading in the athletes&#8217; data<br />
<code>ath.dat &lt;- read.table("athelete.txt")</code></p>
<p>Standardizing the data<br />
<code>ath.dat.std &lt;- scale(ath.dat)</code></p>
<p>Correlation matrix (since covariance of standardized data is correlation)<br />
<code>R = cov(ath.dat.std)</code></p>
<p>Eigenvalues<br />
<code>lambda = eigen(R)$values</code></p>
<p>The eigenvalues show how much variance each component explains. In this case, the first two are above 1, so we keep the first two components.<br />
<code><br />
[1] 1.0900625 1.0290211 0.8809163</code></p>
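<p>As a short sketch of how these values are read, the code below takes the three eigenvalues reported above and computes the share of total variance each component explains, along with the number of eigenvalues above 1. The variable names prop.var and n.keep are made up for this example.</p>
<p><code># eigenvalues copied from the output above<br />
lambda &lt;- c(1.0900625, 1.0290211, 0.8809163)<br />
# proportion of total variance explained by each component<br />
prop.var &lt;- lambda / sum(lambda)<br />
round(prop.var, 3)           # 0.363 0.343 0.294<br />
round(cumsum(prop.var), 3)   # 0.363 0.706 1.000<br />
# number of components with an eigenvalue above 1<br />
n.keep &lt;- sum(lambda > 1)    # 2</code></p>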
<p>Eigenvectors<br />
<code>es= eigen(R)$vectors</code></p>
<p><code>           [,1]       [,2]       [,3]<br />
[1,]  0.7476122 -0.1215134 -0.6529246<br />
[2,] -0.2827011  0.8313790 -0.4784235<br />
[3,]  0.6009626  0.5422577  0.5871971</code></p>
<p>These are the three eigenvectors associated with the lambda values. They are the eigenvectors of the correlation matrix R and also the loadings of the principal components.</p>
<p>You can also find the loadings (eigenvectors) from the princomp output</p>
<p><code>ath.pca &lt;- princomp(ath.dat.std)<br />
loadings(ath.pca)</code></p>
<p><code>Loadings:<br />
   Comp.1 Comp.2 Comp.3<br />
X1  0.748 -0.122 -0.653<br />
X2 -0.283  0.831 -0.478<br />
X3  0.601  0.542  0.587</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.soisos.com/archives/16/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
