Archive for July, 2010

k nearest neighbors classification (knn)

Nonparametric classification method. The idea behind knn is that you measure the distance between a new value (x0) and each of the neighboring points, keep the k shortest distances, then assign the new value to the group that wins the majority vote among those k neighbors. Steps: 1. Choose k as an odd integer 2. Measure the distance between x0 […]
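The idea described in this excerpt can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper name knn_classify, the toy data, and the use of Euclidean distance are not taken from the original post, which is truncated before it names a distance measure.

```r
# Hypothetical helper (not from the original post): classify x0 by
# majority vote among its k nearest training points.
knn_classify <- function(train, labels, x0, k = 3) {
  # Assumption: Euclidean distance from x0 to every training row
  d <- sqrt(rowSums(sweep(as.matrix(train), 2, x0)^2))
  # Indices of the k shortest distances
  nearest <- order(d)[seq_len(k)]
  # Count the group labels among those neighbors and pick the winner
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Toy data: two well-separated groups in two dimensions
train <- rbind(c(1, 1), c(1, 2), c(2, 1), c(6, 6), c(7, 6), c(6, 7))
labels <- c("a", "a", "a", "b", "b", "b")

knn_classify(train, labels, x0 = c(1.5, 1.5), k = 3)
knn_classify(train, labels, x0 = c(6.5, 6.5), k = 3)
```

An odd k avoids tied votes when there are two groups. For real work, the knn() function in the class package does the same job on whole test sets.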


Factor Analysis (FA)

Preparation and EDA

Data should be standardized for factor analysis with scale(crime.dat).

#standardize data
crime.dat.sd <- scale(crime.dat)

To choose the number of factors to use for the factor analysis, PCA can be used.

#PCA for EDA
crime.pca <- princomp(crime.dat.sd)

Bartlett scores

crime.fa.s
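The workflow in this excerpt (standardize, look at a PCA to pick the number of factors, then fit the factor model with Bartlett scores) might look roughly like the sketch below. The crime data are not available here, so the example simulates a small data set; every toy.* name and the choice of two factors are assumptions made for illustration, not the original post's code.

```r
# Simulated stand-in for the crime data (assumption: two latent factors)
set.seed(1)
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
toy.dat <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.5),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.5),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.5),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.5),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.5),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.5)
)

# Standardize the data
toy.dat.sd <- scale(toy.dat)

# PCA for EDA: the variance explained by each component suggests
# how many factors are worth keeping
toy.pca <- princomp(toy.dat.sd)
summary(toy.pca)

# Factor analysis with Bartlett scores
toy.fa <- factanal(toy.dat.sd, factors = 2, scores = "Bartlett")
toy.fa$loadings
head(toy.fa$scores)
```

factanal() also accepts scores = "regression" if Thomson's scores are preferred over Bartlett's.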


Principal Component Analysis (PCA)

Performing a PCA after standardizing the variables and obtaining estimates of the principal components for the standardized variables.

Reading in the athletes' data
ath.dat <- read.table("athelete.txt")

Standardizing the data
ath.dat.std <- scale(ath.dat)

Correlation matrix (since the covariance of standardized data is the correlation)
R = cov(ath.dat.std)

Eigenvalues
lambda = eigen(R)$values

Eigenvalues are read to assess which […]
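The steps in this excerpt can be tried end to end on a data set that ships with R. The sketch below swaps in the built-in USArrests data because athelete.txt is not available here; the names dat.std, prop.var and pc.scores are illustrative and not from the original post.

```r
# Assumption: built-in USArrests data stands in for the athletes' file
dat.std <- scale(USArrests)

# Covariance of standardized data equals the correlation matrix
R <- cov(dat.std)

# Eigen-decomposition of the correlation matrix
e <- eigen(R)
lambda <- e$values

# Each eigenvalue is the variance of one principal component, so
# lambda / sum(lambda) is the proportion of total variance it explains
prop.var <- lambda / sum(lambda)
round(cumsum(prop.var), 3)

# Principal component scores for the standardized variables
pc.scores <- dat.std %*% e$vectors
head(pc.scores)
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which makes the proportions easy to read off.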
