k nearest neighbors classification (knn)

Posted by admin on Friday Jul 16, 2010 Under Statistics

kNN is a nonparametric classification method.
The idea behind kNN is to measure the distance between a new observation (x0) and every point in the training data, keep the k points with the shortest distances, and assign the new observation to the group that holds the majority among those k neighbors.

Steps:
1. Choose k as an odd integer (with two groups, an odd k avoids tied votes)
2. Measure the distance between x0 and each of the training data points
3. Sort the distances in increasing order
4. Select the k shortest distances
5. Assign the new value to the group that holds the majority among those k neighbors
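
The steps above can be sketched in a few lines of base R. This is only an illustration of the idea, not how the class package implements knn: the function name knn_one, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```r
# Minimal from-scratch kNN for a single query point (illustrative sketch only)
knn_one <- function(train, labels, x0, k = 3) {
  train <- as.matrix(train)
  # Step 2: Euclidean distance from x0 to every training point
  d <- sqrt(rowSums(sweep(train, 2, x0)^2))
  # Steps 3-4: sort the distances and keep the indices of the k smallest
  nearest <- order(d)[seq_len(k)]
  # Step 5: majority vote among the labels of the k nearest points
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Toy data, made up for illustration: two groups in two dimensions
train <- rbind(c(1, 1), c(1, 2), c(2, 1), c(5, 5), c(6, 5), c(5, 6))
labels <- c("A", "A", "A", "B", "B", "B")

knn_one(train, labels, x0 = c(1.5, 1.5), k = 3)  # "A"
knn_one(train, labels, x0 = c(5.5, 5.5), k = 3)  # "B"
```

With k = 3 and two groups a vote can never tie, which is the reason for step 1.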

R Code for KNN

# importing data into a dataframe
iris.dat <- read.table("iris_short.txt", header=TRUE)

# plot the data
# the first 50 rows are Setosa and the next 50 are Versicolor
plot(iris.dat[1:50,2:3], xlim=c(4.3, 7.0),ylim=c(1.0, 5.1), pch=1, col="red", main="Iris Data")
points(iris.dat[51:100, 2:3], pch=1, col="darkgreen")

# legend(xcoord, ycoord, col=c("colors"), pch=c("shape of points"), text.col=c("textcolor"),
#        legend=c("actual text you want to use"))
legend(4.5, 4.9, col=c("red", "darkgreen"), pch=c(1,1), text.col=c("red", "darkgreen"), legend=c("Setosa", "Versicolor"))

# library class is required for knn
library(class)

# building a 100 x 100 grid (10,000 points) over the ranges (4.3, 7.0) and (1.0, 5.1)
x1<-seq(4.3, 7.0, len=100)
x2<-seq(1.0, 5.1, len=100)
x1.new<-rep(x1, 100)
x2.new<-rep(x2, rep(100,100))
# classifying every grid point; note that k=2 is even, so knn breaks tied votes at random
iris.knn.2<-knn(iris.dat[,2:3], cl=iris.dat$species, test=cbind(x1.new, x2.new), k=2)

## plotting k nearest neighbors
# pt.col is color that will be assigned to each point based on what knn classifies the data as
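# iris.knn.2 is a factor whose integer codes are either 1 or 2.
# if the code is > 1, then take darkgreen, red otherwise
# (as.integer is used because c() on a factor returns a factor in R 4.1.0 and later)
pt.col<- ifelse(as.integer(iris.knn.2) > 1, "darkgreen", "red")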

# drawing points on the plot
points(x1.new, x2.new, col=pt.col, pch=20, cex=.1)

# drawing the decision boundary as a contour at 1.5, halfway between class codes 1 and 2
contour(x1,x2,matrix(as.integer(iris.knn.2),ncol=100),levels=1.5,add=TRUE,lty=1,drawlabels=FALSE)
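
The file iris_short.txt is not included with this post. As a rough way to try knn without it, the sketch below assumes that the first 100 rows of R's built-in iris data (setosa and versicolor) can stand in for the file, and that Sepal.Length and Petal.Length correspond to columns 2 and 3. That mapping is a guess based on the axis ranges used above. The two query points are made up for illustration.

```r
library(class)

# Assumption: rows 1-100 of the built-in iris data stand in for iris_short.txt
dat <- droplevels(iris[1:100, c("Sepal.Length", "Petal.Length", "Species")])

# Two hypothetical new flowers to classify
new.pts <- data.frame(Sepal.Length = c(5.0, 6.5),
                      Petal.Length = c(1.5, 4.5))

# k=3 is odd, so a two-group vote cannot tie
pred <- knn(train = dat[, 1:2], test = new.pts, cl = dat$Species, k = 3)
pred
```

The first point sits among the short-petaled setosa flowers and the second among the versicolor flowers, so pred should come back as setosa followed by versicolor.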
