Validating assumption of multivariate normal data
Posted by admin on Monday Aug 2, 2010 Under Statistics
Univariate and Multivariate diagnostics
Univariate diagnostic (Histogram and QQ plot)
Plot a histogram
hist(mydata.st, main="Histogram", xlab="X values")
Plot a QQ plot
## pch =16 (16 is a symbol for a filled circle)
qqnorm(mydata.st, main="QQ plot", pch=16, col="navy")
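A reference line and a formal test can complement the visual check. The sketch below is only an illustration: it uses simulated data in place of mydata.st (an assumption, since the original data are not shown), adds a reference line with qqline(), and runs a Shapiro-Wilk test with shapiro.test().

```r
# Simulated stand-in for mydata.st (assumption: the original data are not shown)
set.seed(1)
x <- rnorm(100, mean = 10, sd = 2)

# QQ plot with a reference line through the first and third quartiles
qqnorm(x, main = "QQ plot", pch = 16, col = "navy")
qqline(x, col = "red")

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw <- shapiro.test(x)
print(sw$p.value)
```

Points that stay close to the reference line, together with a p-value that is not small, are consistent with normality. Neither proves it.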
Multivariate diagnostics
Chi-square plot
We will plot the sorted squared distances of the observations from the mean against the corresponding chi-square quantiles
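# Function to compute the squared (Mahalanobis) distance between x and x.bar
# Arguments: x (one observation), x.bar (mean vector), S.inv (inverse of the covariance matrix)
f.dist <- function(x, x.bar, S.inv) {
  return(drop(t(x - x.bar) %*% S.inv %*% (x - x.bar)))
}
# Apply the function to each row of the data
dist_ <- apply(mydat.dat, 1, f.dist, x.bar = colMeans(mydat.dat), S.inv = solve(cov(mydat.dat)))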
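# Compute the theoretical chi-square quantiles (the u's)
# n = number of observations (150 here), p = number of variables (4 here)
n <- nrow(mydat.dat)
p <- ncol(mydat.dat)
u.cs <- qchisq((1:n - 0.5)/n, df = p)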
Make a plot
plot(u.cs, sort(dist_), pch=16, col="navy", xlab="Theoretical Quantiles", ylab="Sample Quantiles", main="Chi-Square Plot")
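If the points in the chi-square plot fall close to a straight line with slope 1 and intercept 0, it is reasonable to assume that the data are multivariate normal. Clear curvature, or a few points far above the line, suggests otherwise.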
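For a version that runs end to end, here is a minimal sketch using R's built-in mahalanobis() function and the four numeric columns of the iris data set. Using iris is an assumption: it has 150 rows and 4 numeric variables, but the post does not say which data were used. The abline(0, 1) call draws the reference line with slope 1 and intercept 0.

```r
# Assumption: the four numeric columns of iris stand in for mydat.dat
X <- as.matrix(iris[, 1:4])
n <- nrow(X)  # number of observations
p <- ncol(X)  # number of variables

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Theoretical chi-square quantiles with p degrees of freedom
u <- qchisq((1:n - 0.5) / n, df = p)

# Chi-square plot with the reference line (intercept 0, slope 1)
plot(u, sort(d2), pch = 16, col = "navy",
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Chi-Square Plot")
abline(0, 1, col = "red")
```

mahalanobis() returns the same squared distances as a hand-written quadratic form, so it can replace a custom distance function.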
August 30th, 2010 at 6:27 AM
Hi
I was actually searching with the keywords “validating an assumption on data” or “…. about data”… and stumbled across this blog entry. Was wondering if you can help with something, since I see you are quite into statistics
Let’s say I have an assumption about my data… how do I validate that assumption?
It goes like this…
In soccer videos (broadcasts or even live telecasts)… interesting events occur rarely… like goals, yellow cards etc.
These events normally cause crowd and commentator reaction, which would increase the audio measurements…
However, in normal circumstances (most of the time)… the commentators and crowds are rather neutral. Therefore… if I take (say) 100 or 200 samples of audio measurements of the commentators’ and crowds’ excitement… most of the measurements will be biased towards that of the neutral segments…
So the assumption is just that… (or is it the Hypothesis?)
Assumption: If audio samples are taken from any soccer video, the measurements will be more biased towards that of non-event audio measurements
Actually, I don’t know the first thing about hypothesis testing, or validating assumptions like this. Was just wondering if anyone can point me in the right direction
Thanks a mil for reading. Cheers!
Alf
August 30th, 2010 at 7:19 AM
Hello Alf,
Well, the assumption being validated in my entry was the normality of the data. Your data may not come from a normal distribution, since there are rare occasions with extreme values, but if they happen as rarely as 1 out of 200 or 300 times, they may not affect the average much if the data set is large. So you can check whether the data are normal by applying the techniques I’ve mentioned above.
If you conclude that the data are normally distributed, and if you classify that distribution as “non-event audio measurements”, then yes, your assumption (well, hypothesis may be a better term) can be valid.
Or you can try to compute the probability of the rare event to explain its occurrence, or try to come up with a distribution for your data.
Hypothesis testing can be about anything. You can test whether the true average of the data equals what you think it is, or whether the average of dataset A equals that of dataset B. Most of the tests you’ve seen in stat courses (maybe undergrad) require the assumption of normality, because the test statistic is derived from the normal distribution. If your data are not normal and you use those tests anyway, the results may not be valid. That’s why you want to check the normality of the data before using such a test. Of course, there are also nonparametric tests, which do not require the normality assumption.
You may also try to classify the measurements as “non-event” or “event” by graphing them. You may see some clusters. Once you identify the clusters, you can take a new sample, compare it with the classification, and see which cluster it falls into.
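To make the comparison of two averages concrete, here is a small sketch on simulated data (the samples a and b are hypothetical, not anyone's real measurements). It runs a two-sample t-test with t.test(), which relies on normality, and the Wilcoxon rank-sum test with wilcox.test(), a nonparametric alternative.

```r
# Hypothetical samples (assumption: simulated purely for illustration)
set.seed(42)
a <- rnorm(30, mean = 5.0, sd = 1)
b <- rnorm(30, mean = 5.5, sd = 1)

# Welch two-sample t-test: assumes approximately normal data
tt <- t.test(a, b)

# Wilcoxon rank-sum test: does not assume normality
wt <- wilcox.test(a, b)

print(c(t.test = tt$p.value, wilcoxon = wt$p.value))
```

In both tests, a small p-value is evidence against the hypothesis that the two samples come from populations with the same location.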
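One possible way to look for such clusters is k-means. The sketch below is only an illustration under stated assumptions: the loudness values are simulated, the column names commentator and crowd are hypothetical, and k-means with two clusters is my choice of technique, not something the discussion prescribes.

```r
# Simulated measurements (assumption: hypothetical loudness values, not real data)
set.seed(7)
non_event <- cbind(commentator = rnorm(195, 50, 3), crowd = rnorm(195, 55, 3))
event     <- cbind(commentator = rnorm(5, 85, 3),   crowd = rnorm(5, 90, 3))
audio <- rbind(non_event, event)

# Two-cluster k-means; nstart repeats the random initialisation several times
km <- kmeans(audio, centers = 2, nstart = 10)

# Scatter plot of the measurements, coloured by assigned cluster
plot(audio, col = km$cluster, pch = 16, main = "Audio measurements by cluster")

# Number of measurements assigned to each cluster
print(table(km$cluster))
```

A new measurement can then be assigned to whichever cluster centre in km$centers it lies closest to.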