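I think the Shapiro-Wilk test is a great way to check whether a variable is normally distributed. Normality is an assumption behind many statistical models and tests (for example, t-tests and the residuals of a linear regression), so it is worth checking before you build or evaluate a model.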
Let’s look at how to do this in R, using the shapiro.test() function from the built-in stats package!
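The call itself is not shown in this copy of the post. Judging from the `data: data$CreditScore` line in the output, it was presumably `shapiro.test(data$CreditScore)`. Here is a minimal sketch under that assumption. The data frame below is a hypothetical stand-in (50 simulated scores, with an arbitrary mean and standard deviation), not the original credit score data, so the W and p-value it prints will differ from the output quoted in this post.

```r
# Hypothetical stand-in for the post's data frame: 50 simulated credit scores.
# The sample size, mean and sd are assumptions made for illustration only.
set.seed(123)
data <- data.frame(CreditScore = rnorm(50, mean = 650, sd = 60))

# Run the Shapiro-Wilk normality test on the CreditScore column.
# shapiro.test() lives in the stats package, which R loads by default.
sw <- shapiro.test(data$CreditScore)

# Print the test summary (method, data name, W statistic, p-value).
print(sw)
```

Note that shapiro.test() only accepts between 3 and 5000 non-missing values, so very large variables need to be sampled first.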
And here is the output:
Shapiro-Wilk normality test

data:  data$CreditScore
W = 0.96945, p-value = 0.2198
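So how do we read this? The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution, so a small p-value (below 0.05) is evidence against normality. At first glance a p-value of 0.2198 may look too high, but here a high p-value is what we are hoping for: it is above 0.05, so we fail to reject the null hypothesis. That does not prove the variable is normal, but it does tell us the data are consistent with a normal distribution.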
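If you want to apply this reading rule in code rather than by eye, the p-value can be pulled out of the test result. The sketch below is only an illustration: `interpret_shapiro()` is a hypothetical helper, not part of any package, the 0.05 cutoff is the conventional threshold used above, and the scores are simulated rather than taken from the original data.

```r
# Hypothetical helper: run the test and turn the p-value into a verdict.
interpret_shapiro <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)
  p_value <- result$p.value
  if (p_value > alpha) {
    verdict <- "fail to reject normality"  # data are consistent with a normal distribution
  } else {
    verdict <- "reject normality"          # evidence against a normal distribution
  }
  list(W = unname(result$statistic), p_value = p_value, verdict = verdict)
}

# Simulated scores, used only so the example runs on its own.
set.seed(42)
scores <- rnorm(50, mean = 650, sd = 60)

reading <- interpret_shapiro(scores)
print(reading$p_value)
print(reading$verdict)
```

Returning the verdict as text keeps the wording honest: a p-value above the cutoff means "fail to reject", never "proven normal".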
Let’s make a histogram to take a look using base R graphics:
hist(data$CreditScore, main="Credit Score", xlab="Credit Score", border="light blue", col="blue", las=1, breaks=5)
The resulting histogram does look roughly bell-shaped, which agrees with the test result.
Great! Now we can go ahead with methods that assume normality, such as t-tests, on our credit scores.