How to plot clusters of kmeans in R and show centroids?
I have a dataset that has 6497 instances, 12 attributes, and a class variable called q (quality). The class values can range from 3 to 9. The data can be downloaded in CSV format from here.
I am doing k-means clustering on the dataset and plotting the result. There seems to be something wrong with the plots I'm generating, because I don't think they represent the clusters. The plot I'm trying to generate is the one referred to in the answer to "How to create a cluster plot in R?"
Here is what I'm doing:
    library(vegan)

    winequality <- read.csv("wine_nocolor.csv")
    express <- winequality[, c("fa", "va", "ca", "rs", "ch", "fsd", "tsd", "d", "p", "s", "a")]
    rownames(express) <- winequality$id
    str(express)
    # 'data.frame': 6497 obs. of 11 variables

    kclus <- kmeans(express, centers = 3, iter.max = 1000, nstart = 10000)  # takes a bit of time

    wine_dist <- dist(express)
    cmd <- cmdscale(wine_dist)  # takes a bit of time

    groups <- levels(factor(kclus$cluster))
    ordiplot(cmd, type = "n")  # shows a warning: species scores not available
    cols <- c("steelblue", "darkred", "darkgreen")
    for (i in seq_along(groups)) {
      points(cmd[factor(kclus$cluster) == groups[i], ], col = cols[i], pch = 16)
    }

    # add spider and hull
    ordispider(cmd, factor(kclus$cluster), label = TRUE)
    ordihull(cmd, factor(kclus$cluster), lty = "dotted")
The above code produces the following plot. As you can see, the clusters aren't shown in a clear fashion.
Questions:
- What are Dim1 and Dim2?
- How can I fix this?
- Additionally, does R offer a way to produce a plot similar to the one generated by scikit, showing the clusters and centroids?
The author of the code (from the other question) is using dimension reduction via MDS (multidimensional scaling) to plot the clusters.
Read ?cmdscale to understand what it does.
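To make the Dim1/Dim2 question concrete, here is a minimal sketch of what cmdscale returns. It uses the built-in iris data as a stand-in (an assumption; it is not the wine data). The result is a two-column matrix of coordinates, and the Dim1 and Dim2 axes in the plot are presumably just those two columns. For plain Euclidean distances they should match the first two principal component scores up to sign.

```r
# Stand-in data (assumption): built-in iris, not the wine dataset.
x <- as.matrix(iris[, 1:4])

# Classical MDS on Euclidean distances; k = 2 keeps two dimensions.
mds <- cmdscale(dist(x), k = 2)
dim(mds)  # one row per observation, two coordinate columns

# PCA on the same (centred, unscaled) data for comparison.
pca <- prcomp(x)

# Largest absolute difference between MDS coordinates and the first
# two PC scores, ignoring the arbitrary sign of each axis.
mds_vs_pca <- max(abs(abs(mds) - abs(pca$x[, 1:2])))
mds_vs_pca
```

If that difference comes out near zero, the two axes can be read like PC1 and PC2: directions of largest spread in the data, not any single original variable.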
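Whether you want dimension reduction, and whether you do it before or after clustering, is your choice. I am not sure there is anything "to fix" in the code; it is more that you need to decide what you want and then plot it. I suggest you first try to reduce the number of variables before clustering. 11 is a lot. Are they all useful?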
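One possible way to reduce first and cluster afterwards is sketched below, again on the built-in iris data as a stand-in (an assumption). PCA via prcomp is my choice of reduction here, not something the original code uses, and keeping two components is arbitrary. Because k-means then runs in the reduced space, kclus$centers are already 2-D and can be drawn directly as centroids.

```r
set.seed(42)  # k-means uses random starts

# Stand-in data (assumption): built-in iris, not the wine dataset.
x <- iris[, 1:4]

# PCA on centred and scaled variables.
pca <- prcomp(x, center = TRUE, scale. = TRUE)
summary(pca)$importance["Cumulative Proportion", ]  # variance retained

# Keep the first two components (arbitrary choice) and cluster on them.
scores <- pca$x[, 1:2]
kclus <- kmeans(scores, centers = 3, nstart = 25)

# Points coloured by cluster, centroids drawn as large crosses.
cols <- c("steelblue", "darkred", "darkgreen")
plot(scores, col = cols[kclus$cluster], pch = 16, xlab = "PC1", ylab = "PC2")
points(kclus$centers, pch = 4, cex = 2, lwd = 3)
```

The cumulative proportion line gives a rough idea of how much information is lost by dropping the remaining components.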
Also remember that the variables need to be normalized before applying k-means.
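A minimal sketch of that, assuming scale() (zero mean, unit standard deviation per column) is an acceptable normalization and using the built-in iris data as a stand-in. Here the clustering runs on all scaled variables, and the reduction happens only afterwards for plotting: the centroids are projected into the same 2-D space with predict() so they can be drawn on top of the points.

```r
set.seed(42)  # k-means uses random starts

# Stand-in data (assumption): built-in iris, not the wine dataset.
x <- scale(iris[, 1:4])  # each column now has mean 0 and sd 1

# Cluster on all scaled variables.
kclus <- kmeans(x, centers = 3, nstart = 25)

# Reduce to 2-D for plotting only, after clustering.
pca <- prcomp(x)
centers2d <- predict(pca, newdata = kclus$centers)  # project centroids

cols <- c("steelblue", "darkred", "darkgreen")
plot(pca$x[, 1:2], col = cols[kclus$cluster], pch = 16)
points(centers2d[, 1:2, drop = FALSE], pch = 8, cex = 2, lwd = 2)
```

Without scaling, a variable measured on a large scale would dominate the Euclidean distances that k-means relies on.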