clues: An R Package for Nonparametric Clustering Based on Local Shrinking

Title: clues: An R Package for Nonparametric Clustering Based on Local Shrinking
Publication Type: Journal Article
Year of Publication: 2010
Authors: Chang, F, Qiu, W, Zamar, RH, Lazarus, R, Wang, X
Journal: Journal of Statistical Software
Volume: 33
Pagination: 1-16
Date Published: February
Type of Article: Article
ISSN: 1548-7660
Keywords: agreement index, cluster analysis, dissimilarity measure, K-nearest neighbor
Abstract: Determining the optimal number of clusters appears to be a persistent and controversial issue in cluster analysis. Most existing R packages targeting clustering require the user to specify the number of clusters in advance. However, if this subjectively chosen number is far from optimal, clustering may produce seriously misleading results. In order to address this vexing problem, we develop the R package clues to automate and evaluate the selection of an optimal number of clusters, which is widely applicable in the field of clustering analysis. Package clues uses two main procedures, shrinking and partitioning, to estimate an optimal number of clusters by maximizing an index function, either the CH index or the Silhouette index, rather than relying on guessing a pre-specified number. Five agreement indices (Rand index, Hubert and Arabie's adjusted Rand index, Morey and Agresti's adjusted Rand index, Fowlkes and Mallows index and Jaccard index), which measure the degree of agreement between any two partitions, are also provided in clues. In addition to numerical evidence, clues also supplies a deeper insight into the partitioning process with trajectory plots.
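
To illustrate what an agreement index between two partitions measures, here is a minimal Python sketch based on the standard pair-counting definitions of the Rand, Jaccard, Fowlkes and Mallows, and Hubert and Arabie adjusted Rand indices. It is not the clues package's code or API: the function names `pair_counts` and `agreement_indices` are hypothetical, Morey and Agresti's variant is omitted, and returning NaN for degenerate cases is an assumption made here.

```python
from itertools import combinations
from math import nan, sqrt


def pair_counts(labels_a, labels_b):
    """Classify every pair of points by whether each partition groups it together.

    Returns (both, only_a, only_b, neither).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same points")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def agreement_indices(labels_a, labels_b):
    """Return four pair-counting agreement indices as a dict.

    Assumption of this sketch: an index whose denominator is zero is NaN.
    """
    a, b, c, d = pair_counts(labels_a, labels_b)
    total = a + b + c + d
    if total == 0:
        raise ValueError("need at least two points")

    def ratio(num, den):
        return num / den if den else nan

    # Expected and maximal number of co-clustered pairs under the
    # permutation model used by Hubert and Arabie's adjustment.
    expected = (a + b) * (a + c) / total
    maximum = ((a + b) + (a + c)) / 2

    return {
        "rand": (a + d) / total,
        "jaccard": ratio(a, a + b + c),
        "fowlkes_mallows": ratio(a, sqrt((a + b) * (a + c))),
        "adjusted_rand": ratio(a - expected, maximum - expected),
    }


if __name__ == "__main__":
    # Two partitions of four points; the second splits one cluster of the first.
    print(agreement_indices([0, 0, 1, 1], [0, 0, 1, 2]))
```

All four indices depend only on which pairs of points are grouped together, so relabeling the clusters of either partition leaves the values unchanged. The pairwise loop is quadratic in the number of points, which is fine for illustration but slow for large data sets.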