In this talk, we present new developments in hierarchical clustering for high-dimensional data. We consider clustering both the observations and the features. We first focus on clustering the observations. In high-dimensional data, potential noise features and outliers pose unique challenges to existing hierarchical clustering techniques. To address these challenges, we propose robust sparse hierarchical clustering (RSHC) and multi-rank sparse hierarchical clustering (MrSHC). We then consider clustering the features. We propose a new hierarchical clustering technique that divides a large number of features into subgroups called regression phalanxes. Each regression phalanx is used to build a base regression model, and these base models are then combined into an ensemble. We show that, when applied with an effective method such as the Lasso or Random Forests, the ensemble of regression phalanxes produced by the hierarchical clustering yields further gains in prediction accuracy.
Hierarchical clustering of observations and features for high-dimensional data
Tuesday, November 29, 2016 - 11:00 to 12:00
Hongyang (Fred) Zhang, PhD Student, UBC Statistics
Room 4192, Earth Sciences Building (2207 Main Mall)