Hierarchical clustering of observations and features for high-dimensional data

Tuesday, November 29, 2016 - 11:00 to 12:00
Hongyang (Fred) Zhang, PhD Student, UBC Statistics
Statistics Seminar
Room 4192, Earth Sciences Building (2207 Main Mall)

In this talk, we present new developments in hierarchical clustering for high-dimensional data, covering the clustering of both observations and features. We first focus on clustering observations. In high-dimensional data, potential noise features and outliers pose unique challenges to existing hierarchical clustering techniques. We propose robust sparse hierarchical clustering (RSHC) and multi-rank sparse hierarchical clustering (MrSHC) to address these challenges. We then consider clustering features. We propose a new hierarchical clustering technique that divides the large number of features into subgroups called regression phalanxes, which are used to build base regression models for subsequent ensembling. We show that the ensemble of regression phalanxes resulting from the hierarchical clustering produces further gains in prediction accuracy when applied to an effective method such as the Lasso or Random Forests.