
Bayesian Models for Hierarchical Clustering of Network Data

Thursday, May 4, 2023 - 11:00 to 12:00
Creagh Briercliffe, UBC Statistics PhD Student
ESB 4192 / Zoom

To join this seminar virtually, please request Zoom connection details from headsec [at] stat.ubc.ca.

Abstract: Network data exist in many forms, such as social networks or interactions between proteins in a cell. Generally, they represent relational information between interacting entities. In many real-world examples, these entities tend to exhibit grouping structure, for example, highly connected communities of people within a social network. Uncovering this underlying structure is an important task for studying a network's composition and behaviour. Hierarchical clustering is a technique for discovering such structure across multiple scales, where a dendrogram represents the full hierarchy of clusters. This talk will explore Bayesian models for hierarchical clustering of network data, which aim to infer the posterior distribution over dendrograms.

The “Hierarchical Random Graph” (HRG) is likely the most popular Bayesian approach to hierarchical clustering of network data. Yet the simplifications made in its inference scheme lead to some potentially undesirable model behaviour, which we identify. To rectify these issues, we introduce a general class of models characterized by a sampling construction that defines a generative process for simple graphs. We propose four Bayesian models from this class and derive the marginalized posterior distribution over dendrograms, isolating the problem of inferring a hierarchical clustering. We implement these models in the probabilistic programming language Blang, which leverages state-of-the-art approximate inference methods (non-reversible parallel tempering). Finally, we demonstrate the empirical performance of our models on real network data.
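For readers unfamiliar with the Hierarchical Random Graph (HRG), the following is a minimal Python sketch of its standard formulation as described by Clauset, Moore and Newman. It is not any of the four models proposed in the talk. Each internal dendrogram node r carries a probability p_r, and every pair of vertices whose lowest common ancestor is r is joined independently with probability p_r. The likelihood is the product over internal nodes of p_r^{E_r} (1 - p_r)^{L_r R_r - E_r}, where E_r counts edges between r's left and right subtrees and L_r, R_r are their leaf counts. The usual HRG fit plugs in the maximum-likelihood value p_r = E_r / (L_r R_r), shown here as `profile_log_likelihood`. The tree representation and all function names are hypothetical, chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """Internal dendrogram node (hypothetical representation)."""
    left: "Tree"
    right: "Tree"
    p: float  # probability of an edge between a leaf in `left` and a leaf in `right`


Tree = Union[int, Node]  # a leaf is just a vertex label


def leaves(t: Tree) -> list[int]:
    """Vertex labels under t, left to right."""
    if isinstance(t, int):
        return [t]
    return leaves(t.left) + leaves(t.right)


def internal_nodes(t: Tree) -> list[Node]:
    """All internal nodes of t in pre-order."""
    if isinstance(t, int):
        return []
    return [t] + internal_nodes(t.left) + internal_nodes(t.right)


def sample_graph(t: Tree, rng: random.Random) -> set[tuple[int, int]]:
    """Draw a simple undirected graph: each pair (i, j) whose lowest common
    ancestor is r becomes an edge independently with probability r.p."""
    edges = set()
    for r in internal_nodes(t):
        for i in leaves(r.left):
            for j in leaves(r.right):
                if rng.random() < r.p:
                    edges.add((min(i, j), max(i, j)))
    return edges


def split_counts(r: Node, edges: set[tuple[int, int]]) -> tuple[int, int]:
    """Return (E_r, L_r * R_r) for internal node r."""
    left, right = leaves(r.left), leaves(r.right)
    e = sum((min(i, j), max(i, j)) in edges for i in left for j in right)
    return e, len(left) * len(right)


def _xlogy(x: float, y: float) -> float:
    # Convention 0 * log(0) = 0; a positive count with y = 0 raises ValueError
    # because such a graph has probability zero under the model.
    return 0.0 if x == 0 else x * math.log(y)


def log_likelihood(t: Tree, edges: set[tuple[int, int]]) -> float:
    """log P(graph | dendrogram, {p_r}) using each node's own p_r."""
    total = 0.0
    for r in internal_nodes(t):
        e, n = split_counts(r, edges)
        total += _xlogy(e, r.p) + _xlogy(n - e, 1 - r.p)
    return total


def profile_log_likelihood(t: Tree, edges: set[tuple[int, int]]) -> float:
    """Same likelihood with each p_r replaced by its MLE E_r / (L_r R_r),
    so the score depends on the dendrogram topology alone."""
    total = 0.0
    for r in internal_nodes(t):
        e, n = split_counts(r, edges)
        p_hat = e / n
        total += _xlogy(e, p_hat) + _xlogy(n - e, 1 - p_hat)
    return total


if __name__ == "__main__":
    # Two tight pairs {0, 1} and {2, 3}, weakly joined at the root.
    tree = Node(Node(0, 1, p=0.9), Node(2, 3, p=0.9), p=0.1)
    g = sample_graph(tree, random.Random(0))
    print(sorted(g), log_likelihood(tree, g), profile_log_likelihood(tree, g))
```

Each vertex pair is visited exactly once, at its lowest common ancestor, so sampling and scoring are both quadratic in the number of vertices. This is adequate for illustration but not for large networks.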