
Transfer and Multi-task Learning: Statistical Insights for Modern Data Challenges

Thursday, February 13, 2025 - 10:30 to 11:30
Ye Tian, Ph.D. student, Department of Statistics, Columbia University
Statistics Seminar
ESB 4192 / Zoom

To join this seminar virtually, please request Zoom connection details from ea [at] stat.ubc.ca.

Abstract: Knowledge transfer, a core human ability, has inspired numerous data integration methods in machine learning and statistics. However, data integration faces significant challenges: (1) unknown similarity between data sources; (2) data contamination; (3) high dimensionality; and (4) privacy constraints. This talk addresses these challenges in three parts across different contexts, presenting both innovative statistical methodologies and theoretical insights.

In Part I, I will introduce a transfer learning framework for high-dimensional generalized linear models that combines a pre-trained Lasso with a fine-tuning step. We provide theoretical guarantees for both estimation and inference, and apply the methods to predict county-level outcomes of the 2020 U.S. presidential election, uncovering valuable insights.
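
As a rough illustration of this two-step idea, here is a minimal sketch in the linear (rather than generalized linear) case; the simulated data, penalty levels, and variable names are simplifications of mine, not the talk's method. The pre-training step fits a Lasso on the larger source sample, and the fine-tuning step fits a sparse correction on the smaller target sample.

```python
# Minimal two-step transfer-Lasso sketch (illustrative, not the talk's method).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p = 200
beta_src = np.zeros(p); beta_src[:5] = 1.0        # sparse source coefficients
beta_tgt = beta_src.copy(); beta_tgt[5] = 0.5     # target differs sparsely

X_src = rng.standard_normal((500, p))
y_src = X_src @ beta_src + rng.standard_normal(500)
X_tgt = rng.standard_normal((100, p))
y_tgt = X_tgt @ beta_tgt + rng.standard_normal(100)

# Step 1: pre-train on the (larger) source sample.
w_hat = Lasso(alpha=0.05).fit(X_src, y_src).coef_

# Step 2: fine-tune -- fit a sparse correction delta on the target residuals,
# so the final estimate is beta_hat = w_hat + delta_hat.
delta_hat = Lasso(alpha=0.05).fit(X_tgt, y_tgt - X_tgt @ w_hat).coef_
beta_hat = w_hat + delta_hat
print(f"error of transferred estimate: {np.linalg.norm(beta_hat - beta_tgt):.3f}")
```

Because the target only needs to learn the sparse difference from the source, the fine-tuning step can succeed with far fewer target samples than fitting from scratch.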

In Part II, I will explore an unsupervised learning setting where task-specific data is generated from a mixture model with heterogeneous mixture proportions. This complements the supervised learning setting discussed in Part I, addressing scenarios where labeled data is unavailable. We propose a federated gradient EM algorithm that is communication-efficient and privacy-preserving, providing estimation error bounds for the mixture model parameters.
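
For intuition, here is a toy sketch of the federated idea under strong simplifying assumptions of mine (a symmetric two-component mixture with means ±mu, known identity covariance, and a step size for which the gradient step coincides with an exact M-step); it is not the paper's algorithm. It does illustrate the key properties: each client sends only one d-dimensional vector per round, while raw data and the client-specific mixing proportions never leave the client.

```python
# Toy federated gradient EM for a symmetric two-component Gaussian mixture
# with shared means +/- mu and client-specific proportions (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
d, n_k, K = 5, 300, 4
mu_true = np.ones(d)
pi_true = rng.uniform(0.3, 0.7, K)        # heterogeneous mixture proportions

clients = []
for k in range(K):
    z = rng.random(n_k) < pi_true[k]
    X = np.where(z[:, None], mu_true, -mu_true) + rng.standard_normal((n_k, d))
    clients.append(X)

mu = rng.standard_normal(d)               # shared parameter, held by the server
pi = np.full(K, 0.5)                      # local parameters, held by clients
eta = 1.0                                 # eta = 1 recovers the exact M-step here
for _ in range(50):
    grads = []
    for k, X in enumerate(clients):
        # Local E-step: responsibility of the +mu component.
        g = 1.0 / (1.0 + np.exp(-(2.0 * X @ mu + np.log(pi[k] / (1.0 - pi[k])))))
        pi[k] = g.mean()                  # proportions are updated locally
        grads.append(((2.0 * g - 1.0)[:, None] * X).mean(0) - mu)
    mu = mu + eta * np.mean(grads, axis=0)  # server averages one d-vector per client

# (mu, pi) and (-mu, 1 - pi) describe the same mixture, so report the better sign.
err = min(np.linalg.norm(mu - mu_true), np.linalg.norm(mu + mu_true))
print(f"estimation error for shared means: {err:.3f}")
```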

In Part III, I will introduce a representation-based multi-task learning framework that generalizes the distance-based similarity notion discussed in Parts I and II. This framework is closely related to modern applications of fine-tuning in image classification and natural language processing. I will discuss how this study enhances our understanding of the effectiveness of fine-tuning and the influence of data contamination on representation-based multi-task learning.
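
As a rough sketch of the shared-representation setting (using a simple spectral estimator of my own, not necessarily the framework in [TGF23]): suppose the T tasks share a p x r loading matrix A, so each task's coefficient vector is beta_t = A @ theta_t. One can stack rough per-task estimates, estimate the shared subspace from their top-r singular vectors, and then refit each task in that low-dimensional subspace.

```python
# Spectral sketch of multi-task regression with a shared low-rank
# representation beta_t = A @ theta_t (illustrative, not the talk's method).
import numpy as np

rng = np.random.default_rng(2)
p, r, T, n = 50, 3, 20, 40
A_true, _ = np.linalg.qr(rng.standard_normal((p, r)))   # shared subspace

tasks, betas_true = [], []
for _ in range(T):
    beta = A_true @ rng.standard_normal(r)
    X = rng.standard_normal((n, p))
    y = X @ beta + 0.1 * rng.standard_normal(n)
    tasks.append((X, y)); betas_true.append(beta)

# Step 1: rough per-task ridge estimates (n < p, so plain least squares fails).
B = np.column_stack([np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)
                     for X, y in tasks])                 # p x T matrix

# Step 2: top-r left singular vectors estimate the shared representation.
A_hat = np.linalg.svd(B, full_matrices=False)[0][:, :r]

# Step 3: refit each task with only r free parameters in the learned subspace.
betas_hat = [A_hat @ np.linalg.lstsq(X @ A_hat, y, rcond=None)[0]
             for X, y in tasks]
avg_err = np.mean([np.linalg.norm(b_hat - b)
                   for b_hat, b in zip(betas_hat, betas_true)])
print(f"average per-task coefficient error: {avg_err:.3f}")
```

The refitting step mirrors fine-tuning in practice: the expensive, data-hungry part (learning the representation) is shared across tasks, and each individual task only estimates a small head on top of it.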

Finally, I will summarize the talk and briefly introduce my broader research interests. The three main sections of this talk are based on a series of papers [TF23, TWXF22, TWF24, TGF23] and a short course I co-taught at NESS 2024 [STL24]. More about me and my research can be found at https://yet123.com.  

[TF23] Tian, Y., & Feng, Y. (2023). Transfer Learning under High-dimensional Generalized Linear Models. Journal of the American Statistical Association, 118(544), 2684-2697. [Link]

[TWXF22] Tian, Y., Weng, H., Xia, L., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224. [Link]

[TWF24] Tian, Y., Weng, H., & Feng, Y. (2024). Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms. ICML 2024. [Link]

[TGF23] Tian, Y., Gu, Y., & Feng, Y. (2023). Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness. arXiv preprint arXiv:2303.17765. [Link]

[STL24] A (Selective) Introduction to the Statistics Foundations of Transfer Learning. (2024). Short course, NESS 2024. [Link]