Tuesday, December 6, 2016 - 11:00 to 12:00
Andy Leung, UBC Statistics PhD Student
Room 4192, Earth Sciences Building (2207 Main Mall)
In traditional robust statistics, it is generally assumed that the majority of the observations in the data are free of contamination, while only a minority of the observations are contaminated. The contaminated observations are flagged as outliers and downweighted, even if only a single component is contaminated. In practice, observations can be entirely contaminated; this situation is usually referred to as casewise contamination. However, observations can also be only partially contaminated. This type of contamination often appears as single outlying cells in a data matrix and is therefore usually referred to as cellwise contamination. Under cellwise contamination, downweighting the whole observation can discard a great deal of information, especially for high-dimensional data. Furthermore, recent work has shown that traditional robust procedures that proceed in this way may fail when applied to such datasets. In this talk, I will illustrate this problem with a real-data example and sketch our proposal for estimating multivariate location and scatter matrix under simultaneous cellwise and casewise contamination.
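As a rough illustration of why dimension matters (not part of the proposed method), suppose each cell of an n x p data matrix is contaminated independently with probability eps. This independence model is an assumption made only for this sketch. A row then contains at least one contaminated cell with probability 1 - (1 - eps)^p, which grows quickly with p. The function names below are hypothetical.

```python
import random


def prob_row_contaminated(eps: float, p: int) -> float:
    """Probability that a row of p cells has at least one contaminated cell.

    Assumes (for illustration only) that each cell is contaminated
    independently with probability eps.
    """
    return 1.0 - (1.0 - eps) ** p


def simulate_row_contamination(eps: float, n: int, p: int, seed: int = 0) -> float:
    """Monte Carlo check: fraction of n simulated rows with >= 1 bad cell."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n):
        # A row is "casewise contaminated" from a classical viewpoint
        # as soon as any one of its p cells is contaminated.
        if any(rng.random() < eps for _ in range(p)):
            flagged += 1
    return flagged / n


if __name__ == "__main__":
    eps = 0.05
    for p in (5, 20, 50, 100):
        exact = prob_row_contaminated(eps, p)
        approx = simulate_row_contamination(eps, 10_000, p)
        print(f"p={p:3d}  exact={exact:.3f}  simulated={approx:.3f}")
```

With eps = 0.05 and p = 20, about 64% of rows contain at least one contaminated cell. This is more than the 50% breakdown point that bounds affine-equivariant casewise estimators, so downweighting whole rows can fail even though only 5% of the cells are bad.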