In statistical applications, we are often asked to construct a classifier based on a random sample from a specific population. Once the classifier is built, we may use it to categorize new individuals from that population. How accurately it categorizes new individuals depends on the quality of the classifier we built. Yet the sample from the population is generally noisy, so unless the sample size is very large, the classifier's success rate on new individuals is far from certain. In the data analysis stage, we usually look for the classifier that achieves the highest success rate on the given sample. This apparent success rate generally overestimates the classifier's accuracy when it is applied to new individuals from the population. To address this issue, cross-validation is often recommended for assessing the performance of a classifier. In this project, we use simulation studies to investigate whether cross-validation indeed accurately estimates the performance of classifiers in various situations.
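The gap between apparent and true success rates, and the way cross-validation tries to close it, can be illustrated with a small simulation. This is a minimal sketch under assumptions of our own, not the study design used in the talk. The data model (one informative binary feature plus pure-noise binary features), the class of classifiers (single-feature rules), and the parameter values `n`, `p`, `signal`, and `k` are all hypothetical. The rule with the highest in-sample accuracy is selected. Its apparent accuracy is then compared with a k-fold cross-validation estimate, in which selection is redone on every training split, and with its exact population accuracy.

```python
import random

def simulate(n, p, signal, rng):
    """Hypothetical data model: y is a fair coin; feature 0 agrees with y with
    probability `signal`; features 1..p-1 are independent fair coins (pure noise)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.randint(0, 1) for _ in range(p)]
        x[0] = y if rng.random() < signal else 1 - y
        data.append((x, y))
    return data

def predict(rule, x):
    """A rule (j, flip) predicts x[j], or its complement when flip == 1."""
    j, flip = rule
    return x[j] ^ flip

def accuracy(rule, data):
    """Fraction of individuals in `data` that the rule classifies correctly."""
    return sum(predict(rule, x) == y for x, y in data) / len(data)

def fit_best_rule(data, p):
    """Select the single-feature rule with the highest apparent (in-sample) accuracy."""
    rules = [(j, flip) for j in range(p) for flip in (0, 1)]
    return max(rules, key=lambda r: accuracy(r, data))

def true_accuracy(rule, signal):
    """Population accuracy of a rule under the simulated model, computed exactly."""
    j, flip = rule
    if j != 0:
        return 0.5
    return signal if flip == 0 else 1 - signal

def cv_accuracy(data, p, k, rng):
    """k-fold CV of the whole fit-and-select procedure: the rule is re-selected
    on each training split and scored only on the held-out fold."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    correct = 0
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [data[i] for i in idx if i not in held]
        rule = fit_best_rule(train, p)
        correct += sum(predict(rule, data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)

def run_study(reps=100, n=40, p=50, signal=0.7, k=5, seed=2016):
    """Average apparent, cross-validated, and true accuracy over `reps` samples."""
    rng = random.Random(seed)
    app = cv = true = 0.0
    for _ in range(reps):
        sample = simulate(n, p, signal, rng)
        rule = fit_best_rule(sample, p)
        app += accuracy(rule, sample)
        cv += cv_accuracy(sample, p, k, rng)
        true += true_accuracy(rule, signal)
    return app / reps, cv / reps, true / reps

if __name__ == "__main__":
    apparent, cv, true = run_study()
    print(f"apparent={apparent:.3f}  cv={cv:.3f}  true={true:.3f}")
```

In a setup like this, the apparent accuracy is typically well above the true accuracy, because the selected rule is the one that happens to fit the noise in this particular sample best. Note that the cross-validation figure estimates the accuracy of the selection procedure trained on a fraction (k-1)/k of the sample, not of the single rule chosen from the full sample. How closely the two agree across settings is the kind of question a simulation study can probe.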
Assessing performance of classifiers by cross-validation based on binary data
Thursday, August 25, 2016 - 16:00
Yichen Zhao, UBC Statistics Master's student
Room 4192, Earth Sciences Building (2207 Main Mall)