Canadian health administrative databases are a popular and rich data source for epidemiological research at the population level, but error-prone diagnostic information typically provides investigators with a less-than-perfect proxy for the disease under study. Motivated by an observational study into the prodromal phase of multiple sclerosis (MS), this talk considers the setting of a matched exposure-disease association study in which the disease variable is measured with error and participant selection is based on the error-prone disease label rather than the true disease status. We initially focus on the special case of a pair-matched case-control study. Assuming non-differential misclassification of study participants, we give a closed-form expression for the asymptotic bias in odds ratios arising under naive analyses of misclassified data, and propose a Bayesian model to correct association estimates for misclassification bias. For identifiability, the model relies on information from a validation cohort of correctly classified case-control pairs, and also requires prior knowledge about the predictive values of the classifier. In a simulation study, the model yields improved point and interval estimates relative to the naive analysis, but it is also found to be overly restrictive for our motivating dataset. To address this limitation, we further propose a generalized model for misclassified data that extends to the case of differential misclassification and allows for a variable number of controls per matching stratum. Instead of prior information about the classification process, the model relies on individual-level estimates of each participant's true disease status, which in our motivating example were obtained from a counting process mixture model of MS-specific healthcare utilization. Both methods are applied to real study data to investigate the symptoms preceding the first recognized sign of multiple sclerosis.
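To make the bias under a naive analysis concrete, the following is a minimal simulation sketch, not the closed-form expression or the Bayesian model presented in the talk. It assumes a hypothetical population with a binary exposure, a true odds ratio of 3, and an error-prone disease label whose errors depend only on true disease status (non-differential misclassification). All parameter values, the use of sensitivity and specificity to generate labels, the function names, and the pairing on no covariates are illustrative assumptions.

```python
import random


def matched_pair_or(case_exposures, control_exposures, rng):
    """Discordant-pair odds ratio after random 1:1 pairing (no matching covariates)."""
    # Draw one control per case without replacement.
    paired_controls = rng.sample(control_exposures, len(case_exposures))
    pairs = list(zip(case_exposures, paired_controls))
    n10 = sum(1 for c, k in pairs if c == 1 and k == 0)  # case exposed only
    n01 = sum(1 for c, k in pairs if c == 0 and k == 1)  # control exposed only
    return n10 / n01


def simulate(n=100_000, p_exposed=0.3, p_disease_unexposed=0.10,
             true_or=3.0, sens=0.80, spec=0.95, seed=1):
    """Return (naive OR from error-prone labels, OR from true disease status)."""
    rng = random.Random(seed)
    # Disease risk among the exposed, implied by the assumed true odds ratio.
    odds_exposed = true_or * p_disease_unexposed / (1 - p_disease_unexposed)
    p_disease_exposed = odds_exposed / (1 + odds_exposed)

    people = []
    for _ in range(n):
        exposed = int(rng.random() < p_exposed)
        risk = p_disease_exposed if exposed else p_disease_unexposed
        diseased = int(rng.random() < risk)
        # Non-differential: the label error depends on disease status only.
        labelled = int(rng.random() < (sens if diseased else 1 - spec))
        people.append((exposed, diseased, labelled))

    # Naive analysis: cases and controls selected on the error-prone label.
    naive = matched_pair_or([e for e, _, l in people if l == 1],
                            [e for e, _, l in people if l == 0], rng)
    # Reference analysis: selection on the (normally unobserved) true status.
    reference = matched_pair_or([e for e, d, _ in people if d == 1],
                                [e for e, d, _ in people if d == 0], rng)
    return naive, reference


if __name__ == "__main__":
    naive, reference = simulate()
    print(f"naive OR: {naive:.2f}, OR using true status: {reference:.2f}")
```

Under these assumptions the naive estimate is pulled toward the null relative to the estimate based on true disease status, which is the kind of bias the correction methods in the talk are designed to address.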