Abstract | Background: Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide, adenine (A), thymine (T), cytosine (C) or guanine (G), is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay, based on arrayed primer extension (APEX), that interrogates each SNP with multiple probes and reads signals from multiple channels. This mini-sequencing method combines a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. On this microarray platform, our current genotype calling system (known as SNP Chart) calls single SNP genotypes by manual inspection of the APEX data, which is time-consuming and prone to user subjectivity bias. Results: Using a set of 32 Coriell DNA samples plus three negative PCR controls as a training data set, we have developed a fully automated genotyping algorithm based on simple linear discriminant analysis (LDA) with dynamic variable selection. The algorithm combines separate analyses based on the multiple probe sets to give a final posterior probability for each candidate genotype. We tested the algorithm on a completely independent data set of 270 DNA samples with validated genotypes, from patients admitted to the intensive care unit (ICU) of St. Paul's Hospital (plus one negative PCR control sample). Our method achieves a concordance rate of 98.9% with a 99.6% call rate for a set of 96 SNPs. Adjusting the threshold on the final posterior probability of the called genotype reduces the call rate to 94.9% while raising the concordance rate to 99.6%. We also swapped the training and testing roles of the two independent data sets, achieving a concordance rate of up to 99.8%. Conclusion: The strength of this APEX chemistry-based platform is its unique redundancy, with multiple probes for a single SNP.
Our model-based genotype calling algorithm captures the redundancy in the system by considering all the underlying probe features of a particular SNP, automatically down-weighting any 'bad data' arising from image artifacts on the microarray slide or the failure of a specific chemistry. In this way, our method automatically selects the probes that work well and reduces the influence of poorly performing probes in a sample-specific manner, for any number of SNPs. |
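The combination and thresholding steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the per-probe-set analyses are combined or how the down-weighting is computed, so the weighted average used here, the function names `combine_posteriors` and `call_genotype`, the genotype labels, and all numbers are assumptions made purely for illustration, not the authors' method.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical genotype labels for a biallelic SNP.
GENOTYPES = ("AA", "AB", "BB")


def combine_posteriors(
    per_probe: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Weighted average of per-probe-set posteriors (an assumed scheme).

    A low weight stands in for a probe set judged unreliable for this
    sample, so it contributes less to the final posterior.
    """
    if len(per_probe) != len(weights):
        raise ValueError("need exactly one weight per probe set")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one probe set needs a positive weight")
    combined = {g: 0.0 for g in GENOTYPES}
    for posterior, w in zip(per_probe, weights):
        for g in GENOTYPES:
            combined[g] += w * posterior[g] / total
    return combined


def call_genotype(
    combined: Dict[str, float], threshold: float
) -> Tuple[Optional[str], float]:
    """Return the most probable genotype, or None (a no-call) when its
    posterior probability falls below the threshold."""
    best = max(combined, key=combined.get)
    if combined[best] >= threshold:
        return best, combined[best]
    return None, combined[best]


# Made-up posteriors from three probe sets; the third is down-weighted.
per_probe = [
    {"AA": 0.90, "AB": 0.08, "BB": 0.02},
    {"AA": 0.80, "AB": 0.15, "BB": 0.05},
    {"AA": 0.30, "AB": 0.40, "BB": 0.30},
]
weights = [1.0, 1.0, 0.2]

final = combine_posteriors(per_probe, weights)
print(call_genotype(final, threshold=0.7))  # called as "AA"
print(call_genotype(final, threshold=0.9))  # no-call: posterior below 0.9
```

With these made-up numbers, the stricter threshold turns a call into a no-call, which is the mechanism behind the trade-off reported above between call rate and concordance rate.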