A Note on Probabilistic Models over Strings: The Linear Algebra Approach

Subscribe to email list

Please select the email list(s) to which you wish to subscribe.

You are here

A Note on Probabilistic Models over Strings: The Linear Algebra Approach

TitleA Note on Probabilistic Models over Strings: The Linear Algebra Approach
Publication TypeJournal Article
Year of Publication2013
AuthorsBouchard-Cote, A
JournalBULLETIN OF MATHEMATICAL BIOLOGY
Volume75
Pagination2529-2550
Date PublishedDEC
Type of ArticleArticle
ISSN0092-8240
KeywordsAlignment, Automata, Factor graphs, Graphical models, Indel, Phylogenetics, Probabilistic models, String transducers, TKF91
AbstractProbabilistic models over strings have played a key role in developing methods that take into consideration indels as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to do inference on these probabilistic models, in which an important theoretical question is the complexity of computing the normalization of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a set of results based on this linear algebra view that facilitate the analysis and design of inference algorithms on string-valued graphical models. As an illustration, we use this method to give a new elementary proof of a known result on the complexity of inference on the ``TKF91'' model, a well-known probabilistic model over strings. Compared to previous work, our proving method is easier to extend to other models, since it relies on a novel weak condition, triangular transducers, which is easy to establish in practice. The linear algebra view provides a concise way of describing transducer algorithms and their compositions, opens the possibility of transferring fast linear algebra libraries (for example, based on GPUs), as well as low rank matrix approximation methods, to string-valued inference problems.
DOI10.1007/s11538-013-9906-6