Recent advances in modeling biological processes

This workshop will gather world experts in computational biology and the modeling of biological processes, including statisticians, computer scientists and biologists, to discuss some of the latest developments and current bottlenecks in the area. In particular, we will discuss the latest modeling and computational problems as well as data integration from multiple sources.
This meeting is part of a series of activities held in the Pacific Northwest (PNW) as part of a recently awarded PIMS collaborative research group (CRG) on Bayesian modeling and computation for networks. The organization of the meeting has been specifically designed to encourage interactions among researchers from different fields.

Location: Fred Hutchinson Cancer Research Center
Arnold Building (PHS), rooms M1-A303 & M1-A305
Here is a link to the FHCRC campus map: http://www.fhcrc.org/about/maps/campusmap.html
For shuttles from UW Medical Center, the schedules can be found at http://www.fhcrc.org/about/visitor/shuttles/
For general information about Seattle buses please have a look here: http://transit.metrokc.gov/
Metro bus lines 66 and 70 will get you from downtown or the University District to the Hutch.

Date: Dec 1-2, 2008

Organizers: Adrian Dobra, Raphael Gottardo, Lurdes Inoue

Sponsors: PIMS, VIGRE

Confirmed Speakers (in no particular order):
Armand Bankhead (Merck), Michael Lawrence (FHCRC), Robert Gentleman (FHCRC), David Reiss (ISB), Nitin Baliga (ISB), Charles Kooperberg (FHCRC), Michael LeBlanc (FHCRC), Beatrice Knudsen (FHCRC), Li Hsu (FHCRC), Muneesh Tewari (FHCRC), Peter Müller (MD Anderson), Steve Horvath (UCLA), William Noble (UW), Edo Airoldi (Harvard).

Important Dates:
Deadline for participation confirmation, Nov 1, 2008.
REGISTRATION IS NOW CLOSED.
Deadline for title/abstract submission, Nov 1, 2008. Titles/Abstracts should be sent to Raphael Gottardo.


Confirmed participants (in no particular order):
Peter Müller (MD Anderson), Steve Horvath (UCLA), Edo Airoldi (Harvard), Armand Bankhead (Merck), Michael Lawrence (FHCRC), Robert Gentleman (FHCRC), David Reiss (ISB), Nitin Baliga (ISB), Charles Kooperberg (FHCRC), Michael LeBlanc (FHCRC), Beatrice Knudsen (FHCRC), Li Hsu (FHCRC), Muneesh Tewari (FHCRC), William Noble (UW), Yanming Di (UW), Elisabeth Rosenthal (UW), Serge Sverdlov (UW), Li Qin (UW), Amy Laird (UW), Soyoung Ryu (UW), Ben Ely (UW), Cici Xi Chen (UW), Jung-Lim Shin (UW), Charles Cheung (UW), Ahrim Youn (UW), Michael Hoffman (UW), Peter Sudmant (UW), Tom Milac (UW), Chen Yanover (UW), Cindy Zhang (UW), Yan Liu (UW), Scott Diede (UW), Al Hallstrom (UW), Jeannine McCune (UW), Sue Li (UW), Aaron Brooks (UW), FangYin Lo (UW), Kenneth Lo (UBC), Lei Hua (UBC), Kaida Ning (UBC), Gordon Robertson (BC Cancer Agency), Mauricio Neira (UBC), Michelle Xia (UBC), Kevin Murphy (UBC), Jihyoun Jeon (FHCRC), Dan Gottschling (FHCRC), Michelle Oeser (UW), Chenwei Lin (UW), Youyi Fong (UW), Anna Korpak (UW), Natalie Thompson (UW), Shannon Tsai (UW), Steve Self (FHCRC), Joachim Voss (UW), Eric Foss (UW), Tom Skillman (UW), Jane Lange (UW), Nan Hu (UW), Jason Shaw (UW), Lik Wee (UW), Xioahong Li (UW), Tim Randolph (UW), Tom Boyle (UW), Maggie Andrilla (UW), Vladimir Minin (UW), Mark Seligman (UW), Chris Fraley (UW), Christine Lloyd (UW), Ying Chen (UW), Qi Liu (UW), Ralf Luche (UW), Suzanna Reid (FHCRC), M Elizabeth Halloran (UW), Sean Devlin (UW), Ying Huang (UW), Jun Kitano (UW), Lin Li (UW), Jiangning Li (UW), David Scalzo (UW), Pei Wang (UW), Albert C Huang (UW), Kevin Hayes (UW), Daryl Morris (UW), Pingping Qu (UW), Elahe Mostaghel (UW), Irina Dinu (UofA), Ken Rice (UW), Judy Zhong (UW), Deanna Petrochilos (UW), Neli Ulrich (UW), Xuesong Yu (UW).
Note that out-of-town students will be expected to share a room (2 students/room).

Preliminary program (may change slightly):

Monday, Dec 1
8:45-9:00 Introduction
9:00-10:00 Steve Horvath, "Geometric Interpretation of Gene Coexpression Network Analysis"
10:00-10:30 Michael Lawrence, "Manipulating, simulating and visualizing biological network models with R, Bioconductor and GGobi"
10:30-11:00 Coffee Break
11:00-11:30 Muneesh Tewari, "MicroRNAs and cancer"
11:30-12:00 Nitin S. Baliga, "A predictive model of adaptive responses to environmental changes"
12:00-12:30 Beatrice Knudsen, "Oncogenic transformation, response to environmental factors and adaptation to hypoxia promote tumor metastasis"
12:30-1:30 Lunch
1:30-2:30 Edo Airoldi, "A statistical perspective on cellular growth"
2:30-3:00 Li Hsu, "Learning Networks from High Dimensional Genomic Instability Data"
3:00-3:30 Break
3:30-4:00 Robert Gentleman, "Data Integration with Bayes Factors"
4:00-4:30 Charles Kooperberg, "Using main effects to find interactions in genome-wide studies"
4:30-5:00 Michael LeBlanc, "Using interactions to find main effects in genome-wide studies"
5:00-7:00 Poster Session

Tuesday, Dec 2
9:00-10:00 Peter Müller, "Random Partition Models with Covariates"
10:00-10:30 David Reiss, "The modeling and inference of a global, dynamic gene regulatory network from high-throughput data"
10:30-11:00 Coffee Break
11:00-12:00 William Noble, "Probability models of transmembrane protein topology, peptide fragmentation and heterogeneous genome-wide data"
12:00-12:30 Armand Bankhead, "Protein Interaction Permutation Analysis to Identify Signaling Networks Using RNA Interference"
12:30-1:30 Closing Remarks/Lunch

Poster session:
A poster session will be held on Dec 1. More details will be posted soon. If you’d like to present a poster please send Raphael Gottardo an email.

Poster abstracts:

Title: Testing Gene Associations Using Co-citation
Authors: Anton H. Westveld, Beiying Ding, and Robert Gentleman

Abstract: With the emergence and development of high throughput genomic technologies, such as DNA microarrays, a wealth of data are being produced; therefore, it is important to develop methodologies that facilitate exploratory analyses of gene expression data in addition to developing inferential tools. More use of meta-data, such as that available from PubMed, Gene Ontology (GO), and various sequence based annotations, should be made. These data constitute a useful source of information that can help interpret, enrich, and validate features and patterns arising from experimental data. In this paper, we explore the use of current statistical methodologies and develop new ones in order to quantify gene-gene relationships based on data representing co-citation networks between genes and journal articles in PubMed.



Title: A biological evaluation of gene set analysis methods using the NCI-60 cancer cell lines dataset
Authors: Irina Dinu

Abstract: Gene-set analysis of microarray data evaluates biological pathways, or gene sets, for their differential expression by a phenotype of interest. Many data-analytic methods have been proposed for gene-set analysis. We evaluate the biological performance of five gene-set analysis methods that test “self-contained null hypotheses” via subject sampling: Significance Analysis of Microarray for Gene Sets (SAM-GS), Global Test, Analysis of Covariance (ANCOVA) Global Test, the method of Tian et al. (PNAS, 2005), and the method of Tomfohr et al. (BMC Bioinformatics, 2005), along with the most popular gene-set analysis method, Gene Set Enrichment Analysis (GSEA), using three real microarray analyses from the NCI-60 microarray data. The comparison of the six methods with respect to the true-positive and true-negative rates illustrates the varying biological performance of these gene-set analysis methods, suggesting advantages of SAM-GS, Global Test, and ANCOVA Global Test over GSEA, the method of Tian et al. (PNAS, 2005), and the method of Tomfohr et al. (BMC Bioinformatics, 2005). A free Excel Add-In for performing SAM-GS is available for public use at http://www.ualberta.ca/~yyasui/homepage.html.




Title: MiRGenISCA
Author: Mauricio Neira

Abstract: A new layer of complexity in gene regulation is attributed to microRNAs, short RNA molecules that in mature/active form are 20-25 nucleotides long and regulate protein expression by partial or complete complementary binding to mRNAs, causing translational inhibition and/or destabilizing mRNAs. Different computational algorithms show hundreds of potential targets per microRNA, but little is known about which interactions actually happen in nature or within a given biological process. MicroRNAs affect simultaneously and combinatorially hundreds of targets and have an impact on the fine tuning of gene expression leading to experimentally detected mRNA expression patterns.

Here, we present a methodology that uses mRNA expression measurements and potential microRNA target interactions to infer activity of microRNAs by statistically significant association of potential microRNA targets with clusters of mRNA expression (using fuzzy clustering) and present a heuristic for the scoring and ranking of pairwise microRNA-target interactions within the system of interest.

Without experimentation there is no certainty as to what microRNA-mRNA interactions are actually happening in a given system and so there is no extensive ground truth to be able to judge the performance of the proposed method. We addressed the performance of the method in two ways:

First, we assessed performance with a measure of inter-rater reliability, by comparing the ranking (per microRNA) of pairwise interactions obtained with our method (MiRGenISCA) and the ranking obtained by a Bayesian and machine learning method (GenMiR++) that scores pairwise interactions based on a linear model of microRNA-target potential interactions using mRNA and microRNA expression measurements. We found, for many microRNAs, statistically significant agreement between the rankings from both methods, as assessed by Kendall's W coefficient of concordance.

Second, we applied our methodology to different data sets related to cancer and found high activity of microRNAs that have been experimentally demonstrated to be active in different cancer systems and may function as oncogenes or tumor suppressors.

MiRGenISCA is a methodology to detect microRNA activity associated with gene clusters and to score microRNA-target interactions within the context of a given biological system. It does not rely on microRNA expression data, which is practical because measurements of the expression levels of all mature microRNAs are not always available, and it allows publicly available mRNA expression data sets to be mined to create hypotheses about the biological activity of microRNAs.
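
The concordance check mentioned in this abstract can be illustrated with a minimal sketch of Kendall's W for rankings without ties. The function name and the example ranks below are hypothetical and are not taken from the poster; they only show how the coefficient is computed when two methods rank the same set of candidate targets.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters that each rank the same n items (no ties)."""
    m = len(rankings)        # number of raters (here: ranking methods)
    n = len(rankings[0])     # number of ranked items (here: candidate targets)
    # Total rank each item received across all raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # Sum of squared deviations of the rank sums from their mean.
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks of five candidate targets of one microRNA under two methods.
method_a = [1, 2, 3, 4, 5]
method_b = [2, 1, 3, 5, 4]
w = kendalls_w([method_a, method_b])
```

W ranges from 0 (no agreement) to 1 (identical rankings); assessing statistical significance, as the abstract describes, would require an additional test that is not sketched here.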




Title: Finding the number of protein subfamilies by Bayesian mixture modeling
Author: Youyi Fong

Abstract: We model a protein family as a mixture of profile Hidden Markov models and use integrated likelihood for a given number of clusters to do inference on mixture complexity. Efforts are concentrated on finding good proposal distributions for importance sampling. We apply our method to simulated and real datasets and compare the results to biological truth and results from other methods. Future work to account for correlation among sequences induced by phylogeny is discussed. Joint work with Prof. Jon Wakefield and Ken Rice.




Title: Conditional Tests for Linkage Localization
Authors: Yanming Di and Elizabeth A. Thompson

Abstract: With pedigree data, linkage can be detected using inheritance vector tests, which explore the discrepancies between the conditional distributions of the inheritance vectors given the trait values and the unconditional distributions of the inheritance vectors. Marginal inheritance vector tests, however, will show significance not only at the causal loci but also at loci linked to the causal loci, and thus, in general, do not provide accurate localization information. We discuss the potential of using conditional tests to guide linkage localization.




Title: Comparing statistical methods and tools for the analysis of ChIP-Seq
Authors: Gordon Robertson, Kaida Ning, Xuekui Zhang, Raphael Gottardo

Abstract: TBA




Title: PICS: Probabilistic Inference for ChIP-Seq
Authors: Xuekui Zhang, Gordon Robertson, Kaida Ning, Raphael Gottardo

Abstract: TBA






Talk abstracts:
Speaker: Edo Airoldi
Title: A statistical perspective on cellular growth.

Abstract: Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation is not yet known, portions of it are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this paper, we develop a computational methodology to identify quantitative aspects of the regulatory mechanisms underlying cell proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes accurately predict the instantaneous growth rate of any cellular culture, robust to changing biological conditions, experimental methods, and technological platforms. Our proposed model also predicts growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the identified gene expression signature from multiple perspectives: by perturbing the regulatory network through the Ras/cAMP/PKA pathway, observing strong up-regulation of growth rate even in the absence of appropriate nutrients, and by discovering potential transcription factor binding sites enriched in growth-correlated genes. Our statistical model thus enables biological insights about growth at instantaneous time scales inaccessible by other experimental methods.

Pre-print: Predicting cellular growth from gene expression signatures (September 2008). Airoldi, E.M., Huttenhower, C., Gresham, D., Lu, C., Caudy, A., Dunham, M., Broach, J., Botstein, D., & Troyanskaya, O.G. (pre-print online at http://www.genomics.princeton.edu/~eairoldi/pubs/eairoldi08_growth.pdf)



Speaker: Nitin S. Baliga
Title: A predictive model of adaptive responses to environmental changes

Abstract: All organisms routinely sense and process complex changes in their environment through a web of intricate information processing networks to adapt their behavior.  Any attempt to predict these responses or reengineer new ones would require a sophisticated and quantitative understanding of this entire process.  Using a systems approach we have constructed a predictive model of the complete gene regulatory program in Halobacterium salinarum NRC-1, an archaeal microbe that thrives in a saturated salt environment that is lethal to most life forms.  The architecture of this model reflects how diverse physiological processes are inter-coordinated during environmental responses and as such it can be used as a framework for systems reengineering.  This model has also helped extract fundamental principles underlying evolutionary assembly of biological networks that will enable their deconstruction into predictive models.



Speaker: Robert Gentleman
Title: Data Integration with Bayes Factors

Abstract: We will discuss an approach to integrating different data sources using Bayes factors. The use of Bayes factors allows us to accommodate different rates of false positives and false negatives in the different data sources and allows us to apply some measure of confidence to predicted interactions. We will demonstrate the use of this methodology on protein interaction data in yeast.
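
As a rough illustration of the idea in this abstract (not the speaker's actual method), the sketch below combines evidence from several data sources whose true-positive and false-positive rates differ. It assumes the sources are conditionally independent given the true interaction status; the function names, rates, and prior are all hypothetical.

```python
def bayes_factor(observed, tpr, fpr):
    """Bayes factor for 'interaction' vs 'no interaction' from one source.

    tpr: probability the source reports an interaction that is real.
    fpr: probability the source reports an interaction that is not real.
    """
    if observed:
        return tpr / fpr
    return (1 - tpr) / (1 - fpr)


def posterior_probability(prior, evidence):
    """Update prior odds with one Bayes factor per (observed, tpr, fpr) source."""
    odds = prior / (1 - prior)
    for observed, tpr, fpr in evidence:
        odds *= bayes_factor(observed, tpr, fpr)
    return odds / (1 + odds)


# Hypothetical evidence for one protein pair from two assays.
evidence = [
    (True, 0.5, 0.01),    # assay A reports the interaction
    (False, 0.2, 0.001),  # assay B does not report it
]
posterior = posterior_probability(0.01, evidence)
```

A source with a high false-negative rate (low `tpr`) contributes a Bayes factor close to 1 when it fails to report an interaction, so its silence carries little weight against the positive report from the other source.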



Speaker: David Reiss
Title: The modeling and inference of a global, dynamic gene regulatory network from high-throughput data

Abstract: We have recently published the first draft of a predictive model of the genome-wide transcriptional regulatory network of the model archaeon Halobacterium salinarum, an extremophile that thrives in extremely saline environments. This talk will focus on the computational and statistical methods for the inference of this network from high-throughput experimental and genomic data, integrated via: (1) the constrained biclustering of transcriptional data to derive gene regulatory modules (groups of putatively co-regulated genes) with common cis-regulatory elements in their promoter sequences, and (2) the statistical inference of the influences (both molecular and environmental) on each of these modules via a sparse linear approximation of the transcriptional kinetics. The resulting model, which identifies the regulatory influences on 80% of the genes in the genome, is capable of accurately predicting the global transcriptional response of Halobacterium to novel environmental stresses. We are currently working on updating this model to include recently collected mechanistic information including transcriptional start/termination sites and protein-DNA binding data.



Speaker: Michael Lawrence
Title: Manipulating, simulating and visualizing biological network models with R, Bioconductor and GGobi

Abstract: Networks are effective models for representing and simulating biochemical systems. There is a synergistic relationship between the analysis of networks and experimental data analysis; each informs the other. Thus, tools are needed for the manipulation, visualization and modeling of networks in conjunction with experimental data. The Bioconductor project extends R with facilities for processing and analyzing biological data. The rsbml package enables the loading, manipulation, and simulation of biological systems described by the Systems Biology Markup Language (SBML) within the R environment. We are also developing interactive visualization tools in R that coordinate network drawings with plots of experimental data. This includes research on graph drawing algorithms that incorporate biological semantics and adapt the visualization to the changing focus of an analysis, without disrupting the mental frame of the analyst. This talk will introduce the support in Bioconductor for working with networks and will include a live demonstration of some visualization tools currently under development.



Speaker: William Noble
Title: Probability models of transmembrane protein topology, peptide fragmentation and heterogeneous genome-wide data

Abstract: In this talk, I will describe the application of dynamic Bayesian networks to three biological modeling tasks: (1) predicting the transmembrane topology of a protein from its primary sequence, (2) modeling the fragmentation of peptides by collision-induced dissociation in a tandem mass spectrometer, and (3) modeling the domain structure of parallel genomic and epigenomic data sets, including MNase and DNaseI cleavage data, histone modifications, methylation, etc., along the human genome.



Speaker: Armand Bankhead
Title: Protein Interaction Permutation Analysis to Identify Signaling Networks Using RNA Interference

Abstract: The ability to study a gene's contribution to phenotype through RNA interference (RNAi) has provided unprecedented insight into the essential biology of an organism. Genome-scale RNAi experiments performed on human cell lines such as HeLa show that a minority of genes, when knocked down, have a dramatic effect on cell viability. Rather than view these lethal knockdown genes as independent effectors of phenotype, we take a systems biology approach to understanding RNAi hits in a network context. As a result, we identify areas of cell signaling that are significant regulators of phenotype and map these networks to biological function.



Title: Geometric Interpretation of Gene Coexpression Network Analysis
Speaker: Steve Horvath

Abstract: The merging of network theory and microarray data analysis techniques has spawned a new field: gene coexpression network analysis. While network methods are increasingly used in biology, the network vocabulary of computational biologists tends to be far more limited than that of, say, social network theorists. Here we review and propose several potentially useful network concepts. We take advantage of the relationship between network theory and the field of microarray data analysis to clarify the meaning of and the relationship among network concepts in gene coexpression networks. Network theory offers a wealth of intuitive concepts for describing the pairwise relationships among genes, which are depicted in cluster trees and heat maps. Conversely, microarray data analysis techniques (singular value decomposition, tests of differential expression) can also be used to address difficult problems in network theory. We describe conditions when a close relationship exists between network analysis and microarray data analysis techniques, and provide a rough dictionary for translating between the two fields. Using the angular interpretation of correlations, we provide a geometric interpretation of network theoretic concepts and derive unexpected relationships among them. We use the singular value decomposition of module expression data to characterize approximately factorizable gene coexpression networks, i.e., adjacency matrices that factor into node specific contributions. High and low level views of coexpression networks allow us to study the relationships among modules and among module genes, respectively. We characterize coexpression networks where hub genes are significant with respect to a microarray sample trait and show that the network concept of intramodular connectivity can be interpreted as a fuzzy measure of module membership. We illustrate our results using human, mouse, and yeast microarray gene expression data. 
The unification of coexpression network methods with traditional data mining methods can inform the application and development of systems biologic methods. This is joint work with Jun Dong.

Relevant Citation:
Horvath S, Dong J (2008) Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Comput Biol 4(8): e1000117



Title: Oncogenic transformation, response to environmental factors and adaptation to hypoxia promote tumor metastasis.
Speaker: Beatrice Knudsen

Abstract: In this talk I will describe forces that lead to selection of metastatic cancer cells. The forces in cancer cells include intrinsic oncogenic pathways, growth factor receptors and pathways activated in response to growth factors and cytokines in the tumor microenvironment, and pathways that mediate the adaptation to hypoxia. The pathways form interactive circuits with nodes of convergence and negative feedback loops that enable cells to survive, move or grow under various pressures from the environment. The concept serves as an example of how seemingly different mechanisms inside the cell can lead to the same endpoint, which is to establish a viable and progressive metastatic tumor nodule.



Title: MicroRNAs and cancer
Speaker: Muneesh Tewari

Abstract: MicroRNAs are small (~22 nt) regulatory RNAs that influence gene expression networks by repressing messenger RNA targets based on sequence-specific interactions. MicroRNA expression has been found to be perturbed in all cancers studied, and therapeutic approaches targeting microRNAs are being actively developed. Furthermore, our group has recently found that microRNAs are released by tumor cells into the bloodstream and may serve as disease biomarkers as well as potentially as a mode of cell-cell communication. The seminar will provide a review of microRNA biology and delineate some of the leading questions in the field.



Speaker: Charles Kooperberg
Title: Using main effects to find interactions in genome-wide studies

Abstract: We investigate the power to identify gene-gene interactions in genome-wide association studies. We focus on two-stage analyses: analyses in which we only test for interactions between single nucleotide polymorphisms that show some marginal effect. We give two algorithms to compute significance levels for such analyses. One involves a Bonferroni correction on the number of interactions that are actually tested, and one is a resampling procedure similar. We also give an algorithm to carry out approximate power calculations for studies that plan to use a two-stage analysis. Based on simulation studies using known genetic models and data from existing genome-wide association studies, we find that for most plausible interaction effects a two-stage analysis can dramatically increase the power to identify interactions compared to a single-stage analysis.
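
The arithmetic behind the Bonferroni variant can be sketched as follows. This is only an illustration of why testing fewer pairs relaxes the per-test threshold; the SNP counts and the significance level are hypothetical, and the conditions under which such a correction is valid are those established by the speaker's work, not by this sketch.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for a family-wise level alpha."""
    return alpha / n_tests


alpha = 0.05
n_snps = 500_000     # hypothetical number of genotyped SNPs
n_selected = 1_000   # hypothetical number of SNPs passing the marginal screen

# Single-stage analysis: every pair of SNPs is tested for interaction.
single_stage = bonferroni_threshold(alpha, comb(n_snps, 2))

# Two-stage analysis: only pairs among the screened SNPs are tested.
two_stage = bonferroni_threshold(alpha, comb(n_selected, 2))
```

With these illustrative numbers, the two-stage threshold is several orders of magnitude less stringent than the single-stage one, which is the source of the potential power gain.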



Title: Using interactions to find main effects in genome-wide studies
Speaker: Michael LeBlanc

Abstract: Assessing the association of genomic attributes with disease outcomes is an important and ongoing area of applied research. Commonly, many univariate tests are calculated for a large number of genomic features. For instance, in human genetic association studies with single nucleotide polymorphisms (SNPs) thousands of SNPs are tested for association with disease outcomes. However, there are concerns that more complex relationships, such as multiple genes acting in concert or gene-environment (or gene-treatment) interactions, could attenuate the marginal effect size and reduce the power to detect true associations if only marginal tests are used.

We propose a strategy which increases the scope of multiple testing by evaluating simple regularized interaction models and their corresponding weighted score statistics. These tests lead to focusing the search of associations adaptively based on environmental or clinical characteristics. One strategy uses a stage-wise estimated score weighting function analogous to statistical boosting prediction algorithms. Results from simulation studies confirm improved power of the proposed approaches compared to simple marginal testing in many situations.



Title: Learning Networks from High Dimensional Genomic Instability Data
Speaker: Li Hsu

Abstract: Genomic instability refers to the propensity for aberrations in chromosomes such as deletion, amplification, and other types of DNA copy number changes. Cancer develops as a result of an accumulation of these genetic aberrations at chromosomal locations that are critical in maintaining normal cell functions. A commonly used method for identifying these aberrations is to study whether a polymorphic marker has a complete or partial signal reduction of one of the two alleles in the tumor compared to the patient's constitutional heterozygous normal sample at the same locus, that is, loss of heterozygosity (LOH). Recent progress in high throughput genotyping technologies has allowed one to obtain LOH status (binary events) at hundreds of thousands of genotypes simultaneously. It is well established that a normal cell lineage undergoing tumorigenesis is a result of combinations of aberrations. In this talk, we will present an approach using regularized logistic regression to study the relations of these LOH events. The typical spatial correlation among LOH events is also incorporated. We will discuss the progress and limitations of our approach.

This is joint work with Pei Wang and Dennis Chao.



Title: Random partition models indexed with covariates
Speaker: Peter Müller

Abstract: We propose a model for covariate-dependent clustering, i.e., we develop a probability model for random partitions that is indexed by covariates. The motivating application is inference for a clinical trial. As part of the desired inference we wish to define clusters of patients. Defining a prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM). We define an extension of the PPM to include the desired regression. This is achieved by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster.
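
The construction described in this abstract can be written out schematically. The notation below is an assumption introduced for illustration and does not come from the abstract: \rho denotes a partition of the experimental units into clusters S_1, ..., S_k, c(S_j) is the cohesion function, x_j^* collects the covariates of the units in S_j, and g is a non-negative function that is larger when those covariates are more similar.

```latex
% Product partition model (PPM): the prior factors over clusters.
p(\rho) \;\propto\; \prod_{j=1}^{k} c(S_j)

% Covariate-dependent extension: each cluster's factor is multiplied by a
% similarity term g evaluated on that cluster's covariates
% x_j^* = \{ x_i : i \in S_j \}.
p(\rho \mid x) \;\propto\; \prod_{j=1}^{k} c(S_j)\, g\!\left(x_j^{*}\right)
```

Under this sketch, partitions that group units with similar covariates receive higher prior probability, which is the regression on baseline covariates that the abstract refers to.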