RECOMB-CG


RECOMB-CG, 19-22 October 2014, New York – Cold Spring Harbor

Posters
1 Ogun Adebali, Davi Ortega and Igor Zhulin. CDvist: a Comprehensive Domain Visualization Tool
2 Charlotte Darby, Maureen Stolzer and Dannie Durand. What’s in a name? An expanded classification of xenologs
3 Minli Xu, Jeffrey Lawrence and Dannie Durand. Comparative genomics sheds light on the evolution and function of the Highly Iterative Palindrome-1 motif in Cyanobacteria
4 Manuel Lafond, Emmanuel Noutahi, Jonathan Séguin, Magali Semeria, Nadia El-Mabrouk, Laurent Gueguen and Eric Tannier. Gene Tree Correction with TreeSolver
5 Anna Paola Carrieri and Laxmi Parida. SimRA: Rapid & Accurate Simulation of Populations based on Random-Graph Models of ARG
6 Francesco Abate, Sakellarios Zairis, Elisa Ficarra, Andrea Acquaviva, Chris Wiggins, Veronique Frattini, Anna Lasorella, Antonio Iavarone, Giorgio Inghirami and Raul Rabadan. Pegasus: annotation and prediction of oncogenic gene fusion events as a supervised learning task
7 Daniel Doerr, Jens Stoye and Katharina Jahn. Discovering common intervals in multiple indeterminate strings
8 Guillaume Holley, Roland Wittler and Jens Stoye. Bloom Filter Trie - a data structure for pan-genome storage
9 Manfred Klaas, Paul Cormican, Thibauld Michel and Susanne Barth. Genotyping by sequencing of a collection of Miscanthus spp. accessions
10 Han Lai and Dannie Durand. How much are you willing to pay? Selecting costs for reconciliation with duplication and transfers
11 Siavash Mirarab, Rezwana Reaz, Md. Shamsuzzoha Bayzid, Théo Zimmermann, M. Shel Swenson and Tandy Warnow. ASTRAL: fast and accurate species tree estimation from gene trees
12 Siavash Mirarab, Nam-Phuong Nguyen and Tandy Warnow. PASTA: ultra-large multiple sequence alignment
13 Alexandra Dana and Tamir Tuller. The effect of tRNA levels on decoding times of mRNA codons
14 Ghada Badr and Arwa Alturki. CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm
15 Robert Aboukhalil, Joan Alexander, Jude Kendall, Michael Wigler and Gurinder Atwal. Single-cell sequencing: How many is many enough?
16 Cedric Chauve, Yann Ponty and João Paulo Pereira Zanetti. Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach
17 Sapna Sharma and Klaus F. X. Mayer. Genome and sequence characteristics indicate frequent introgressive hybridization events in monocots and dicots
18 Nina Luhmann, Cedric Chauve, Jens Stoye and Roland Wittler. Scaffolding of Ancient Contigs and Ancestral Reconstruction in a Phylogenetic Framework
19 Ghada Badr and Haifa Alaqel. Genome Rearrangement for RNA Secondary Structure Using a Component-Based Representation: An Initial Framework
20 Di Huang and Ivan Ovcharenko. Identifying risk-associated regulatory SNPs in ChIP-seq enhancers
21 Kevin Emmett and Raul Rabadan. Characterizing Horizontal Gene Transfer in Microbial Evolution using Topological Data Analysis
22 Mehmet Gunduz, Esra Gunduz, Omer Faruk Hatipoglu, Gokhan Nas, Elif Nihan Cetin, Bunyamin Isik and Ramazan Yigitoglu. Role of p33ING1b in Head and Neck Cancer
23 Pedro Feijao, Fábio V Martinez, Marília Braga and Jens Stoye. The Family-Free Double Cut and Join and its application to ortholog detection
24 Corey Hudson and Kelly Williams. LearnedPhyloblocks: Novel Genomic Islands through Phylogenetic Profiling
25 Philip Davidson, Luisa Hiller, Michael T. Laub and Dannie Durand. Tracking the Evolution of a Signal Transduction Pathway Architecture with Comparative Genomics
26 Krister Swenson and Mathieu Blanchette. Linking Genome Rearrangements and Chromatin Conformation
27 Filippo Utro, Deniz Yorukoglu, David Kuhn, Saugata Basu and Laxmi Parida. Topological Data Analysis to detect population admixture in recombining chromosomes
28 Yee Him Cheung, Nevenka Dimitrova and Wim Verhaegh. Achieving Cross-Platform Compatibility of Gene Expression Data
29 Filippo Utro, Daniel E. Platt and Laxmi Parida. K-mer Analysis of Ebola sequences differentiates outbreaks

Poster 1
CDvist: a Comprehensive Domain Visualization Tool
Ogun Adebali1,2 §, Davi R. Ortega1,2, Igor B. Zhulin1,2
1 Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37861, USA, 2 Department of Microbiology, University of Tennessee, Knoxville TN 37996, USA
§ Corresponding author and poster presenter: oadebali@vols.utk.edu
Abstract
The study of a novel protein starts by obtaining information about the protein using computational methods. The public databases of protein sequences provide a framework for theoretical predictions of function and structure of biomolecules. Based on this wealth of information, specialized datasets of protein domain models are maintained to facilitate protein domain recognition in newly sequenced proteins. Several publicly available webservers (HMMER, CD-search and HHpred) utilize a variety of algorithms to predict protein domains in sequences based on similarity searches against these datasets. Despite the power and the popularity of these algorithms, none of the available services combines batch querying, a consistent visualization scheme and a comprehensive retrieval of protein domain information, especially for multi-domain proteins. More specifically, all these services operate on a whole protein sequence given as input, which may bias results towards more conserved domains and may leave significant protein regions without a match to a known protein domain profile. We propose that domain coverage in multi-domain proteins can be dramatically increased by automated exhaustive search of protein regions without significant match against a variety of databases. We have developed CDvist (Comprehensive Domain Visualization Tool), which combines the power of existing algorithms (HMMER, RPS-BLAST, HHsearch, HHblits) and protein domain databases in a user-friendly visualization framework. To increase domain coverage, rather than using the entire sequence, CDvist iteratively identifies regions without significant domain match and submits each of these segments to similarity search against a pre-determined sequence of databases until the entire protein sequence is covered or all databases have been searched. Our web-server allows bulk querying at a high speed enabled by a parallel processing environment. A custom JavaScript module is implemented to represent results in a comprehensive, biologist-friendly manner. We designed the CDvist web-server to be used primarily by experimentalists who are interested in learning more about a protein or protein sets of choice. However, it is also attractive to computational biologists due to its bulk querying and JSON formatted export features.
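The iterative strategy described in this abstract can be illustrated with a minimal sketch. It is not the CDvist implementation: the names `uncovered_segments`, `iterative_domain_search` and `toy_search`, the `min_len` threshold and the hit format are all hypothetical, and each "database" is modeled as a plain Python function standing in for a real search tool such as HMMER or HHsearch.

```python
from typing import Callable, List, Tuple

# A hit is (start, end, domain_name) with half-open coordinates.
Hit = Tuple[int, int, str]
# Hypothetical stand-in for one search tool plus database.
SearchFn = Callable[[str], List[Hit]]


def uncovered_segments(length: int, hits: List[Hit], min_len: int) -> List[Tuple[int, int]]:
    """Return half-open regions with no hit that are at least min_len long."""
    gaps = []
    pos = 0
    for start, end, _ in sorted(hits):
        if start - pos >= min_len:
            gaps.append((pos, start))
        pos = max(pos, end)
    if length - pos >= min_len:
        gaps.append((pos, length))
    return gaps


def iterative_domain_search(sequence: str, searches: List[SearchFn], min_len: int = 30) -> List[Hit]:
    """Query each database in order, but only with the still-uncovered segments."""
    hits: List[Hit] = []
    segments = [(0, len(sequence))]  # nothing is covered before the first search
    for search in searches:
        if not segments:  # the whole sequence is covered: stop early
            break
        for seg_start, seg_end in segments:
            for start, end, name in search(sequence[seg_start:seg_end]):
                # Shift segment-local coordinates back to the full sequence.
                hits.append((seg_start + start, seg_start + end, name))
        segments = uncovered_segments(len(sequence), hits, min_len)
    return sorted(hits)


def toy_search(motif: str, name: str) -> SearchFn:
    """Toy 'database': reports the first exact occurrence of a literal motif."""
    def search(segment: str) -> List[Hit]:
        i = segment.find(motif)
        return [(i, i + len(motif), name)] if i >= 0 else []
    return search
```

With two toy databases, the second one is only ever queried with the regions the first one left unmatched, which is the behavior the abstract describes; a real pipeline would also need significance thresholds and overlap resolution, which are omitted here.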

Poster 2
What’s in a name? An expanded classification of xenologs
Charlotte Darby 1*, Maureen Stolzer 1*, Dannie Durand 1§
1 Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
* Equal author contribution
§ Corresponding author durand@cs.cmu.edu
CD is the poster presenter cdarby@andrew.cmu.edu
Abstract
Horizontal gene transfer occurs when a species acquires a gene from a source other than its ancestor. This phenomenon is a fundamental process of gene family evolution in prokaryotes, and evidence is mounting that it occurs in eukaryotes. Growing literature describes the diversity and complexity of gene family histories involving horizontal gene transfer. However, the nomenclature currently available to describe homology relationships when transfer is implicated remains ambiguous. Careful classification of horizontally transferred genes is essential for gaining insight into complex evolutionary processes. Precise characterization is also important because gene homology is frequently used to predict gene function.
Gray and Fitch (Mol. Biol. Evol. 1983) coined the term “xenolog” to describe “clearly homologous” relationships involving genes of foreign origin. In his landmark review, Fitch (Trends Genet. 2000) defined xenology as “the relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material.” Current terminology based on this definition would label all genes related through a transfer event as xenologs, not distinguishing among the different homologous relationships involving transfer that can occur.
Expanding upon Fitch’s definition, we propose a classification scheme that offers much-needed precision for describing xenologous relationships. Our scheme distinguishes between gene pairs related by transfer alone and genes related by both duplication and transfer. Additionally, our system accounts for the inherent asymmetry of horizontal transfer by differentiating between the donor and recipient species. Whether or not both genes are in the same species is also taken into account. It further considers when genes diverged relative to the divergence of the species in which we observe those genes. Further, we define formal rules that unambiguously assign gene pairs to the xenolog subtypes in our classification. These rules are based on gene tree-species tree reconciliation and have been implemented in prototype software. To show the importance of these distinctions, we apply our conceptual framework to a representative published example: the S. cerevisiae biotin synthesis pathway. This example, one of many that can be found in the literature, demonstrates how our terminology facilitates interpretation of functional relationships between xenologs.

Poster 3
Comparative genomics sheds light on the evolution and function of the Highly Iterative Palindrome-1 motif in Cyanobacteria
Minli Xu 1, Jeffrey G. Lawrence 2, Dannie Durand 3§
1 Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213 2 Department of Biological Science, University of Pittsburgh, Pittsburgh, PA 15213 3 Department of Biological Science, Carnegie Mellon University, Pittsburgh, PA 15213
§ Corresponding author durand@cs.cmu.edu
MX is the poster presenter minlix@andrew.cmu.edu
Abstract
The Highly Iterative Palindrome-1 (HIP1), an octamer palindromic motif (GCGATCGC), is highly abundant in a wide range of cyanobacterial genomes from various habitats. HIP1 frequency can be as high as one occurrence per 350 nucleotides, which is rather astonishing considering that at this frequency, on average, every gene in that genome will be associated with more than one HIP1 motif. HIP1 was first identified in the early 1990s, yet its functional and molecular roles are still not understood. No mechanism or biological system has been identified that explains this level of prevalence. More discouraging, it is still not clear whether HIP1 has a function, or whether HIP1 abundance is an artifact of some neutral process, such as DNA repair or transposition.
Here we present results from genome scale analyses that provide the first evidence that HIP1 motifs are under selection. We estimate the expected HIP1 motif frequency, taking into account the background trinucleotide frequency in the genome, and show that observed HIP1 frequencies are as much as 100 times higher than expected. This HIP1 motif enrichment is observed in both coding and non-coding regions. Analyses of alignments of genomes with Ks values ranging from 0.02 to 0.59 further show HIP1 motif conservation in homologous sequences. The level of HIP1 conservation is significantly higher than the conservation of control motifs, i.e., other octamer palindromes with the same GC content. To show that such conservation is not merely a result of codon usage, we demonstrate that codons in HIP1 motifs are more conserved than the same codons found outside HIP1 motifs. Our results, taken together, are consistent with selection acting on HIP1 motifs. We provide the first concrete evidence for the hypothesis that the abundance of HIP1 motifs is related to biological functions, rather than to some neutral process.
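The abstract does not specify how the expected HIP1 frequency is computed from background trinucleotide frequencies. One common way to do this, assumed here purely for illustration, is a second-order Markov model, in which the expected count of an octamer is the product of its six overlapping trinucleotide counts divided by the product of its five internal dinucleotide counts. The function names below are hypothetical, and the sketch scans a single strand only.

```python
from collections import Counter

HIP1 = "GCGATCGC"


def count_kmers(genome: str, k: int) -> Counter:
    """Count all overlapping k-mers on the given strand."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def count_occurrences(genome: str, motif: str) -> int:
    """Count overlapping occurrences of motif on the given strand."""
    last = len(genome) - len(motif) + 1
    return sum(1 for i in range(last) if genome.startswith(motif, i))


def expected_count(genome: str, motif: str) -> float:
    """Expected motif count under a second-order Markov background (an assumption).

    E = N(w1w2w3) * product over i of N(w_i w_i+1 w_i+2) / N(w_i w_i+1),
    where N() are observed trinucleotide and dinucleotide counts.
    """
    tri = count_kmers(genome, 3)
    di = count_kmers(genome, 2)
    expected = float(tri[motif[:3]])
    for i in range(1, len(motif) - 2):
        pair = motif[i:i + 2]
        if di[pair] == 0:  # the motif cannot occur at all
            return 0.0
        expected *= tri[motif[i:i + 3]] / di[pair]
    return expected


def enrichment(genome: str, motif: str = HIP1) -> float:
    """Observed / expected ratio; NaN when the expectation is zero."""
    expected = expected_count(genome, motif)
    if expected == 0:
        return float("nan")
    return count_occurrences(genome, motif) / expected
```

Calling `enrichment(genome)` on a genome sequence string gives the observed-to-expected ratio for GCGATCGC under this background model; a ratio far above 1 corresponds to the kind of over-representation the abstract reports, although the authors' exact estimator may differ.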

Poster 4
Gene Tree Correction with TreeSolver
Manuel Lafond1*, Emmanuel Noutahi1, Jonathan Séguin1, Magali Semeria2, Nadia El-Mabrouk1*, Laurent Gueguen2 and Eric Tannier2,3
1 Département d’informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC, Canada H3C 3J7 2 Laboratoire de biométrie et biologie évolutive, UMR CNRS 5558, Université Lyon I, F-69622 Villeurbanne, France 3 INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
§ Corresponding authors and poster presenters
ML: lafonman@iro.umontreal.ca
N E-M: mabrouk@iro.umontreal.ca
Abstract
We present TreeSolver, a new integrated framework for gene tree correction accounting for local mutations at the sequence level, as well as global mutations affecting gene content and order. In the same vein as recently developed software such as TreeFix, a neighborhood of an input tree is explored, and a correction selected on genome-level criteria is accepted only if it is statistically equivalent to the original tree. However, while a tree neighborhood is explored by previous algorithms in a stochastic way, we take a deterministic and more targeted approach by focusing on the problematic parts of the tree: weakly supported edges and nodes.

Poster 5
SimRA: Rapid & Accurate Simulation of Populations based on Random-Graph Models of ARG
Anna Paola Carrieri1, Laxmi Parida2§
1 Università degli studi di Milano-Bicocca, Milan, Italy 2 Computational Genomics, IBM T. J. Watson Research Center, Yorktown Heights, USA
§ Corresponding author parida@us.ibm.com
APC is the poster presenter
[...]ng populations is a fundamental problem in population genetics and is crucial in many applied areas. A generative model simulates the population by evolving a population over time. Here we use the Wright-Fisher population model of genetic variation. Backward simulations (primarily based on coalescence [3]) are usually much faster than forward simulations due to the elimination of genetic transmission paths that are not relevant to the samples under study. When genetic exchange events are modeled in addition to the polymorphisms of the duplication model, the resulting network structure is called an ancestral recombination
