A universal genomic coordinate translator for comparative genomics

BACKGROUND: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved...

Bibliographic Details
Main authors: Zamani, Neda, Sundström, Görel, Meadows, Jennifer RS, Höppner, Marc P, Dainat, Jacques, Lantz, Henrik, Haas, Brian J, Grabherr, Manfred G
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086997/
https://www.ncbi.nlm.nih.gov/pubmed/24976580
http://dx.doi.org/10.1186/1471-2105-15-227
_version_ 1782324873703456768
author Zamani, Neda
Sundström, Görel
Meadows, Jennifer RS
Höppner, Marc P
Dainat, Jacques
Lantz, Henrik
Haas, Brian J
Grabherr, Manfred G
author_facet Zamani, Neda
Sundström, Görel
Meadows, Jennifer RS
Höppner, Marc P
Dainat, Jacques
Lantz, Henrik
Haas, Brian J
Grabherr, Manfred G
author_sort Zamani, Neda
collection PubMed
description BACKGROUND: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, the number of which increases as N^2 with the number of available genomes, N. RESULTS: Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein-coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. CONCLUSIONS: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
format Online
Article
Text
id pubmed-4086997
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4086997 2014-07-09 A universal genomic coordinate translator for comparative genomics Zamani, Neda Sundström, Görel Meadows, Jennifer RS Höppner, Marc P Dainat, Jacques Lantz, Henrik Haas, Brian J Grabherr, Manfred G BMC Bioinformatics Software BACKGROUND: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, the number of which increases as N^2 with the number of available genomes, N. RESULTS: Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein-coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. CONCLUSIONS: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken. BioMed Central 2014-06-30 /pmc/articles/PMC4086997/ /pubmed/24976580 http://dx.doi.org/10.1186/1471-2105-15-227 Text en Copyright © 2014 Zamani et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Zamani, Neda
Sundström, Görel
Meadows, Jennifer RS
Höppner, Marc P
Dainat, Jacques
Lantz, Henrik
Haas, Brian J
Grabherr, Manfred G
A universal genomic coordinate translator for comparative genomics
title A universal genomic coordinate translator for comparative genomics
title_full A universal genomic coordinate translator for comparative genomics
title_fullStr A universal genomic coordinate translator for comparative genomics
title_full_unstemmed A universal genomic coordinate translator for comparative genomics
title_short A universal genomic coordinate translator for comparative genomics
title_sort universal genomic coordinate translator for comparative genomics
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086997/
https://www.ncbi.nlm.nih.gov/pubmed/24976580
http://dx.doi.org/10.1186/1471-2105-15-227
work_keys_str_mv AT zamanineda auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT sundstromgorel auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT meadowsjenniferrs auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT hoppnermarcp auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT dainatjacques auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT lantzhenrik auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT haasbrianj auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT grabherrmanfredg auniversalgenomiccoordinatetranslatorforcomparativegenomics
AT zamanineda universalgenomiccoordinatetranslatorforcomparativegenomics
AT sundstromgorel universalgenomiccoordinatetranslatorforcomparativegenomics
AT meadowsjenniferrs universalgenomiccoordinatetranslatorforcomparativegenomics
AT hoppnermarcp universalgenomiccoordinatetranslatorforcomparativegenomics
AT dainatjacques universalgenomiccoordinatetranslatorforcomparativegenomics
AT lantzhenrik universalgenomiccoordinatetranslatorforcomparativegenomics
AT haasbrianj universalgenomiccoordinatetranslatorforcomparativegenomics
AT grabherrmanfredg universalgenomiccoordinatetranslatorforcomparativegenomics
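
Illustrative note: the abstract in this record describes translating coordinates indirectly through a graph of pairwise alignments instead of aligning every genome against every other. The Python sketch below illustrates that general idea only; it is not Kraken's code or interface. All names (Block, map_interval, find_path, translate, EXAMPLE_GRAPH) and the toy coordinates are hypothetical, synteny blocks are simplified to ungapped same-strand intervals, and a breadth-first search stands in for the recursive traversal the abstract mentions.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified synteny block: an ungapped, same-strand mapping
# (src_chrom, src_start, src_end, dst_chrom, dst_start), half-open coordinates.
Block = Tuple[str, int, int, str, int]
Interval = Tuple[str, int, int]
# graph[a][b] holds the blocks of the pairwise alignment from genome a to b.
Graph = Dict[str, Dict[str, List[Block]]]


def all_to_all_count(n: int) -> int:
    """Number of unordered genome pairs: grows as N^2."""
    return n * (n - 1) // 2


def chain_count(n: int) -> int:
    """Minimum number of pairwise alignments that connects n genomes."""
    return n - 1


def map_interval(blocks: List[Block], interval: Interval) -> Optional[Interval]:
    """Map an interval through the first block that fully contains it."""
    chrom, start, end = interval
    for s_chrom, s_start, s_end, d_chrom, d_start in blocks:
        if s_chrom == chrom and s_start <= start and end <= s_end:
            offset = d_start - s_start
            return (d_chrom, start + offset, end + offset)
    return None  # no block covers the interval


def find_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of pairwise alignments."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def translate(graph: Graph, source: str, target: str,
              interval: Interval) -> Optional[Interval]:
    """Translate an interval from source to target via intermediate genomes."""
    path = find_path(graph, source, target)
    if path is None:
        return None
    current: Optional[Interval] = interval
    for a, b in zip(path, path[1:]):
        current = map_interval(graph[a][b], current)
        if current is None:
            return None  # the interval fell outside every block on this hop
    return current


# Toy data (invented): there is no direct human -> rat alignment.
EXAMPLE_GRAPH: Graph = {
    "human": {"mouse": [("chr1", 1000, 2000, "chr4", 5000)]},
    "mouse": {"rat": [("chr4", 4500, 6500, "chr5", 300)]},
}

if __name__ == "__main__":
    print(all_to_all_count(12), chain_count(12))
    print(translate(EXAMPLE_GRAPH, "human", "rat", ("chr1", 1200, 1300)))
```

In this toy setup the human interval is first mapped to mouse and then to rat, so only two pairwise alignments are needed for three genomes. For the 12 Drosophila genomes mentioned in the abstract, a minimal connected graph would need 11 pairwise alignments where an all-to-all design would need 66.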