A universal genomic coordinate translator for comparative genomics
Main authors: | Zamani, Neda; Sundström, Görel; Meadows, Jennifer RS; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086997/ https://www.ncbi.nlm.nih.gov/pubmed/24976580 http://dx.doi.org/10.1186/1471-2105-15-227 |
author | Zamani, Neda; Sundström, Görel; Meadows, Jennifer RS; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G |
collection | PubMed |
description | BACKGROUND: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationships between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, whose number grows as N² with the number of available genomes, N. RESULTS: Here, we introduce Kraken, software that avoids the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes and found that computing orthologous correspondence indirectly, through a graph of multiple synteny maps, costs minimal sensitivity while reducing overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. At the nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% match exactly on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein-coding genes, Kraken also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla and maintain significant correlation of expression levels across species. CONCLUSIONS: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole-genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken. |
id | pubmed-4086997 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics |
published | 2014-06-30 |
rights | Copyright © 2014 Zamani et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A universal genomic coordinate translator for comparative genomics |
topic | Software |
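The abstract describes translating coordinates by chaining pairwise alignments through a graph instead of aligning every genome against every other. The sketch below is not Kraken's implementation (this record does not describe its data structures); it is a minimal Python illustration under stated assumptions: each pairwise synteny map is a list of ungapped, collinear blocks; genomes are nodes and maps are undirected edges; the route is the fewest-hop path found by breadth-first search; and a hop that yields zero or several hits is treated as unmapped, a crude stand-in for Kraken's dynamic orthology recomputation. All names (`Block`, `add_map`, `find_path`, `translate`, `alignment_counts`) are hypothetical.

```python
# Hypothetical sketch, not Kraken's code: coordinate translation by chaining
# pairwise synteny maps along a path in a graph of genomes.
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Loc = Tuple[str, int]  # (chromosome, 0-based position)


@dataclass(frozen=True)
class Block:
    """Assumed model: an ungapped, collinear block linking two genomes (half-open)."""
    src_chrom: str
    src_start: int
    src_end: int
    dst_chrom: str
    dst_start: int
    forward: bool = True  # False: the block maps to the opposite strand

    def map(self, loc: Loc) -> Optional[Loc]:
        chrom, pos = loc
        if chrom != self.src_chrom or not self.src_start <= pos < self.src_end:
            return None
        offset = pos - self.src_start
        if self.forward:
            return self.dst_chrom, self.dst_start + offset
        length = self.src_end - self.src_start
        return self.dst_chrom, self.dst_start + length - 1 - offset

    def inverted(self) -> "Block":
        """The same block seen from the other genome."""
        length = self.src_end - self.src_start
        return Block(self.dst_chrom, self.dst_start, self.dst_start + length,
                     self.src_chrom, self.src_start, self.forward)


# genome -> neighbouring genome -> blocks of the pairwise synteny map
Graph = Dict[str, Dict[str, List[Block]]]


def add_map(graph: Graph, a: str, b: str, blocks: List[Block]) -> None:
    """Store one pairwise map as an undirected edge (a->b plus its inverse)."""
    graph.setdefault(a, {})[b] = list(blocks)
    graph.setdefault(b, {})[a] = [blk.inverted() for blk in blocks]


def find_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search for the fewest pairwise hops from src to dst."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph.get(node, {}):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def translate(graph: Graph, src: str, dst: str, loc: Loc) -> Optional[Loc]:
    """Chain pairwise maps along the path; give up if a hop is unmapped or
    ambiguous (more than one hit, e.g. a duplicated region)."""
    path = find_path(graph, src, dst)
    if path is None:
        return None
    for a, b in zip(path, path[1:]):
        hits = [h for h in (blk.map(loc) for blk in graph[a][b]) if h is not None]
        if len(hits) != 1:
            return None
        loc = hits[0]
    return loc


def alignment_counts(n: int) -> Tuple[int, int]:
    """Pairwise alignments for all-to-all vs. a minimal connected graph."""
    return n * (n - 1) // 2, n - 1


if __name__ == "__main__":
    g: Graph = {}
    add_map(g, "human", "mouse", [Block("chr1", 1000, 2000, "chr4", 5000)])
    add_map(g, "mouse", "rat", [Block("chr4", 4500, 6000, "chr5", 100, forward=False)])
    print(translate(g, "human", "rat", ("chr1", 1010)))  # ('chr5', 1089)
    print(alignment_counts(12))                           # (66, 11)
```

With 12 genomes, as in the Drosophila evaluation, an all-to-all design needs 66 pairwise alignments, whereas a connected graph can be built from as few as 11, illustrating the difference between N² and linear growth; the number of alignments Kraken actually used is not stated in this record.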