
Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between...


Bibliographic Details
Main authors: Lin, Michael F., Deoras, Ameya N., Rasmussen, Matthew D., Kellis, Manolis
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2291194/
https://www.ncbi.nlm.nih.gov/pubmed/18421375
http://dx.doi.org/10.1371/journal.pcbi.1000067
_version_ 1782152438016376832
author Lin, Michael F.
Deoras, Ameya N.
Rasmussen, Matthew D.
Kellis, Manolis
author_facet Lin, Michael F.
Deoras, Ameya N.
Rasmussen, Matthew D.
Kellis, Manolis
author_sort Lin, Michael F.
collection PubMed
description Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.
format Text
id pubmed-2291194
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2291194 2008-04-18 Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes Lin, Michael F. Deoras, Ameya N. Rasmussen, Matthew D. Kellis, Manolis PLoS Comput Biol Research Article
Public Library of Science 2008-04-18 /pmc/articles/PMC2291194/ /pubmed/18421375 http://dx.doi.org/10.1371/journal.pcbi.1000067 Text en Lin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Lin, Michael F.
Deoras, Ameya N.
Rasmussen, Matthew D.
Kellis, Manolis
Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes
title Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes
title_sort performance and scalability of discriminative metrics for comparative gene identification in 12 drosophila genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2291194/
https://www.ncbi.nlm.nih.gov/pubmed/18421375
http://dx.doi.org/10.1371/journal.pcbi.1000067