
Controversies in modern evolutionary biology: the imperative for error detection and quality control

BACKGROUND: The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies....


Bibliographic Details
Main authors: Prosdocimi, Francisco; Linard, Benjamin; Pontarotti, Pierre; Poch, Olivier; Thompson, Julie D
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3311146/
https://www.ncbi.nlm.nih.gov/pubmed/22217008
http://dx.doi.org/10.1186/1471-2164-13-5
author Prosdocimi, Francisco
Linard, Benjamin
Pontarotti, Pierre
Poch, Olivier
Thompson, Julie D
author_sort Prosdocimi, Francisco
collection PubMed
description BACKGROUND: The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies. It has been suggested that part of the conflict may be due to errors in the initial sequences. Most gene sequences are predicted by bioinformatics programs and a number of quality issues have been raised, concerning DNA sequencing errors or badly predicted coding regions, particularly in eukaryotes. RESULTS: We investigated the impact of these errors on evolutionary studies and specifically on the identification of important genetic events. We focused on the detection of asymmetric evolution after duplication, which has been the subject of controversy recently. Using the human genome as a reference, we established a reliable set of 688 duplicated genes in 13 complete vertebrate genomes, where significantly different evolutionary rates are observed. We estimated the rates at which protein sequence errors occur and are accumulated in the higher-level analyses. We showed that the majority of the detected events (57%) are in fact artifacts due to the putative erroneous sequences and that these artifacts are sufficient to mask the true functional significance of the events. CONCLUSIONS: Initial errors are accumulated throughout the evolutionary analysis, generating artificially high rates of event predictions and leading to substantial uncertainty in the conclusions. This study emphasizes the urgent need for error detection and quality control strategies in order to efficiently extract knowledge from the new genome data.
format Online
Article
Text
id pubmed-3311146
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3311146 2012-04-02 Controversies in modern evolutionary biology: the imperative for error detection and quality control Prosdocimi, Francisco Linard, Benjamin Pontarotti, Pierre Poch, Olivier Thompson, Julie D BMC Genomics Research Article
BioMed Central 2012-01-04 /pmc/articles/PMC3311146/ /pubmed/22217008 http://dx.doi.org/10.1186/1471-2164-13-5 Text en Copyright ©2012 Prosdocimi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Controversies in modern evolutionary biology: the imperative for error detection and quality control
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3311146/
https://www.ncbi.nlm.nih.gov/pubmed/22217008
http://dx.doi.org/10.1186/1471-2164-13-5