Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors

In view of the fact that appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however,...

Bibliographic Details
Main Authors: Nagy, Alinda, Szláma, György, Szarka, Eszter, Trexler, Mária, Bányai, László, Patthy, László
Format: Online Article Text
Language: English
Published: MDPI 2011
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3927609/
https://www.ncbi.nlm.nih.gov/pubmed/24710207
http://dx.doi.org/10.3390/genes2030449
author Nagy, Alinda
Szláma, György
Szarka, Eszter
Trexler, Mária
Bányai, László
Patthy, László
collection PubMed
description In view of the fact that the appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of the Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high-quality, manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI's GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chicken, mouse, rat and orangutan with those of human Swiss-Prot entries identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol.
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences inasmuch as it increases the proportion of terminal over internal DA differences. Here we have shown that in the case of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than that of true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for the quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].
format Online
Article
Text
id pubmed-3927609
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-3927609 2014-03-26 Genes (Basel) Article MDPI 2011-07-13 /pmc/articles/PMC3927609/ /pubmed/24710207 http://dx.doi.org/10.3390/genes2030449 Text en © 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3927609/
https://www.ncbi.nlm.nih.gov/pubmed/24710207
http://dx.doi.org/10.3390/genes2030449