
Most partial domains in proteins are alignment and annotation artifacts

BACKGROUND: Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indiv...


Bibliographic Details
Main authors: Triant, Deborah A, Pearson, William R
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4443539/
https://www.ncbi.nlm.nih.gov/pubmed/25976240
http://dx.doi.org/10.1186/s13059-015-0656-7
_version_ 1782373004672499712
author Triant, Deborah A
Pearson, William R
author_facet Triant, Deborah A
Pearson, William R
author_sort Triant, Deborah A
collection PubMed
description BACKGROUND: Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact, units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2). RESULTS: We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts. CONCLUSIONS: Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein’s gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0656-7) contains supplementary material, which is available to authorized users.
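The length criterion in the description above (a domain annotation covering less than 50% of its family model length is treated as partial) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The `DomainHit` record, its field names, and the choice to measure coverage in model coordinates are assumptions made for illustration; only the 50% cutoff comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """Hypothetical record for one annotated domain region (not a Pfam API type)."""
    family: str
    model_start: int   # first model position covered (1-based, inclusive; assumed)
    model_end: int     # last model position covered (inclusive; assumed)
    model_length: int  # full length of the family model


def model_coverage(hit: DomainHit) -> float:
    """Fraction of the family model spanned by this annotation."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def is_partial(hit: DomainHit, threshold: float = 0.5) -> bool:
    """Flag a hit as a candidate partial domain when it spans less than
    `threshold` of the model length (0.5 mirrors the 50% cutoff in the abstract)."""
    return model_coverage(hit) < threshold


# Made-up example hits; the family names are placeholders, not real Pfam entries.
short_hit = DomainHit("FamilyA", model_start=1, model_end=40, model_length=100)
full_hit = DomainHit("FamilyA", model_start=1, model_end=100, model_length=100)

print(is_partial(short_hit))  # 40 of 100 model positions covered
print(is_partial(full_hit))   # whole model covered
```

A hit that covers exactly half of the model is not flagged here, because the abstract describes partials as shorter than 50% of the model length. The split, bounded, and unbounded categories are not sketched because the abstract does not define them.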
format Online
Article
Text
id pubmed-4443539
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4443539 2015-05-27 Most partial domains in proteins are alignment and annotation artifacts Triant, Deborah A Pearson, William R Genome Biol Research BioMed Central 2015-05-15 2015 /pmc/articles/PMC4443539/ /pubmed/25976240 http://dx.doi.org/10.1186/s13059-015-0656-7 Text en © Triant and Pearson; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Triant, Deborah A
Pearson, William R
Most partial domains in proteins are alignment and annotation artifacts
title Most partial domains in proteins are alignment and annotation artifacts
title_full Most partial domains in proteins are alignment and annotation artifacts
title_fullStr Most partial domains in proteins are alignment and annotation artifacts
title_full_unstemmed Most partial domains in proteins are alignment and annotation artifacts
title_short Most partial domains in proteins are alignment and annotation artifacts
title_sort most partial domains in proteins are alignment and annotation artifacts
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4443539/
https://www.ncbi.nlm.nih.gov/pubmed/25976240
http://dx.doi.org/10.1186/s13059-015-0656-7
work_keys_str_mv AT triantdeboraha mostpartialdomainsinproteinsarealignmentandannotationartifacts
AT pearsonwilliamr mostpartialdomainsinproteinsarealignmentandannotationartifacts