AntiFam: a tool to help identify spurious ORFs in protein annotation

As the deluge of genomic DNA sequence grows the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and anno...

Bibliographic Details
Main authors: Eberhardt, Ruth Y., Haft, Daniel H., Punta, Marco, Martin, Maria, O'Donovan, Claire, Bateman, Alex
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308159/
https://www.ncbi.nlm.nih.gov/pubmed/22434837
http://dx.doi.org/10.1093/database/bas003
_version_ 1782227404820840448
author Eberhardt, Ruth Y.
Haft, Daniel H.
Punta, Marco
Martin, Maria
O'Donovan, Claire
Bateman, Alex
author_sort Eberhardt, Ruth Y.
collection PubMed
description As the deluge of genomic DNA sequence grows the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears about transitive annotation errors making sequence databases less reliable. During the lifetime of the Pfam protein families database a number of protein families have been built, which were later identified as composed solely of spurious open reading frames (ORFs) either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that these may perform a useful function to identify new spurious ORFs. We have collected these families together in AntiFam along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as a part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.
format Online
Article
Text
id pubmed-3308159
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-33081592012-03-20 AntiFam: a tool to help identify spurious ORFs in protein annotation Eberhardt, Ruth Y. Haft, Daniel H. Punta, Marco Martin, Maria O'Donovan, Claire Bateman, Alex Database (Oxford) Original Articles As the deluge of genomic DNA sequence grows the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears about transitive annotation errors making sequence databases less reliable. During the lifetime of the Pfam protein families database a number of protein families have been built, which were later identified as composed solely of spurious open reading frames (ORFs) either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that these may perform a useful function to identify new spurious ORFs. We have collected these families together in AntiFam along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as a part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion. Oxford University Press 2012-03-13 /pmc/articles/PMC3308159/ /pubmed/22434837 http://dx.doi.org/10.1093/database/bas003 Text en © The Author(s) 2012. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title AntiFam: a tool to help identify spurious ORFs in protein annotation
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308159/
https://www.ncbi.nlm.nih.gov/pubmed/22434837
http://dx.doi.org/10.1093/database/bas003