
Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies

In the course of analyzing 9 522 746 pyrosequencing reads from 23 stations in the Southwestern Pacific and equatorial Atlantic oceans, it came to our attention that misannotations of rRNA as proteins is now so widespread that false positive matching of rRNA pyrosequencing reads to the National Cente...


Bibliographic Details
Main authors: Tripp, H. James, Hewson, Ian, Boyarsky, Sam, Stuart, Joshua M., Zehr, Jonathan P.
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203614/
https://www.ncbi.nlm.nih.gov/pubmed/21771858
http://dx.doi.org/10.1093/nar/gkr576
author Tripp, H. James
Hewson, Ian
Boyarsky, Sam
Stuart, Joshua M.
Zehr, Jonathan P.
collection PubMed
description In the course of analyzing 9 522 746 pyrosequencing reads from 23 stations in the Southwestern Pacific and equatorial Atlantic oceans, it came to our attention that misannotations of rRNA as proteins is now so widespread that false positive matching of rRNA pyrosequencing reads to the National Center for Biotechnology Information (NCBI) non-redundant protein database approaches 90%. One conserved portion of 23S rRNA was consistently misannotated often enough to prompt curators at Pfam to create a spurious protein family. Detailed examination of the annotation history of each seed sequence in the spurious Pfam protein family (PF10695, ‘Cw-hydrolase’) uncovered issues in the standard operating procedures and quality assurance programs of major sequencing centers, and other issues relating to the curation practices of those managing public databases such as GenBank and SwissProt. We offer recommendations for all these issues, and recommend as well that workers in the field of metatranscriptomics take extra care to avoid including false positive matches in their datasets.
format Online
Article
Text
id pubmed-3203614
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3203614 2011-10-28 Nucleic Acids Res Genomics Oxford University Press 2011-11 2011-07-19 /pmc/articles/PMC3203614/ /pubmed/21771858 http://dx.doi.org/10.1093/nar/gkr576 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies
topic Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203614/
https://www.ncbi.nlm.nih.gov/pubmed/21771858
http://dx.doi.org/10.1093/nar/gkr576