Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies
Main authors: | Tripp, H. James; Hewson, Ian; Boyarsky, Sam; Stuart, Joshua M.; Zehr, Jonathan P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2011 |
Subjects: | Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203614/ https://www.ncbi.nlm.nih.gov/pubmed/21771858 http://dx.doi.org/10.1093/nar/gkr576 |
author | Tripp, H. James; Hewson, Ian; Boyarsky, Sam; Stuart, Joshua M.; Zehr, Jonathan P. |
---|---|
collection | PubMed |
description | In the course of analyzing 9 522 746 pyrosequencing reads from 23 stations in the Southwestern Pacific and equatorial Atlantic oceans, it came to our attention that misannotations of rRNA as proteins are now so widespread that false positive matching of rRNA pyrosequencing reads to the National Center for Biotechnology Information (NCBI) non-redundant protein database approaches 90%. One conserved portion of 23S rRNA was consistently misannotated often enough to prompt curators at Pfam to create a spurious protein family. Detailed examination of the annotation history of each seed sequence in the spurious Pfam protein family (PF10695, ‘Cw-hydrolase’) uncovered issues in the standard operating procedures and quality assurance programs of major sequencing centers, and other issues relating to the curation practices of those managing public databases such as GenBank and SwissProt. We offer recommendations for all these issues, and recommend as well that workers in the field of metatranscriptomics take extra care to avoid including false positive matches in their datasets. |
format | Online Article Text |
id | pubmed-3203614 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res (Genomics); print issue 2011-11; published online 2011-07-19 |
rights | © The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies |
topic | Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203614/ https://www.ncbi.nlm.nih.gov/pubmed/21771858 http://dx.doi.org/10.1093/nar/gkr576 |