Multiple Cases of Bacterial Sequence Erroneously Incorporated Into Publicly Available Chloroplast Genomes

Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly suscep...

Bibliographic Details
Main Authors: Robinson, Aaron J., Daligault, Hajnalka E., Kelliher, Julia M., LeBrun, Erick S., Chain, Patrick S. G.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8793683/
https://www.ncbi.nlm.nih.gov/pubmed/35096026
http://dx.doi.org/10.3389/fgene.2021.821715
author Robinson, Aaron J.
Daligault, Hajnalka E.
Kelliher, Julia M.
LeBrun, Erick S.
Chain, Patrick S. G.
collection PubMed
description Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted ‘raw’ shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future.
format Online
Article
Text
id pubmed-8793683
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8793683 2022-01-28 Multiple Cases of Bacterial Sequence Erroneously Incorporated Into Publicly Available Chloroplast Genomes Robinson, Aaron J. Daligault, Hajnalka E. Kelliher, Julia M. LeBrun, Erick S. Chain, Patrick S. G. Front Genet Genetics Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted ‘raw’ shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future. Frontiers Media S.A. 2022-01-13 /pmc/articles/PMC8793683/ /pubmed/35096026 http://dx.doi.org/10.3389/fgene.2021.821715 Text en Copyright © 2022 Robinson, Daligault, Kelliher, LeBrun and Chain.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multiple Cases of Bacterial Sequence Erroneously Incorporated Into Publicly Available Chloroplast Genomes
topic Genetics