Consensus assessment of the contamination level of publicly available cyanobacterial genomes

Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Ye...

Bibliographic Details
Main Authors: Cornet, Luc, Meunier, Loïc, Van Vlierberghe, Mick, Léonard, Raphaël R., Durieu, Benoit, Lara, Yannick, Misztak, Agnieszka, Sirjacobs, Damien, Javaux, Emmanuelle J., Philippe, Hervé, Wilmotte, Annick, Baurain, Denis
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6059444/
https://www.ncbi.nlm.nih.gov/pubmed/30044797
http://dx.doi.org/10.1371/journal.pone.0200323
_version_ 1783341862523764736
author Cornet, Luc
Meunier, Loïc
Van Vlierberghe, Mick
Léonard, Raphaël R.
Durieu, Benoit
Lara, Yannick
Misztak, Agnieszka
Sirjacobs, Damien
Javaux, Emmanuelle J.
Philippe, Hervé
Wilmotte, Annick
Baurain, Denis
author_facet Cornet, Luc
Meunier, Loïc
Van Vlierberghe, Mick
Léonard, Raphaël R.
Durieu, Benoit
Lara, Yannick
Misztak, Agnieszka
Sirjacobs, Damien
Javaux, Emmanuelle J.
Philippe, Hervé
Wilmotte, Annick
Baurain, Denis
author_sort Cornet, Luc
collection PubMed
description Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all the six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make mandatory the submission of raw read data along with genome assemblies in order to facilitate the detection of contaminants in sequence databases.
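The consensus rule stated in the description (an assembly counts as highly contaminated when contaminants are detected by 5 or 6 of the six methods) can be illustrated with a minimal sketch. It assumes each method has already been reduced to a yes/no call per assembly; how the authors derive those calls from each tool's output and rankings is not given in this record. The function names, the dictionary layout, and the example calls are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict

# The six methods named in the description. The labels are used here only
# as dictionary keys; they do not invoke the actual tools.
METHODS = [
    "SSU rRNA 16S",
    "ribosomal proteins",
    "CheckM",
    "Kraken",
    "DIAMOND blastx",
    "CONCOCT",
]


def count_detections(calls: Dict[str, bool]) -> int:
    """Count how many of the six methods flagged the assembly.

    A method missing from `calls` is treated as "not detected"
    (an assumption of this sketch).
    """
    return sum(1 for method in METHODS if calls.get(method, False))


def is_highly_contaminated(calls: Dict[str, bool], min_methods: int = 5) -> bool:
    """Apply the consensus rule: flagged by at least `min_methods` methods.

    The default of 5 follows the "5 or 6 methods" criterion in the description.
    """
    return count_detections(calls) >= min_methods


# Hypothetical per-method calls for two made-up assemblies.
assembly_a = {method: True for method in METHODS}
assembly_a["CONCOCT"] = False          # flagged by 5 of 6 methods
assembly_b = {"Kraken": True, "CheckM": True}  # flagged by 2 of 6 methods

for name, calls in [("assembly_a", assembly_a), ("assembly_b", assembly_b)]:
    print(name, count_detections(calls), is_highly_contaminated(calls))
```

Requiring agreement among most methods reflects the description's point that each method has its own strengths and limitations, so a single positive call is weak evidence on its own.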
format Online
Article
Text
id pubmed-6059444
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-60594442018-08-09 Consensus assessment of the contamination level of publicly available cyanobacterial genomes Cornet, Luc Meunier, Loïc Van Vlierberghe, Mick Léonard, Raphaël R. Durieu, Benoit Lara, Yannick Misztak, Agnieszka Sirjacobs, Damien Javaux, Emmanuelle J. Philippe, Hervé Wilmotte, Annick Baurain, Denis PLoS One Research Article Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all the six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. 
Moreover, we argue that journals should make mandatory the submission of raw read data along with genome assemblies in order to facilitate the detection of contaminants in sequence databases. Public Library of Science 2018-07-25 /pmc/articles/PMC6059444/ /pubmed/30044797 http://dx.doi.org/10.1371/journal.pone.0200323 Text en © 2018 Cornet et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Cornet, Luc
Meunier, Loïc
Van Vlierberghe, Mick
Léonard, Raphaël R.
Durieu, Benoit
Lara, Yannick
Misztak, Agnieszka
Sirjacobs, Damien
Javaux, Emmanuelle J.
Philippe, Hervé
Wilmotte, Annick
Baurain, Denis
Consensus assessment of the contamination level of publicly available cyanobacterial genomes
title Consensus assessment of the contamination level of publicly available cyanobacterial genomes
title_full Consensus assessment of the contamination level of publicly available cyanobacterial genomes
title_fullStr Consensus assessment of the contamination level of publicly available cyanobacterial genomes
title_full_unstemmed Consensus assessment of the contamination level of publicly available cyanobacterial genomes
title_short Consensus assessment of the contamination level of publicly available cyanobacterial genomes
title_sort consensus assessment of the contamination level of publicly available cyanobacterial genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6059444/
https://www.ncbi.nlm.nih.gov/pubmed/30044797
http://dx.doi.org/10.1371/journal.pone.0200323
work_keys_str_mv AT cornetluc consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT meunierloic consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT vanvlierberghemick consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT leonardraphaelr consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT durieubenoit consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT larayannick consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT misztakagnieszka consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT sirjacobsdamien consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT javauxemmanuellej consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT philippeherve consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT wilmotteannick consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes
AT bauraindenis consensusassessmentofthecontaminationlevelofpubliclyavailablecyanobacterialgenomes