A software tool ‘CroCo’ detects pervasive cross-species contamination in next generation sequencing data

BACKGROUND: Multiple RNA samples are frequently processed together and often mixed before multiplex sequencing in the same sequencing run. While different samples can be separated post sequencing using sample barcodes, the possibility of cross contamination between biological samples from different...

Bibliographic Details
Main authors: Simion, Paul; Belkhir, Khalid; François, Clémentine; Veyssier, Julien; Rink, Jochen C.; Manuel, Michaël; Philippe, Hervé; Telford, Maximilian J.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5838952/
https://www.ncbi.nlm.nih.gov/pubmed/29506533
http://dx.doi.org/10.1186/s12915-018-0486-7
author Simion, Paul
Belkhir, Khalid
François, Clémentine
Veyssier, Julien
Rink, Jochen C.
Manuel, Michaël
Philippe, Hervé
Telford, Maximilian J.
author_sort Simion, Paul
collection PubMed
description BACKGROUND: Multiple RNA samples are frequently processed together and often mixed before multiplex sequencing in the same sequencing run. While different samples can be separated post sequencing using sample barcodes, the possibility of cross contamination between biological samples from different species that have been processed or sequenced in parallel has the potential to be extremely deleterious for downstream analyses. RESULTS: We present CroCo, a software package for identifying and removing such cross contaminants from assembled transcriptomes. Using multiple, recently published sequence datasets, we show that cross contamination is consistently present at varying levels in real data. Using real and simulated data, we demonstrate that CroCo detects contaminants efficiently and correctly. Using a real example from a molecular phylogenetic dataset, we show that contaminants, if not eliminated, can have a decisive, deleterious impact on downstream comparative analyses. CONCLUSIONS: Cross contamination is pervasive in new and published datasets and, if undetected, can have serious deleterious effects on downstream analyses. CroCo is a database-independent, multi-platform tool, designed for ease of use, that efficiently and accurately detects and removes cross contamination in assembled transcriptomes to avoid these problems. We suggest that the use of CroCo should become a standard cleaning step when processing multiple samples for transcriptome sequencing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12915-018-0486-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5838952
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5838952 2018-03-09 BMC Biol Software
BioMed Central 2018-03-05 /pmc/articles/PMC5838952/ /pubmed/29506533 http://dx.doi.org/10.1186/s12915-018-0486-7 Text en © Telford et al. 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A software tool ‘CroCo’ detects pervasive cross-species contamination in next generation sequencing data
topic Software