ConFindr: rapid detection of intraspecies and cross-species contamination in bacterial whole-genome sequence data

Whole-genome sequencing (WGS) of bacterial pathogens is currently widely used to support public-health investigations. The ability to assess WGS data quality is critical to underpin the reliability of downstream analyses. Sequence contamination is a quality issue that could potentially impact WGS-ba...

Bibliographic Details
Main Authors: Low, Andrew J., Koziol, Adam G., Manninger, Paul A., Blais, Burton, Carrillo, Catherine D.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546082/
https://www.ncbi.nlm.nih.gov/pubmed/31183253
http://dx.doi.org/10.7717/peerj.6995
_version_ 1783423495799046144
author Low, Andrew J.
Koziol, Adam G.
Manninger, Paul A.
Blais, Burton
Carrillo, Catherine D.
collection PubMed
description Whole-genome sequencing (WGS) of bacterial pathogens is currently widely used to support public-health investigations. The ability to assess WGS data quality is critical to underpin the reliability of downstream analyses. Sequence contamination is a quality issue that could potentially impact WGS-based findings; however, existing tools do not readily identify contamination from closely-related organisms. To address this gap, we have developed a computational pipeline, ConFindr, for detection of intraspecies contamination. ConFindr determines the presence of contaminating sequences based on the identification of multiple alleles of core, single-copy, ribosomal-protein genes in raw sequencing reads. The performance of this tool was assessed using simulated and lab-generated Illumina short-read WGS data with varying levels of contamination (0–20% of reads) and varying genetic distance between the designated target and contaminant strains. Intraspecies and cross-species contamination was reliably detected in datasets containing 5% or more reads from a second, unrelated strain. ConFindr detected intraspecies contamination with higher sensitivity than existing tools, while also being able to automatically detect cross-species contamination with similar sensitivity. The implementation of ConFindr in quality-control pipelines will help to improve the reliability of WGS databases as well as the accuracy of downstream analyses. ConFindr is written in Python, and is freely available under the MIT License at github.com/OLC-Bioinformatics/ConFindr.
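The description above says that contamination is inferred from multiple alleles of single-copy genes in the raw reads. The idea can be illustrated with a minimal sketch. This is not ConFindr's actual implementation: the function name, the pileup representation (one dictionary of base counts per gene position), and both thresholds are assumptions chosen only for illustration. In a single-copy gene from a pure culture, every position should be supported by one base, so a position where two bases are each well supported suggests a second strain.

```python
from typing import Dict, List, Sequence


def multiallelic_positions(
    pileup: Sequence[Dict[str, int]],
    min_count: int = 2,          # hypothetical: minimum reads supporting a base
    min_fraction: float = 0.05,  # hypothetical: minimum share of the depth
) -> List[int]:
    """Return 0-based positions where more than one base is well supported.

    `pileup` holds, for each position of a single-copy gene, a mapping from
    base to the number of reads showing that base at that position.
    """
    flagged = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to judge
        # Keep only bases that pass both the absolute and relative threshold,
        # so that isolated sequencing errors are not counted as alleles.
        supported = [
            base
            for base, n in counts.items()
            if n >= min_count and n / depth >= min_fraction
        ]
        if len(supported) > 1:
            flagged.append(pos)
    return flagged
```

Called on base counts for a short gene fragment, the function returns the positions that carry a second well-supported base; a sample with no such positions in any single-copy gene would be treated as uncontaminated under these assumptions.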
format Online
Article
Text
id pubmed-6546082
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6546082 2019-06-10 ConFindr: rapid detection of intraspecies and cross-species contamination in bacterial whole-genome sequence data Low, Andrew J. Koziol, Adam G. Manninger, Paul A. Blais, Burton Carrillo, Catherine D. PeerJ Bioinformatics PeerJ Inc. 2019-05-31 /pmc/articles/PMC6546082/ /pubmed/31183253 http://dx.doi.org/10.7717/peerj.6995 Text en ©2019 Low et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Low, Andrew J.
Koziol, Adam G.
Manninger, Paul A.
Blais, Burton
Carrillo, Catherine D.
ConFindr: rapid detection of intraspecies and cross-species contamination in bacterial whole-genome sequence data
title ConFindr: rapid detection of intraspecies and cross-species contamination in bacterial whole-genome sequence data
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546082/
https://www.ncbi.nlm.nih.gov/pubmed/31183253
http://dx.doi.org/10.7717/peerj.6995