Cont-ID: detection of sample cross-contamination in viral metagenomic data

Bibliographic Details
Main Authors: Rollin, Johan, Rong, Wei, Massart, Sébastien
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576407/
https://www.ncbi.nlm.nih.gov/pubmed/37833740
http://dx.doi.org/10.1186/s12915-023-01708-w
author Rollin, Johan
Rong, Wei
Massart, Sébastien
collection PubMed
description BACKGROUND: High-throughput sequencing (HTS) technologies, combined with bioinformatic analysis of the generated data, are becoming an important detection technique for virus diagnostics. They have the potential to replace or complement current PCR-based methods thanks to their improved inclusivity and analytical sensitivity, as well as their good overall repeatability and reproducibility. Cross-contamination is a well-known phenomenon in molecular diagnostics and corresponds to the exchange of genetic material between samples. Cross-contamination was a key drawback during the development of PCR-based detection and is now adequately monitored in routine diagnostics. HTS technologies face similar difficulties because of their very high analytical sensitivity. Because a single viral read can be detected among millions of sequencing reads, a detection threshold must be set, informed by the estimated level of cross-contamination. Cross-contamination monitoring should therefore be a priority when detecting viruses by HTS. RESULTS: We present Cont-ID, a bioinformatic tool designed to check for cross-contamination by analysing the relative abundance of viral sequencing reads identified in metagenomic sequencing datasets and their duplication across samples. It can be applied when the samples in a sequencing batch have been processed in parallel in the laboratory together with at least one specific external control, called the alien control. Using 273 real datasets, including 68 virus species from different hosts (fruit tree, plant, human) and several library preparation protocols (ribodepleted total RNA, small RNA and double-stranded RNA), we demonstrated that Cont-ID classifies viral species detections as true infections or cross-contaminations with high accuracy (91%).
This classification raises confidence in the detections and facilitates downstream interpretation and confirmation of the results by prioritising the virus detections that should be confirmed. CONCLUSIONS: Cross-contamination between samples when detecting viruses using HTS (Illumina technology) can be monitored and highlighted by Cont-ID, provided an alien control is present. Cont-ID is based on a flexible methodology that relies on the output of bioinformatic analyses of the sequencing reads and considers the contamination pattern specific to each batch of samples. The Cont-ID method is adaptable, so each laboratory can optimise it before validation and routine use. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12915-023-01708-w.
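The classification principle summarised in the abstract (relative abundance of viral reads, duplication of a detection across samples of the same batch, and an alien control used to estimate cross-contamination) can be illustrated with a simplified heuristic. The Python sketch below is a hypothetical illustration only, not the published Cont-ID implementation: the function names `estimate_contamination_rate` and `classify_detections`, the read-count table layout, the toy counts and the decision rule (flag a detection when its read count does not exceed the alien-derived contamination rate multiplied by the largest count of the same virus in another sample) are all assumptions.

```python
from typing import Dict, Tuple

# Hypothetical input layout: sample name -> virus name -> number of reads assigned.
Counts = Dict[str, Dict[str, int]]


def estimate_contamination_rate(counts: Counts, alien_sample: str, alien_virus: str) -> float:
    """Largest fraction of the alien control's reads that leaked into any other sample."""
    source_reads = counts[alien_sample].get(alien_virus, 0)
    if source_reads == 0:
        raise ValueError("alien virus not detected in the alien control sample")
    leaked = [
        viruses.get(alien_virus, 0)
        for sample, viruses in counts.items()
        if sample != alien_sample
    ]
    return max(leaked, default=0) / source_reads


def classify_detections(
    counts: Counts, alien_sample: str, alien_virus: str
) -> Dict[Tuple[str, str], str]:
    """Label each (sample, virus) detection using a simple, hypothetical rule."""
    rate = estimate_contamination_rate(counts, alien_sample, alien_virus)
    calls: Dict[Tuple[str, str], str] = {}
    for sample, viruses in counts.items():
        if sample == alien_sample:
            continue  # the control itself is not classified
        for virus, reads in viruses.items():
            if virus == alien_virus or reads == 0:
                continue  # alien reads are only used to estimate the rate
            # Largest read count for the same virus in any other sample of the batch,
            # i.e. the most plausible source if this detection is a contamination.
            source = max(
                (other.get(virus, 0) for name, other in counts.items() if name != sample),
                default=0,
            )
            if source > 0 and reads <= rate * source:
                calls[(sample, virus)] = "possible cross-contamination"
            else:
                calls[(sample, virus)] = "likely infection"
    return calls


# Toy batch with made-up read counts, for illustration only.
batch = {
    "alien_ctrl": {"alien": 100_000},
    "s1": {"alien": 50, "virus_A": 200_000},
    "s2": {"alien": 10, "virus_A": 30},
    "s3": {"virus_B": 500},
}
calls = classify_detections(batch, "alien_ctrl", "alien")
for key in sorted(calls):
    print(key, calls[key])
```

In this toy batch the estimated rate is 50 / 100,000, so the 30 `virus_A` reads in `s2` fall below the level expected from leakage out of `s1` and are flagged, while `virus_B`, found in a single sample, is kept as a likely infection. Taking the largest alien leak is a deliberately conservative choice for the sketch; the published tool may use different or additional criteria.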
format Online
Article
Text
id pubmed-10576407
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-105764072023-10-15 Cont-ID: detection of sample cross-contamination in viral metagenomic data Rollin, Johan Rong, Wei Massart, Sébastien BMC Biol Software BioMed Central 2023-10-13 /pmc/articles/PMC10576407/ /pubmed/37833740 http://dx.doi.org/10.1186/s12915-023-01708-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Cont-ID: detection of sample cross-contamination in viral metagenomic data
topic Software