
HoCoRT: host contamination removal tool

BACKGROUND: Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human...


Bibliographic Details
Main Authors: Rumbavicius, Ignas; Rounge, Trine B.; Rognes, Torbjørn
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10544359/
https://www.ncbi.nlm.nih.gov/pubmed/37784008
http://dx.doi.org/10.1186/s12859-023-05492-w
author Rumbavicius, Ignas
Rounge, Trine B.
Rognes, Torbjørn
collection PubMed
description BACKGROUND: Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools that we identified as designed specifically for host contamination sequence removal were either outdated, not maintained, or complicated to use. Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods.

RESULTS: HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads.

CONCLUSIONS: To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05492-w.
format Online
Article
Text
id pubmed-10544359
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10544359 2023-10-03 HoCoRT: host contamination removal tool Rumbavicius, Ignas Rounge, Trine B. Rognes, Torbjørn BMC Bioinformatics Software BioMed Central 2023-10-02 /pmc/articles/PMC10544359/ /pubmed/37784008 http://dx.doi.org/10.1186/s12859-023-05492-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title HoCoRT: host contamination removal tool
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10544359/
https://www.ncbi.nlm.nih.gov/pubmed/37784008
http://dx.doi.org/10.1186/s12859-023-05492-w