Detection of Pathogenic Microbe Composition Using Next-Generation Sequencing Data


Bibliographic Details
Main Authors: Zhao, Haiyong; Wang, Shuang; Yuan, Xiguo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7734255/
https://www.ncbi.nlm.nih.gov/pubmed/33329748
http://dx.doi.org/10.3389/fgene.2020.603093
collection PubMed
description Next-generation sequencing (NGS) technologies have provided great opportunities to analyze pathogenic microbes with high-resolution data. The main goal is to accurately detect microbial composition and abundances in a sample. However, high similarity among sequences from different species and the existence of sequencing errors pose various challenges. Numerous methods have been developed for quantifying microbial composition and abundance, but they are not versatile enough for the analysis of samples with mixtures of noise. In this paper, we propose a new computational method, PGMicroD, for the detection of pathogenic microbial composition in a sample using NGS data. The method first filters out potentially mismapped reads and extracts multiple species-related features from the sequencing reads of 16S rRNA. It then trains a Support Vector Machine classifier to predict the microbial composition. Finally, it groups all multiple-mapped sequencing reads into the references of the predicted species to estimate the abundance of each species. The performance of PGMicroD is evaluated on both simulated and real sequencing data and is compared with several existing methods. The results demonstrate that our proposed method achieves superior performance. The software package of PGMicroD is available at https://github.com/BDanalysis/PGMicroD.
format Online
Article
Text
id pubmed-7734255
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7734255 2020-12-15 Detection of Pathogenic Microbe Composition Using Next-Generation Sequencing Data Zhao, Haiyong Wang, Shuang Yuan, Xiguo Front Genet Genetics Next-generation sequencing (NGS) technologies have provided great opportunities to analyze pathogenic microbes with high-resolution data. The main goal is to accurately detect microbial composition and abundances in a sample. However, high similarity among sequences from different species and the existence of sequencing errors pose various challenges. Numerous methods have been developed for quantifying microbial composition and abundance, but they are not versatile enough for the analysis of samples with mixtures of noise. In this paper, we propose a new computational method, PGMicroD, for the detection of pathogenic microbial composition in a sample using NGS data. The method first filters out potentially mismapped reads and extracts multiple species-related features from the sequencing reads of 16S rRNA. It then trains a Support Vector Machine classifier to predict the microbial composition. Finally, it groups all multiple-mapped sequencing reads into the references of the predicted species to estimate the abundance of each species. The performance of PGMicroD is evaluated on both simulated and real sequencing data and is compared with several existing methods. The results demonstrate that our proposed method achieves superior performance. The software package of PGMicroD is available at https://github.com/BDanalysis/PGMicroD. Frontiers Media S.A. 2020-11-30 /pmc/articles/PMC7734255/ /pubmed/33329748 http://dx.doi.org/10.3389/fgene.2020.603093 Text en Copyright © 2020 Zhao, Wang and Yuan. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics