
Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning

In the field of sewage treatment, the identification of polyphosphate-accumulating organisms (PAOs) usually relies on biological experiments. However, biological experiments are not only complicated and time-consuming, but also costly. In recent years, machine learning has been widely used in many f...


Bibliographic Details
Main authors: Liu, Bohan; Nan, Jun; Zu, Xuehui; Zhang, Xinhui; Xiao, Qiliang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7848102/
https://www.ncbi.nlm.nih.gov/pubmed/33537313
http://dx.doi.org/10.3389/fcell.2020.626221
author Liu, Bohan
Nan, Jun
Zu, Xuehui
Zhang, Xinhui
Xiao, Qiliang
author_sort Liu, Bohan
collection PubMed
description In the field of sewage treatment, the identification of polyphosphate-accumulating organisms (PAOs) usually relies on biological experiments. However, biological experiments are not only complicated and time-consuming, but also costly. In recent years, machine learning has been widely used in many fields, but it is seldom used in water treatment. This work presents a high-accuracy support vector machine (SVM) algorithm for the rapid identification and prediction of PAOs. We obtained 6,318 microbial genome sequences from the publicly available Microbial Genome Database for Comparative Analysis (MBGD). Minimap2 was used to compare the genomes of these microorganisms pairwise and to read out their overlaps. The SVM model was built on the similarity of the genome sequences. Under 10-fold cross-validation, the model achieved an average accuracy of 0.9628 ± 0.019. Applying the model to 2,652 microorganisms yielded 22 potential PAOs. Analysis of these predicted PAOs showed that the phosphorus-removal characteristics of most of them could be indirectly verified from previous reports. The SVM model we built shows high prediction accuracy and good stability.
format Online
Article
Text
id pubmed-7848102
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7848102 2021-02-02 Front Cell Dev Biol Cell and Developmental Biology Frontiers Media S.A. 2021-01-18 /pmc/articles/PMC7848102/ /pubmed/33537313 http://dx.doi.org/10.3389/fcell.2020.626221 Text en Copyright © 2021 Liu, Nan, Zu, Zhang and Xiao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning
title_sort identification of genome sequences of polyphosphate-accumulating organisms by machine learning
topic Cell and Developmental Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7848102/
https://www.ncbi.nlm.nih.gov/pubmed/33537313
http://dx.doi.org/10.3389/fcell.2020.626221