Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning
Main authors: | Liu, Bohan; Nan, Jun; Zu, Xuehui; Zhang, Xinhui; Xiao, Qiliang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Cell and Developmental Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7848102/ https://www.ncbi.nlm.nih.gov/pubmed/33537313 http://dx.doi.org/10.3389/fcell.2020.626221 |
author | Liu, Bohan; Nan, Jun; Zu, Xuehui; Zhang, Xinhui; Xiao, Qiliang |
collection | PubMed |
description | In sewage treatment, polyphosphate-accumulating organisms (PAOs) are usually identified through biological experiments, which are complicated, time-consuming, and costly. Although machine learning has been widely adopted in many fields in recent years, it is seldom applied to water treatment. This work presents a high-accuracy support vector machine (SVM) model for the rapid identification and prediction of PAOs. We obtained 6,318 microbial genome sequences from the publicly available Microbial Genome Database for Comparative Analysis (MBGD). Minimap2 was used to align the genomes pairwise and read out their overlaps, and the SVM model was built on the resulting genome-sequence similarities. Under 10-fold cross-validation, the model achieved an average accuracy of 0.9628 ± 0.019. Applying it to 2,652 microorganisms yielded 22 potential PAOs, for most of which phosphorus-removal characteristics could be indirectly confirmed from previous reports. The SVM model therefore shows high prediction accuracy and good stability. |
format | Online Article Text |
id | pubmed-7848102 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7848102 2021-02-02 Front Cell Dev Biol Cell and Developmental Biology Frontiers Media S.A. 2021-01-18 /pmc/articles/PMC7848102/ /pubmed/33537313 http://dx.doi.org/10.3389/fcell.2020.626221 Text en Copyright © 2021 Liu, Nan, Zu, Zhang and Xiao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning |
topic | Cell and Developmental Biology |
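The abstract outlines a pipeline of pairwise Minimap2 genome comparison, genome-sequence similarity as SVM input, and 10-fold cross-validation, but the record does not give the authors' exact feature construction, kernel, or hyperparameters. The Python below is therefore only a minimal, dependency-free sketch under stated assumptions, not the authors' method. It reads minimap2's PAF output and scores a genome pair by summing residue matches (PAF column 10) and dividing by the shorter genome length; it uses a genome's similarities to a set of reference genomes as its feature vector; and it trains a linear SVM with the Pegasos sub-gradient method, swapped in for a library SVM. The helper names (`paf_similarity`, `feature_vector`, `train_linear_svm`, `k_fold_accuracy`) are hypothetical, and reading "±" as a sample standard deviation across folds is also an assumption.

```python
import random
import statistics
from collections import defaultdict


def paf_similarity(paf_lines):
    """Hypothetical pairwise similarity from minimap2 PAF records.

    PAF columns used: 1 query name, 2 query length, 6 target name,
    7 target length, 10 number of residue matches. Matches from all
    alignment blocks of a pair are summed and divided by the shorter
    genome length (an assumed normalisation), capped at 1.0 because
    overlapping blocks can double-count.
    """
    matches = defaultdict(int)
    lengths = {}
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        q, t = cols[0], cols[5]
        lengths[q], lengths[t] = int(cols[1]), int(cols[6])
        matches[(q, t)] += int(cols[9])
    return {
        (q, t): min(1.0, m / min(lengths[q], lengths[t]))
        for (q, t), m in matches.items()
    }


def feature_vector(genome, refs, sim):
    """Assumed features: similarity of `genome` to each reference genome,
    taking the larger of the two alignment directions."""
    return [
        1.0 if r == genome
        else max(sim.get((genome, r), 0.0), sim.get((r, genome), 0.0))
        for r in refs
    ]


def _augment(x):
    return list(x) + [1.0]  # constant feature acts as a (regularised) bias


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularised hinge loss. Labels must be -1 or +1."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


def k_fold_accuracy(X, y, k=10, seed=0):
    """Mean and sample standard deviation of per-fold accuracy.
    Assumes len(X) >= k so that no fold is empty."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return statistics.mean(accs), statistics.stdev(accs)


if __name__ == "__main__":
    # Made-up toy features: [similarity to a known PAO, similarity to a non-PAO]
    X = [[0.8 + 0.01 * i, 0.2 - 0.01 * i] for i in range(10)]
    X += [[0.2 - 0.01 * i, 0.8 + 0.01 * i] for i in range(10)]
    y = [1] * 10 + [-1] * 10
    mean, sd = k_fold_accuracy(X, y, k=10)
    print(f"10-fold accuracy: {mean:.4f} ± {sd:.3f}")
```

Run as a script, it prints the mean and standard deviation of per-fold accuracy on made-up, linearly separable toy features; that number says nothing about the 0.9628 ± 0.019 reported for the real genomes. A linear model is used only to keep the sketch self-contained; with real data, a library SVM with a tuned kernel would be the more usual choice.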