Prediction of prokaryotic transposases from protein features with machine learning approaches
Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a...
Main Authors: | Wang, Qian; Ye, Jun; Xu, Teng; Zhou, Ning; Lu, Zhongqiu; Ying, Jianchao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Microbiology Society 2021 |
Subjects: | Research Articles |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477400/ https://www.ncbi.nlm.nih.gov/pubmed/34309504 http://dx.doi.org/10.1099/mgen.0.000611 |
_version_ | 1784575834673119232 |
---|---|
author | Wang, Qian; Ye, Jun; Xu, Teng; Zhou, Ning; Lu, Zhongqiu; Ying, Jianchao |
author_facet | Wang, Qian; Ye, Jun; Xu, Teng; Zhou, Ning; Lu, Zhongqiu; Ying, Jianchao |
author_sort | Wang, Qian |
collection | PubMed |
description | Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from a training dataset of 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures using a combination of the mutual information and least absolute shrinkage and selection operator (LASSO) algorithms. By aggregating these signatures, an ensemble classifier that integrates a collection of individual ML-based classifiers was developed to identify Tnps. Further validation showed that this classifier achieved good performance, with an average AUC of 0.955, and met or exceeded the performance of other common methods. Based on this ensemble classifier, a stand-alone command-line tool, TnpDiscovery, was built to make Tnp prediction convenient for bioinformaticians and experimental researchers. This study demonstrates the effectiveness of ML approaches in identifying Tnps and should facilitate the discovery of novel Tnps. |
format | Online Article Text |
id | pubmed-8477400 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Microbiology Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-8477400 2021-09-28 Prediction of prokaryotic transposases from protein features with machine learning approaches Wang, Qian; Ye, Jun; Xu, Teng; Zhou, Ning; Lu, Zhongqiu; Ying, Jianchao Microb Genom Research Articles Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from a training dataset of 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures using a combination of the mutual information and least absolute shrinkage and selection operator (LASSO) algorithms. By aggregating these signatures, an ensemble classifier that integrates a collection of individual ML-based classifiers was developed to identify Tnps. Further validation showed that this classifier achieved good performance, with an average AUC of 0.955, and met or exceeded the performance of other common methods. Based on this ensemble classifier, a stand-alone command-line tool, TnpDiscovery, was built to make Tnp prediction convenient for bioinformaticians and experimental researchers. This study demonstrates the effectiveness of ML approaches in identifying Tnps and should facilitate the discovery of novel Tnps. Microbiology Society 2021-07-26 /pmc/articles/PMC8477400/ /pubmed/34309504 http://dx.doi.org/10.1099/mgen.0.000611 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License. |
title | Prediction of prokaryotic transposases from protein features with machine learning approaches |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477400/ https://www.ncbi.nlm.nih.gov/pubmed/34309504 http://dx.doi.org/10.1099/mgen.0.000611 |
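
The description above names three ideas: ranking features by mutual information, aggregating several classifiers into an ensemble, and scoring the result by AUC. The following is a minimal standard-library sketch of those ideas only. It is not the TnpDiscovery implementation: the LASSO step and the study's actual base classifiers are not reproduced, the averaging ("soft voting") rule is an assumption about how an ensemble might combine scores, and all function names, feature names, and numbers are hypothetical toy values.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def top_k_features(columns, labels, k):
    """Rank feature columns by mutual information with the labels; keep the top k."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def soft_vote(score_lists):
    """Average the per-sample scores produced by several base classifiers."""
    return [sum(s) / len(s) for s in zip(*score_lists)]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (1 = Tnp, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "feat_a": [1, 1, 1, 0, 0, 0],  # perfectly informative
    "feat_b": [1, 0, 1, 0, 1, 0],  # weakly informative
    "feat_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}
selected = top_k_features(columns, labels, k=2)

# Hypothetical scores from two base classifiers on the same six samples.
clf_1 = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
clf_2 = [0.7, 0.6, 0.7, 0.6, 0.3, 0.2]
ensemble = soft_vote([clf_1, clf_2])
ensemble_auc = auc(ensemble, labels)
```

In this toy setup the constant feature is dropped because its mutual information with the labels is zero, and the AUC function treats a tied positive/negative pair as half a correct ranking.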