Prediction of prokaryotic transposases from protein features with machine learning approaches

Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a...

Bibliographic Details
Main Authors: Wang, Qian; Ye, Jun; Xu, Teng; Zhou, Ning; Lu, Zhongqiu; Ying, Jianchao
Format: Online Article Text
Language: English
Published: Microbiology Society 2021
Subjects: Research Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477400/
https://www.ncbi.nlm.nih.gov/pubmed/34309504
http://dx.doi.org/10.1099/mgen.0.000611
author Wang, Qian
Ye, Jun
Xu, Teng
Zhou, Ning
Lu, Zhongqiu
Ying, Jianchao
collection PubMed
description Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from a training dataset comprising 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures by combining mutual information with the least absolute shrinkage and selection operator (LASSO) algorithm. By aggregating these signatures, an ensemble classifier that integrates a collection of individual ML-based classifiers was developed to identify Tnps. Further validation showed that this classifier performed well, with an average AUC of 0.955, and matched or exceeded the performance of other common methods. Based on this ensemble classifier, a stand-alone command-line tool, TnpDiscovery, was built to make Tnp prediction convenient for bioinformaticians and experimental researchers. This study demonstrates the effectiveness of ML approaches in identifying Tnps and should facilitate the discovery of novel Tnps in the future.
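The pipeline summarized in the description (score features by mutual information, combine several classifiers, evaluate by AUC) can be illustrated with a minimal, standard-library Python sketch. This is not the TnpDiscovery implementation: the function names, the toy labels, and the toy scores below are all invented for illustration, the LASSO step is omitted, and the "ensemble" is a plain average of two hypothetical base-classifier scores, which may differ from how the study aggregates its classifiers.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def soft_vote(score_lists):
    """Average the per-sample scores of several base classifiers."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


# Toy data (invented): 4 "Tnp" samples (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Two hypothetical binary features: one tracks the label, one does not.
informative = [1, 1, 1, 1, 0, 0, 0, 0]
noise = [1, 0, 1, 0, 1, 0, 1, 0]
mi_informative = mutual_information(informative, labels)
mi_noise = mutual_information(noise, labels)

# Scores from two hypothetical base classifiers, then their average.
scores_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]
scores_b = [0.6, 0.9, 0.8, 0.3, 0.4, 0.2, 0.5, 0.1]
ensemble = soft_vote([scores_a, scores_b])

print("MI:", mi_informative, mi_noise)
print("AUC:", auc(scores_a, labels), auc(scores_b, labels), auc(ensemble, labels))
```

In this toy case a feature-selection step would rank the informative feature above the noise feature, and averaging lets each base classifier compensate for the other's ranking mistakes; real data offer no such guarantee.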
format Online
Article
Text
id pubmed-8477400
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-8477400 2021-09-28 Prediction of prokaryotic transposases from protein features with machine learning approaches Wang, Qian Ye, Jun Xu, Teng Zhou, Ning Lu, Zhongqiu Ying, Jianchao Microb Genom Research Articles Microbiology Society 2021-07-26 /pmc/articles/PMC8477400/ /pubmed/34309504 http://dx.doi.org/10.1099/mgen.0.000611 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License.
title Prediction of prokaryotic transposases from protein features with machine learning approaches
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477400/
https://www.ncbi.nlm.nih.gov/pubmed/34309504
http://dx.doi.org/10.1099/mgen.0.000611