Amino Acid k-mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights
SIMPLE SUMMARY: Infectious bacteria (microbes) are able to evolve to become resistant to antibiotics (develop antimicrobial resistance, or AMR). Resistant microbes are harder to treat, requiring higher doses, or alternative medications, which can be more toxic. Because of inappropriate use of medici...
Main authors: | ValizadehAslani, Taha, Zhao, Zhengqiao, Sokhansanj, Bahrad A., Rosen, Gail L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694136/ https://www.ncbi.nlm.nih.gov/pubmed/33126516 http://dx.doi.org/10.3390/biology9110365 |
_version_ | 1783614907407663104 |
---|---|
author | ValizadehAslani, Taha Zhao, Zhengqiao Sokhansanj, Bahrad A. Rosen, Gail L. |
author_facet | ValizadehAslani, Taha Zhao, Zhengqiao Sokhansanj, Bahrad A. Rosen, Gail L. |
author_sort | ValizadehAslani, Taha |
collection | PubMed |
description | SIMPLE SUMMARY: Infectious bacteria (microbes) are able to evolve to become resistant to antibiotics (develop antimicrobial resistance, or AMR). Resistant microbes are harder to treat, requiring higher doses or alternative medications, which can be more toxic. Because of inappropriate use of medicine, microbes are being subjected to evolutionary pressure, resulting in increased AMR development. As a result, AMR is emerging as one of the biggest public health challenges of our time, posing the risk of a pandemic without effective treatment or vaccine. The goals of this paper are to develop and analyze machine learning methods that use the genome sequence information of a bacterium to: (1) predict the minimum required dose of an antibiotic to treat bacterial infection, and (2) identify specific mutations or altered genetic content that give rise to AMR. In particular, we propose a novel method to apply machine learning algorithms to learn patterns of amino acid sequences in the genes of the bacteria. We show that our proposed method produces comparable or even more accurate results when compared to existing methods for the goal of dose prediction, and it can provide additional insight for scientists who study AMR mechanisms. ABSTRACT: Machine learning algorithms can learn mechanisms of antimicrobial resistance from DNA sequence data without any a priori information. Interpreting a trained machine learning algorithm can be exploited for validating the model and obtaining new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, there are trade-offs between interpretability, computational complexity and accuracy for different feature extraction methods. 
In this study, we have proposed a new feature extraction method, counting amino acid k-mers or oligopeptides, which provides easier model interpretation compared to counting nucleotide k-mers and reaches the same or even better accuracy in comparison with different methods. Additionally, we have trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We have built a new feature selection pipeline for extraction of important features so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that only use a small number of features and can predict resistance accurately. |
format | Online Article Text |
id | pubmed-7694136 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7694136 2020-11-28 Amino Acid k-mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights ValizadehAslani, Taha Zhao, Zhengqiao Sokhansanj, Bahrad A. Rosen, Gail L. Biology (Basel) Article SIMPLE SUMMARY: Infectious bacteria (microbes) are able to evolve to become resistant to antibiotics (develop antimicrobial resistance, or AMR). Resistant microbes are harder to treat, requiring higher doses or alternative medications, which can be more toxic. Because of inappropriate use of medicine, microbes are being subjected to evolutionary pressure, resulting in increased AMR development. As a result, AMR is emerging as one of the biggest public health challenges of our time, posing the risk of a pandemic without effective treatment or vaccine. The goals of this paper are to develop and analyze machine learning methods that use the genome sequence information of a bacterium to: (1) predict the minimum required dose of an antibiotic to treat bacterial infection, and (2) identify specific mutations or altered genetic content that give rise to AMR. In particular, we propose a novel method to apply machine learning algorithms to learn patterns of amino acid sequences in the genes of the bacteria. We show that our proposed method produces comparable or even more accurate results when compared to existing methods for the goal of dose prediction, and it can provide additional insight for scientists who study AMR mechanisms. ABSTRACT: Machine learning algorithms can learn mechanisms of antimicrobial resistance from DNA sequence data without any a priori information. Interpreting a trained machine learning algorithm can be exploited for validating the model and obtaining new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. 
However, there are trade-offs between interpretability, computational complexity and accuracy for different feature extraction methods. In this study, we have proposed a new feature extraction method, counting amino acid k-mers or oligopeptides, which provides easier model interpretation compared to counting nucleotide k-mers and reaches the same or even better accuracy in comparison with different methods. Additionally, we have trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We have built a new feature selection pipeline for extraction of important features so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that only use a small number of features and can predict resistance accurately. MDPI 2020-10-28 /pmc/articles/PMC7694136/ /pubmed/33126516 http://dx.doi.org/10.3390/biology9110365 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article ValizadehAslani, Taha Zhao, Zhengqiao Sokhansanj, Bahrad A. Rosen, Gail L. Amino Acid k-mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights |
title | Amino Acid k-mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights |
title_sort | amino acid k-mer feature extraction for quantitative antimicrobial resistance (amr) prediction by machine learning and model interpretation for biological insights |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694136/ https://www.ncbi.nlm.nih.gov/pubmed/33126516 http://dx.doi.org/10.3390/biology9110365 |
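The description above names counting amino acid k-mers (oligopeptides) as the proposed feature extraction method. As a rough illustration of what such counting could look like, here is a minimal Python sketch. It is not the authors' implementation: the function names `count_aa_kmers` and `feature_vector`, the choice of k = 3, the toy protein strings, and the fixed vocabulary are all assumptions made for the example, and the record does not state how the paper obtains protein sequences from genomes or which k it uses.

```python
from collections import Counter


def count_aa_kmers(protein: str, k: int = 3) -> Counter:
    """Count overlapping amino acid k-mers in one protein sequence.

    Hypothetical helper for illustration; k = 3 is an arbitrary choice.
    """
    seq = protein.upper()
    # Slide a window of width k over the sequence, one residue at a time.
    # If the sequence is shorter than k, the range is empty and so is the result.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_vector(proteins, vocabulary, k: int = 3):
    """Sum k-mer counts over all proteins of one genome and return the
    counts in the fixed order given by `vocabulary` (hypothetical helper)."""
    total = Counter()
    for protein in proteins:
        total.update(count_aa_kmers(protein, k))
    # Counter returns 0 for k-mers that never occurred.
    return [total[kmer] for kmer in vocabulary]


# Toy input: two made-up protein fragments standing in for one genome.
proteins = ["MKTAYIAK", "MKTAY"]
vocab = ["MKT", "KTA", "TAY", "AYI", "YIA", "IAK"]
vec = feature_vector(proteins, vocab, k=3)
print(dict(zip(vocab, vec)))
```

Each genome would yield one such count vector, which could then be passed to a regression or classification model. Because every feature is a short peptide rather than a nucleotide string, a feature the model weights heavily can be traced back to a protein region directly, which is the interpretability advantage the abstract describes.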