Performance of rotation forest ensemble classifier and feature extractor in predicting protein interactions using amino acid sequences
Main authors: | Bustamam, Alhadi; Musti, Mohamad I. S.; Hartomo, Susilo; Aprilia, Shirley; Tampubolon, Patuan P.; Lestari, Dian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929266/ https://www.ncbi.nlm.nih.gov/pubmed/31874636 http://dx.doi.org/10.1186/s12864-019-6304-y |
author | Bustamam, Alhadi; Musti, Mohamad I. S.; Hartomo, Susilo; Aprilia, Shirley; Tampubolon, Patuan P.; Lestari, Dian |
---|---|
collection | PubMed |
description | BACKGROUND: Predicting protein-protein interactions from amino acid sequences raises two significant problems: representing each sequence as a feature vector, and designing a model that can identify the interactions. Effective feature extraction can therefore improve model performance. In this study, we used two feature extraction methods, global encoding and pseudo-substitution matrix representation (PseudoSMR), to represent the amino acid sequences of human and Human Immunodeficiency Virus type 1 (HIV-1) proteins and so address the classification problem of predicting protein-protein interactions. We also compared principal component analysis (PCA) with independent principal component analysis (IPCA) as the feature-transformation step of Rotation Forest. RESULTS: Global encoding and PseudoSMR both successfully represent the amino acid sequences for the Rotation Forest classifier with either PCA or IPCA. This is reflected in the evaluation metrics, which exceeded 73% across the six different parameters. The accuracy of both methods exceeded 74%, and the other performance criteria (sensitivity, specificity, precision, and F1-score) all exceeded 73%. The data used in this study can be accessed at https://www.dsc.ui.ac.id/research/amino-acid-pred/. CONCLUSIONS: Both global encoding and PseudoSMR can successfully represent amino acid sequences. Rotation Forest (PCA) performed better than Rotation Forest (IPCA) in predicting protein-protein interactions between HIV-1 and human proteins. Both Rotation Forest (PCA) and Rotation Forest (IPCA) outperformed other classifiers such as Gradient Boosting, K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine (SVM). Rotation Forest (PCA) and Rotation Forest (IPCA) achieved accuracy, sensitivity, specificity, precision, and F1-score values >70%, whereas the other classifiers had values <70%. |
format | Online Article Text |
id | pubmed-6929266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6929266, 2019-12-30; BMC Genomics, Research; BioMed Central, 2019-12-24; © The Author(s) 2019. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Performance of rotation forest ensemble classifier and feature extractor in predicting protein interactions using amino acid sequences |
topic | Research |
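The abstract compares PCA and IPCA as the transformation step inside Rotation Forest and reports accuracy, sensitivity, specificity, precision, and F1-score. As a rough illustration only, the sketch below builds a minimal Rotation Forest with a PCA rotation, following the general scheme described by Rodríguez, Kuncheva and Alonso (2006): random disjoint feature subsets, PCA on a bootstrap sample per subset, a block rotation matrix, and one decision tree per rotation. It then computes the five metrics from a binary confusion matrix. It assumes NumPy and scikit-learn. The class `RotationForestPCA`, the function `binary_metrics`, and the parameters `n_trees`, `n_subsets`, and `sample_frac` are hypothetical names, not the authors' code. The sketch omits the class-subset sampling of the original algorithm and does not implement the IPCA variant or the global encoding and PseudoSMR feature extractors. The demo data are synthetic, not the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier


class RotationForestPCA:
    """Hypothetical minimal Rotation Forest sketch using PCA as the rotation step."""

    def __init__(self, n_trees=10, n_subsets=3, sample_frac=0.75, random_state=0):
        self.n_trees = n_trees          # number of rotated decision trees
        self.n_subsets = n_subsets      # number of disjoint feature subsets per tree
        self.sample_frac = sample_frac  # fraction of rows drawn (with replacement) for each PCA
        self.rng = np.random.default_rng(random_state)

    def _rotation_matrix(self, X):
        n_samples, n_features = X.shape
        R = np.zeros((n_features, n_features))
        n_rows = max(1, int(self.sample_frac * n_samples))
        # Randomly partition the feature indices into disjoint subsets.
        perm = self.rng.permutation(n_features)
        for subset in np.array_split(perm, min(self.n_subsets, n_features)):
            # Fit PCA (all components kept) on a bootstrap sample of this feature subset.
            # Assumes n_rows >= len(subset) so the loading block is square.
            rows = self.rng.choice(n_samples, size=n_rows, replace=True)
            pca = PCA().fit(X[np.ix_(rows, subset)])
            # Place the loadings in this subset's block, indexed by original feature position.
            R[np.ix_(subset, subset)] = pca.components_.T
        return R

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_trees):
            R = self._rotation_matrix(X)
            tree = DecisionTreeClassifier(random_state=int(self.rng.integers(2**31)))
            tree.fit(X @ R, y)  # each tree sees the data in its own rotated space
            self.members_.append((R, tree))
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Average class probabilities over all rotated trees.
        return np.mean([tree.predict_proba(X @ R) for R, tree in self.members_], axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F1 for labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    precision = tp / (tp + fp)
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(sensitivity),
        "specificity": float(tn / (tn + fp)),
        "precision": float(precision),
        "f1": float(2 * precision * sensitivity / (precision + sensitivity)),
    }


if __name__ == "__main__":
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for encoded protein-pair feature vectors (not the study's data).
    X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
    model = RotationForestPCA(n_trees=15, n_subsets=4).fit(X_tr, y_tr)
    print(binary_metrics(y_te, model.predict(X_te)))
```

Because each block of the rotation matrix holds a full set of orthonormal PCA loadings, the rotation only re-expresses the features without discarding any, and the diversity between trees comes from the random subsets and bootstrap samples. An IPCA variant of the kind compared in the abstract would replace the `PCA().fit(...)` call with an independent-PCA fit; how the authors did this is not stated in this record.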