
Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences

SIMPLE SUMMARY: Antioxidant compounds protect the human body from many kinds of diseases as well as the degeneration of age. Several micronutrients that were found in the last century such as vitamins A, C, and E have become popular in our life. Scientists are trying to find more and more antioxidan...


Bibliographic Details
Main authors: Ho Thanh Lam, Luu; Le, Ngoc Hoang; Van Tuan, Le; Tran Ban, Ho; Nguyen Khanh Hung, Truong; Nguyen, Ngan Thi Kim; Huu Dang, Luong; Le, Nguyen Quoc Khanh
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7599600/
https://www.ncbi.nlm.nih.gov/pubmed/33036150
http://dx.doi.org/10.3390/biology9100325
author Ho Thanh Lam, Luu
Le, Ngoc Hoang
Van Tuan, Le
Tran Ban, Ho
Nguyen Khanh Hung, Truong
Nguyen, Ngan Thi Kim
Huu Dang, Luong
Le, Nguyen Quoc Khanh
author_sort Ho Thanh Lam, Luu
collection PubMed
description SIMPLE SUMMARY: Antioxidant compounds protect the human body from many kinds of diseases as well as the degeneration of age. Several micronutrients that were found in the last century such as vitamins A, C, and E have become popular in our life. Scientists are trying to find more and more antioxidant compounds not only from experimenting in the laboratory but also from assisting by the computer. Our research utilized a computational method for the swift and economic identification of antioxidant compounds. The research presents a predictor that got a high accuracy of 84.6% for the detection of antioxidants. Therefore, our predictor is promising to be a useful tool to discover a new antioxidant compound.
ABSTRACT: Antioxidant proteins are involved importantly in many aspects of cellular life activities. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, oxygen-free radicals, etc.) which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body and the balance between them is necessary to maintain a healthy body. An unhealthy routine or the degeneration of age can break the balance, leading to more ROS than antioxidants, causing damage to health. In general, the antioxidant mechanism is the combination of antioxidant molecules and ROS in a one-electron reaction. Creating computational models to promptly identify antioxidant candidates is essential in supporting antioxidant detection experiments in the laboratory. In this study, we proposed a machine learning-based model for this prediction purpose from a benchmark set of sequencing data. The experiments were conducted by using 10-fold cross-validation on the training process and validated by three different independent datasets. Different machine learning and deep learning algorithms have been evaluated on an optimal set of sequence features. Among them, Random Forest has been identified as the best model to identify antioxidant proteins with the highest performance. Our optimal model achieved high accuracy of 84.6%, as well as a balance in sensitivity (81.5%) and specificity (85.1%) for antioxidant protein identification on the training dataset. The performance results from different independent datasets also showed the significance in our model compared to previously published works on antioxidant protein identification.
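The abstract reports accuracy, sensitivity, and specificity obtained under 10-fold cross-validation but gives no implementation details. The following is a minimal illustrative sketch of those two ideas in plain Python, not the authors' code. The function names `k_fold_indices` and `binary_metrics` are hypothetical, the label encoding (1 = antioxidant, 0 = non-antioxidant) is an assumption, and the folds are built without shuffling or stratification, which a real experiment would likely add.

```python
from typing import Dict, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs.

    Hypothetical helper: contiguous folds, no shuffling or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    Assumed encoding: 1 = antioxidant (positive), 0 = non-antioxidant.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true positives that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of true negatives that were recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

In a cross-validation loop, a classifier would be fitted on each `train` index list and scored with `binary_metrics` on the matching validation list, and the per-fold values would then be averaged.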
format Online
Article
Text
id pubmed-7599600
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7599600 2020-11-01 Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences. Ho Thanh Lam, Luu; Le, Ngoc Hoang; Van Tuan, Le; Tran Ban, Ho; Nguyen Khanh Hung, Truong; Nguyen, Ngan Thi Kim; Huu Dang, Luong; Le, Nguyen Quoc Khanh. Biology (Basel). Article. MDPI 2020-10-06 /pmc/articles/PMC7599600/ /pubmed/33036150 http://dx.doi.org/10.3390/biology9100325 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7599600/
https://www.ncbi.nlm.nih.gov/pubmed/33036150
http://dx.doi.org/10.3390/biology9100325