IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds
Main authors: | Quevedo-Tumailli, Viviana; Ortega-Tenezaca, Bernabe; González-Díaz, Humberto |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8657696/ https://www.ncbi.nlm.nih.gov/pubmed/34884870 http://dx.doi.org/10.3390/ijms222313066 |
author | Quevedo-Tumailli, Viviana; Ortega-Tenezaca, Bernabe; González-Díaz, Humberto |
collection | PubMed |
description | Parasites of the genus Plasmodium cause malaria, which remains a major global health problem because of parasite resistance to available antimalarial drugs and rising treatment costs. Consequently, the computational prediction of new antimalarial compounds acting on novel targets in the proteome of Plasmodium sp. is an important goal for the pharmaceutical industry. The success of a pre-clinical assay can be expected to depend on the assay conditions per se, the chemical structure of the drug, and the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as the gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, no computational models that consider all of these factors simultaneously have been reported. The difficulties of this kind of analysis include the dispersion of data across different datasets and the high heterogeneity of the data, among others. To achieve this goal, we analyzed three databases: ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information, Genome Data Viewer). The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of the compounds as well as extensive information about assay conditions. The NCBI-GDV and UniProt datasets include gene and protein sequences and their functions. In addition, we created two partitions (c(assayj) = c(aj) and c(dataj) = c(dj)) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about the experimental conditions of pre-clinical assays (c(aj)) or about the nature and quality of the data (c(dj)).
These categorical variables include information about 22 parameters of biological activity (c(a0)), 28 target proteins (c(a1)), and 9 assay organisms (c(a2)), among others. We also created another partition (c(protj) = c(pj)) of categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (c(p0)), 10 chromosomes (c(p1)), gene orientation (c(p2)), and 31 protein functions (c(p3)). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all of this information from the three databases and to train a predictive model. Shannon's entropy measure Sh(k) (numerical variables) was used to quantify the structural information of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) in the form of Moving Average (MA) operators were used to quantify perturbations (deviations) of the structural variables from their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model performed best, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for the training/validation sets, respectively. This model could become a useful tool for optimizing pre-clinical assays of new antimalarial compounds against different proteins in the proteome of Plasmodium. |
format | Online Article Text |
id | pubmed-8657696 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8657696 2021-12-10; Int J Mol Sci; Article; MDPI 2021-12-02; /pmc/articles/PMC8657696/; /pubmed/34884870; http://dx.doi.org/10.3390/ijms222313066; Text; en; © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds |
topic | Article |
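The description above names the numerical ingredients of the IFPTML workflow: Shannon entropy values Sh(k) used as structural descriptors, moving-average Perturbation Theory Operators (PTOs) that measure how far a descriptor deviates from its expected value within one category of the c(aj), c(dj), or c(pj) partitions, and the Sn(%)/Sp(%) scores used to compare models. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation, and all function names are hypothetical. `shannon_entropy` uses plain (order-0) symbol-frequency entropy as a simplified stand-in for the paper's order-k Sh(k) indices, and `moving_average_pto` assumes the expected value is the mean over all rows sharing one category label, computed on whatever data is passed in (in a real model it would be computed on the training set only).

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def shannon_entropy(sequence: str) -> float:
    """Order-0 Shannon entropy (bits) of the symbol frequencies in a sequence.

    Assumption: a simplified stand-in for the paper's Sh(k) indices,
    e.g. applied to a protein or gene sequence string.
    """
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def moving_average_pto(values: Sequence[float], categories: Sequence[str]) -> List[float]:
    """Moving-average perturbation operator (sketch).

    For each row i with category c: delta_i = V_i - <V>_c, where <V>_c is
    the mean of V over all rows that share category c.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, categories):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [v - means[c] for v, c in zip(values, categories)]


def sensitivity_specificity(observed: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (Sn%, Sp%) for binary labels (1 = active, 0 = inactive).

    Both classes must be present in `observed` to avoid division by zero.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)
```

For example, `moving_average_pto([2.0, 4.0, 3.0, 5.0], ["IC50", "IC50", "Ki", "Ki"])` returns `[-1.0, 1.0, -1.0, 1.0]`, because each value is compared with the mean of its own category (3.0 for IC50, 4.0 for Ki). These category labels are illustrative and are not taken from the ChEMBL dataset. In an IFPTML model, such deviations (rather than the raw descriptors) would be fed to the classifier alongside the category information.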