HPMPdb: A machine learning-ready database of protein molecular phenotypes associated to human missense variants

Current databases of human Single Amino acid Variants (SAVs) link each SAV to its effect on the phenotype of the carrier individual, often dividing variants into Deleterious and Neutral classes. This is a very coarse-grained description of the genotype-to-phenotype relationship because it relies o...

Bibliographic Details
Main authors: Raimondi, Daniele, Codicè, Francesco, Orlando, Gabriele, Schymkowitz, Joost, Rousseau, Frederic, Moreau, Yves
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9166469/
https://www.ncbi.nlm.nih.gov/pubmed/35669450
http://dx.doi.org/10.1016/j.crstbi.2022.04.004
_version_ 1784720610501328896
author Raimondi, Daniele
Codicè, Francesco
Orlando, Gabriele
Schymkowitz, Joost
Rousseau, Frederic
Moreau, Yves
collection PubMed
description Current databases of human Single Amino acid Variants (SAVs) link each SAV to its effect on the phenotype of the carrier individual, often dividing variants into Deleterious and Neutral classes. This is a very coarse-grained description of the genotype-to-phenotype relationship because it relies on unrealistic assumptions, such as perfectly Mendelian behavior of each SAV, and considers only dichotomous phenotypes. Moreover, the link between the effect of a SAV on a protein (its molecular phenotype) and the individual phenotype is often very complex, because multiple levels of biological abstraction connect the protein-level and individual-level phenotypes. Here we present HPMPdb, a manually curated database of human SAVs, each associated with a detailed description of the molecular phenotype it causes in the affected protein. With particular regard to machine learning (ML), this database lets researchers go beyond the existing Deleterious/Neutral prediction paradigm and build molecular phenotype predictors instead. Our class labels succinctly describe the effects of each SAV on 15 protein molecular phenotypes, such as protein-protein interaction, small-molecule binding, function, post-translational modifications (PTMs), sub-cellular localization, mimetic PTM, folding and protein expression. Moreover, we provide researchers with all the means necessary to reproducibly train and test their models on our database. The webserver and the data described in this paper are available at hpmp.esat.kuleuven.be.
format Online
Article
Text
id pubmed-9166469
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9166469 2022-06-05 HPMPdb: A machine learning-ready database of protein molecular phenotypes associated to human missense variants Raimondi, Daniele Codicè, Francesco Orlando, Gabriele Schymkowitz, Joost Rousseau, Frederic Moreau, Yves Curr Res Struct Biol Research Article
Elsevier 2022-05-13 /pmc/articles/PMC9166469/ /pubmed/35669450 http://dx.doi.org/10.1016/j.crstbi.2022.04.004 Text en © 2022 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title HPMPdb: A machine learning-ready database of protein molecular phenotypes associated to human missense variants
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9166469/
https://www.ncbi.nlm.nih.gov/pubmed/35669450
http://dx.doi.org/10.1016/j.crstbi.2022.04.004