
AmazonForest: In Silico Metaprediction of Pathogenic Variants

SIMPLE SUMMARY: ClinVar is a valuable platform that stores a large set of relevant genetic associations with complex phenotypes. However, the functional impact of a partial set of such associations remains misinterpreted, due to the presence of variants with uncertain significance or with conflictin...


Bibliographic Details
Main authors: Palheta, Helber Gonzales Almeida; Gonçalves, Wanderson Gonçalves; Brito, Leonardo Miranda; dos Santos, Arthur Ribeiro; dos Reis Matsumoto, Marlon; Ribeiro-dos-Santos, Ândrea; de Araújo, Gilderlanio Santana
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9024711/
https://www.ncbi.nlm.nih.gov/pubmed/35453737
http://dx.doi.org/10.3390/biology11040538
author Palheta, Helber Gonzales Almeida
Gonçalves, Wanderson Gonçalves
Brito, Leonardo Miranda
dos Santos, Arthur Ribeiro
dos Reis Matsumoto, Marlon
Ribeiro-dos-Santos, Ândrea
de Araújo, Gilderlanio Santana
collection PubMed
description SIMPLE SUMMARY: ClinVar is a valuable platform that stores a large set of relevant genetic associations with complex phenotypes. However, the functional impact of a partial set of such associations remains misinterpreted, due to the presence of variants with uncertain significance or with conflicting pathogenicity interpretations. To fill this gap, we present AmazonForest: a metaprediction model based on Random Forest for pathogenicity prediction. AmazonForest was used to reclassify a set of ∼101,000 variants that were predicted as having high pathogenic probability. AmazonForest is available as a web tool with a simple web interface, and also as an R object for pathogenicity predictions.
ABSTRACT: ClinVar is a web platform that stores ∼789,000 genetic associations with complex diseases. A partial set of these cataloged genetic associations has challenged clinicians and geneticists, often leading to conflicting interpretations or uncertain clinical impact significance. In this study, we addressed the (re)classification of genetic variants by AmazonForest, which is a random-forest-based pathogenicity metaprediction model that works by combining functional impact data from eight prediction tools. We evaluated the performance of representation learning algorithms such as autoencoders to propose a better strategy. All metaprediction models were trained with ClinVar data, and genetic variants were annotated with eight functional impact predictors cataloged with SnpEff/SnpSift. AmazonForest implements the best random forest model with a one hot data-encoding strategy, which shows an Area Under ROC Curve of ≥0.93. AmazonForest was employed for pathogenicity prediction of a set of ∼101,000 genetic variants of uncertain significance or conflict of interpretation. Our findings revealed ∼24,000 variants with high pathogenic probability ([Formula: see text]). In addition, we show results for Alzheimer’s Disease as a demonstration of its application in clinical interpretation of genetic variants in complex diseases. Lastly, AmazonForest is available as a web tool and R object that can be loaded to perform pathogenicity predictions.
format Online
Article
Text
id pubmed-9024711
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-90247112022-04-23 AmazonForest: In Silico Metaprediction of Pathogenic Variants Palheta, Helber Gonzales Almeida Gonçalves, Wanderson Gonçalves Brito, Leonardo Miranda dos Santos, Arthur Ribeiro dos Reis Matsumoto, Marlon Ribeiro-dos-Santos, Ândrea de Araújo, Gilderlanio Santana Biology (Basel) Article SIMPLE SUMMARY: ClinVar is a valuable platform that stores a large set of relevant genetic associations with complex phenotypes. However, the functional impact of a partial set of such associations remains misinterpreted, due to the presence of variants with uncertain significance or with conflicting pathogenicity interpretations. To fill this gap, we present AmazonForest: a metaprediction model based on Random Forest for pathogenicity prediction. AmazonForest was used to reclassify a set of ∼101,000 variants that were predicted as having high pathogenic probability. AmazonForest is available as a web tool with a simple web interface, and also as an R object for pathogenicity predictions. ABSTRACT: ClinVar is a web platform that stores ∼789,000 genetic associations with complex diseases. A partial set of these cataloged genetic associations has challenged clinicians and geneticists, often leading to conflicting interpretations or uncertain clinical impact significance. In this study, we addressed the (re)classification of genetic variants by AmazonForest, which is a random-forest-based pathogenicity metaprediction model that works by combining functional impact data from eight prediction tools. We evaluated the performance of representation learning algorithms such as autoencoders to propose a better strategy. All metaprediction models were trained with ClinVar data, and genetic variants were annotated with eight functional impact predictors cataloged with SnpEff/SnpSift. AmazonForest implements the best random forest model with a one hot data-encoding strategy, which shows an Area Under ROC Curve of ≥0.93. 
AmazonForest was employed for pathogenicity prediction of a set of ∼101,000 genetic variants of uncertain significance or conflict of interpretation. Our findings revealed ∼24,000 variants with high pathogenic probability ([Formula: see text]). In addition, we show results for Alzheimer’s Disease as a demonstration of its application in clinical interpretation of genetic variants in complex diseases. Lastly, AmazonForest is available as a web tool and R object that can be loaded to perform pathogenicity predictions. MDPI 2022-03-31 /pmc/articles/PMC9024711/ /pubmed/35453737 http://dx.doi.org/10.3390/biology11040538 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title AmazonForest: In Silico Metaprediction of Pathogenic Variants
topic Article