
Faster and more accurate pathogenic combination predictions with VarCoPP2.0

BACKGROUND: The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can ai...


Bibliographic Details
Main authors: Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152795/
https://www.ncbi.nlm.nih.gov/pubmed/37127601
http://dx.doi.org/10.1186/s12859-023-05291-3
author Versbraegen, Nassim
Gravel, Barbara
Nachtegael, Charlotte
Renaux, Alexandre
Verkinderen, Emma
Nowé, Ann
Lenaerts, Tom
Papadimitriou, Sofia
collection PubMed
description BACKGROUND: The prediction of potentially pathogenic variant combinations in patients remains a key task in medical genetics for understanding and detecting oligogenic/multilocus diseases. Models tailored to such cases can help narrow the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor), published in 2019, identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations) and was the first important step in this direction. Despite its usefulness and applicability, several issues remained that hindered better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture.
RESULTS: We present VarCoPP2.0, the successor of VarCoPP: a simplified, faster and more accurate predictive model that identifies potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets show that VarCoPP2.0 has improved in both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database (https://olida.ibsquare.be). The improved performance is also attributed to a more careful selection of up-to-date features, identified via an original wrapper method. We show that combining different variant and gene pair features is important for predictions, highlighting the usefulness of integrating biological information at different levels.
CONCLUSIONS: Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform (https://orval.ibsquare.be) to apply VarCoPP2.0 to their data.
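The description reports performance as a sensitivity and an FP rate. As a minimal sketch of how these metrics relate (not the authors' evaluation code), the Python snippet below derives them from confusion-matrix counts. The function name `rates` and the counts are hypothetical, chosen only so that the result matches the 95% sensitivity and 5% FP rate quoted in the record.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, fp_rate) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of pathogenic combinations that are detected
    specificity = tn / (tn + fp)   # share of neutral combinations correctly rejected
    fp_rate = fp / (fp + tn)       # equals 1 - specificity
    return sensitivity, specificity, fp_rate


# Hypothetical counts (an assumption for illustration, not data from the article).
sens, spec, fpr = rates(tp=95, fn=5, tn=950, fp=50)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} FP rate={fpr:.2f}")
```

Under these assumed counts, a 5% FP rate corresponds to a specificity of 95%, which is how the two figures in the description fit together.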
format Online
Article
Text
id pubmed-10152795
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10152795 2023-05-03 Faster and more accurate pathogenic combination predictions with VarCoPP2.0 Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia BMC Bioinformatics Software BioMed Central 2023-05-01 /pmc/articles/PMC10152795/ /pubmed/37127601 http://dx.doi.org/10.1186/s12859-023-05291-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Faster and more accurate pathogenic combination predictions with VarCoPP2.0
topic Software