Faster and more accurate pathogenic combination predictions with VarCoPP2.0
BACKGROUND: The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can ai...
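Main Authors: | Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia |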
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152795/ https://www.ncbi.nlm.nih.gov/pubmed/37127601 http://dx.doi.org/10.1186/s12859-023-05291-3 |
_version_ | 1785035809160691712 |
---|---|
author | Versbraegen, Nassim Gravel, Barbara Nachtegael, Charlotte Renaux, Alexandre Verkinderen, Emma Nowé, Ann Lenaerts, Tom Papadimitriou, Sofia |
author_sort | Versbraegen, Nassim |
collection | PubMed |
description | BACKGROUND: The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor) that was published in 2019 and identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still remained that hindered a better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture. RESULTS: We present VarCoPP2.0: the successor of VarCoPP that is a simplified, faster and more accurate predictive model identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database (https://olida.ibsquare.be). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that the combination of different variant and gene pair features together is important for predictions, highlighting the usefulness of integrating biological information at different levels. 
CONCLUSIONS: Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform (https://orval.ibsquare.be) to apply VarCoPP2.0 on their data. |
format | Online Article Text |
id | pubmed-10152795 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10152795 2023-05-03 Faster and more accurate pathogenic combination predictions with VarCoPP2.0 Versbraegen, Nassim Gravel, Barbara Nachtegael, Charlotte Renaux, Alexandre Verkinderen, Emma Nowé, Ann Lenaerts, Tom Papadimitriou, Sofia BMC Bioinformatics Software BioMed Central 2023-05-01 /pmc/articles/PMC10152795/ /pubmed/37127601 http://dx.doi.org/10.1186/s12859-023-05291-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Versbraegen, Nassim Gravel, Barbara Nachtegael, Charlotte Renaux, Alexandre Verkinderen, Emma Nowé, Ann Lenaerts, Tom Papadimitriou, Sofia Faster and more accurate pathogenic combination predictions with VarCoPP2.0 |
title | Faster and more accurate pathogenic combination predictions with VarCoPP2.0 |
title_sort | faster and more accurate pathogenic combination predictions with varcopp2.0 |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152795/ https://www.ncbi.nlm.nih.gov/pubmed/37127601 http://dx.doi.org/10.1186/s12859-023-05291-3 |