Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization
Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to differentiate deleterious from benign missense variants by means of a la...
Main authors: | Arani, Asieh Amousoltani; Sehhati, Mohammadreza; Tabatabaiefar, Mohammad Amin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8660898/ https://www.ncbi.nlm.nih.gov/pubmed/34887492 http://dx.doi.org/10.1038/s41598-021-03230-x |
_version_ | 1784613290667671552 |
---|---|
author | Arani, Asieh Amousoltani; Sehhati, Mohammadreza; Tabatabaiefar, Mohammad Amin |
author_facet | Arani, Asieh Amousoltani; Sehhati, Mohammadreza; Tabatabaiefar, Mohammad Amin |
author_sort | Arani, Asieh Amousoltani |
collection | PubMed |
description | Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to differentiate deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. However, a reliable and accurate algorithm for merging heterogeneous information is still highly needed, as it could capture the complex interactions in the networks that genes participate in. In this study we propose a new method based on non-negative matrix tri-factorization clustering. We outline two versions of the proposed method: a two-source and a three-source algorithm. The two-source algorithm aggregates individual deleteriousness prediction methods and the protein-protein interaction (PPI) network, and the three-source algorithm additionally incorporates gene-disease associations. Four benchmark datasets were employed for internal and external validation of both algorithms. The results on all datasets confirmed that our method outperforms most state-of-the-art variant prediction tools. Two key features of our variant effect prediction method are worth mentioning. First, although incorporating gene-disease information in the three-source algorithm can improve prediction performance compared with the two-source algorithm, our method is not hindered by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when the predictor annotates variants on the basis of the genes they are located on. Second, the performance of our predictor is superior to that of other ensemble-based methods for variants located on genes for which little pathogenicity information is available. |
format | Online Article Text |
id | pubmed-8660898 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8660898 2021-12-13 Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization Arani, Asieh Amousoltani Sehhati, Mohammadreza Tabatabaiefar, Mohammad Amin Sci Rep Article Nature Publishing Group UK 2021-12-09 /pmc/articles/PMC8660898/ /pubmed/34887492 http://dx.doi.org/10.1038/s41598-021-03230-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Arani, Asieh Amousoltani Sehhati, Mohammadreza Tabatabaiefar, Mohammad Amin Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
title | Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
title_full | Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
title_fullStr | Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
title_full_unstemmed | Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
title_short | Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
title_sort | predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8660898/ https://www.ncbi.nlm.nih.gov/pubmed/34887492 http://dx.doi.org/10.1038/s41598-021-03230-x |
work_keys_str_mv | AT araniasiehamousoltani predictingdeleteriousmissensegeneticvariantsviaintegrativesupervisednonnegativematrixtrifactorization AT sehhatimohammadreza predictingdeleteriousmissensegeneticvariantsviaintegrativesupervisednonnegativematrixtrifactorization AT tabatabaiefarmohammadamin predictingdeleteriousmissensegeneticvariantsviaintegrativesupervisednonnegativematrixtrifactorization |
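
The abstract above names non-negative matrix tri-factorization (NMTF) as the basis of the method but the record gives no formulas. As a rough illustration only, the sketch below shows plain, unsupervised NMTF on a single matrix, approximating X by F S G^T with all factors non-negative and using the standard multiplicative updates for the squared Frobenius error. It is not the authors' supervised, multi-source algorithm: the function name `nmtf`, its parameters, and the toy matrix (rows standing in for variants, columns for individual prediction tools) are assumptions made up for this example.

```python
import numpy as np


def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF sketch: find non-negative F, S, G with X ~ F @ S @ G.T.

    Hypothetical helper for illustration; not the published method.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random strictly positive initialisation keeps the updates well defined.
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps

    # Track the reconstruction error, starting with the random initial guess.
    errors = [np.linalg.norm(X - F @ S @ G.T)]
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled element-wise,
        # so non-negativity is preserved without any projection step.
        F *= (X @ G @ S.T) / (F @ (S @ G.T @ G @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ F.T @ F @ S) + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Synthetic toy scores (made up, not data from the article):
# 6 "variants" x 4 "prediction tools" with two obvious blocks.
X = np.array([
    [0.90, 0.80, 0.10, 0.20],
    [0.80, 0.90, 0.20, 0.10],
    [0.95, 0.85, 0.15, 0.10],
    [0.10, 0.20, 0.90, 0.80],
    [0.20, 0.10, 0.80, 0.90],
    [0.15, 0.10, 0.85, 0.95],
])

F, S, G, errors = nmtf(X, k_rows=2, k_cols=2)

# Read a hard row clustering off the largest entry in each row of F.
row_clusters = F.argmax(axis=1)
print("row clusters:", row_clusters)
print("initial error: %.4f, final error: %.4f" % (errors[0], errors[-1]))
```

In this sketch the row factor F plays the role of a soft cluster assignment for rows, G does the same for columns, and S links row clusters to column clusters. An integrative variant along the lines the abstract describes would presumably couple several such factorizations (for example, predictor scores, a PPI network, and gene-disease associations) through shared factors, but the details of that coupling are not given in this record.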