Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization

Among the many kinds of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to distinguish deleterious from benign missense variants by means of a la...

Bibliographic Details
Main Authors: Arani, Asieh Amousoltani; Sehhati, Mohammadreza; Tabatabaiefar, Mohammad Amin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8660898/
https://www.ncbi.nlm.nih.gov/pubmed/34887492
http://dx.doi.org/10.1038/s41598-021-03230-x
author Arani, Asieh Amousoltani
Sehhati, Mohammadreza
Tabatabaiefar, Mohammad Amin
collection PubMed
description Among the many kinds of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to distinguish deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. However, a reliable and accurate algorithm for merging heterogeneous information is still needed, as it could capture the complex interactions in the networks in which genes participate. In this study, we propose a new method based on non-negative matrix tri-factorization clustering. We outline two versions of the proposed method: a two-source and a three-source algorithm. The two-source algorithm aggregates individual deleteriousness prediction methods and the PPI network, and the three-source algorithm additionally incorporates gene-disease associations. Four benchmark datasets were used for internal and external validation of both algorithms. The results on all datasets confirmed that our method outperforms most state-of-the-art variant prediction tools. Two key features of our variant effect prediction method are worth mentioning. First, although incorporating gene-disease information in the three-source algorithm can improve prediction performance over the two-source algorithm, our method was not hindered by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when the predictor annotates variants on the basis of the genes on which they are located. Second, the performance of our predictor is superior to that of other ensemble-based methods for variants located on genes for which little pathogenicity information is available.
format Online
Article
Text
id pubmed-8660898
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8660898 2021-12-13 Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization Arani, Asieh Amousoltani; Sehhati, Mohammadreza; Tabatabaiefar, Mohammad Amin Sci Rep Article Nature Publishing Group UK 2021-12-09 /pmc/articles/PMC8660898/ /pubmed/34887492 http://dx.doi.org/10.1038/s41598-021-03230-x Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization
topic Article