
A Protein Interaction Information-based Generative Model for Enhancing Gene Clustering

In computational bioinformatics, identifying the set of genes responsible for a particular cellular mechanism is essential for tasks such as medical diagnosis and disease gene identification. Accurately grouping (clustering) the genes is one of the important tasks in u...


Bibliographic Details
Main authors: Dutta, Pratik; Saha, Sriparna; Pai, Sanket; Kumar, Aviral
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6971242/
https://www.ncbi.nlm.nih.gov/pubmed/31959782
http://dx.doi.org/10.1038/s41598-020-57437-5
author Dutta, Pratik
Saha, Sriparna
Pai, Sanket
Kumar, Aviral
collection PubMed
description In computational bioinformatics, identifying the set of genes responsible for a particular cellular mechanism is essential for tasks such as medical diagnosis and disease gene identification. Accurately grouping (clustering) genes is an important step in understanding the functions of disease genes. In this regard, ensemble clustering is a promising approach that combines different clustering solutions to generate a near-accurate gene partitioning. Recently, researchers have used generative models as a smart ensemble method to produce the right consensus solution. In this paper, we develop a protein-protein interaction-based generative model that can efficiently perform gene clustering. Using protein interaction information as the generative model’s latent variable enhances the model’s efficiency in inferring the final probabilistic labels. The proposed generative model uses different weak supervision sources rather than any ground truth information. As weak supervision sources, we use a multi-objective optimization-based clustering technique together with the Gene Ontology Consortium (GOC), the world’s largest gene ontology-based knowledge base. These weakly supervised labels are supplied to the generative model, which then assigns probabilistic labels to all genes. A comparative study with respect to silhouette score, Biological Homogeneity Index (BHI), and Biological Stability Index (BSI) shows that the proposed generative model outperforms other state-of-the-art techniques.
format Online
Article
Text
id pubmed-6971242
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6971242 2020-01-27 A Protein Interaction Information-based Generative Model for Enhancing Gene Clustering Dutta, Pratik Saha, Sriparna Pai, Sanket Kumar, Aviral Sci Rep Article
Nature Publishing Group UK 2020-01-20 /pmc/articles/PMC6971242/ /pubmed/31959782 http://dx.doi.org/10.1038/s41598-020-57437-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A Protein Interaction Information-based Generative Model for Enhancing Gene Clustering
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6971242/
https://www.ncbi.nlm.nih.gov/pubmed/31959782
http://dx.doi.org/10.1038/s41598-020-57437-5