
Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data

With the advent of high-throughput technologies, life sciences are generating a huge amount of varied biomolecular data. Global gene expression profiles provide a snapshot of all the genes that are transcribed in a cell or in a tissue under a particular condition. The high-dimensionality of such gen...


Bibliographic Details
Main Authors:	Haque, Md Nazmul; Sharmin, Sadia; Ali, Amin Ahsan; Sajib, Abu Ashfaqur; Shoyaib, Mohammad
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494339/ https://www.ncbi.nlm.nih.gov/pubmed/34613963 http://dx.doi.org/10.1371/journal.pone.0230164

_version_	1784579291237842944
author	Haque, Md Nazmul; Sharmin, Sadia; Ali, Amin Ahsan; Sajib, Abu Ashfaqur; Shoyaib, Mohammad
author_facet	Haque, Md Nazmul; Sharmin, Sadia; Ali, Amin Ahsan; Sajib, Abu Ashfaqur; Shoyaib, Mohammad
author_sort	Haque, Md Nazmul
collection	PubMed
description	With the advent of high-throughput technologies, the life sciences are generating a huge amount of varied biomolecular data. Global gene expression profiles provide a snapshot of all the genes that are transcribed in a cell or tissue under a particular condition. The high dimensionality of such gene expression data (i.e., a very large number of features/genes analyzed with a much smaller number of samples) makes it difficult to identify, de novo, the key genes (biomarkers) that truly contribute to a particular phenotype or condition (such as cancer). Among the criteria used in the existing literature to identify key genes from gene expression data, mutual information (MI) is one of the most successful. However, existing methods do not correct MI for finite sample size. Dynamic discretization of genes is also important for selecting more relevant genes, yet the available methods do not consider it. In addition, current studies usually suggest removing redundant genes, which is particularly inappropriate for biological data, as a group of genes may connect to each other for downstream proteins. Thus, genes that provide additional useful information about the disease should be included even if they are redundant. To address these issues, we propose a Mutual information based Gene Selection method (MGS) for selecting informative genes. To rank the selected genes, we extend MGS with two ranking methods: MGS(f), based on frequency, and MGS(rf), based on Random Forest. The proposed method not only obtained better classification rates than recently reported methods on gene expression datasets derived from different gene expression studies, but also detected key genes relevant to pathways with a causal relationship to the disease, which indicates that it will also be able to find the genes responsible for an unknown disease.
format	Online Article Text
id	pubmed-8494339
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8494339 2021-10-07 Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data Haque, Md Nazmul; Sharmin, Sadia; Ali, Amin Ahsan; Sajib, Abu Ashfaqur; Shoyaib, Mohammad PLoS One Research Article Public Library of Science 2021-10-06 /pmc/articles/PMC8494339/ /pubmed/34613963 http://dx.doi.org/10.1371/journal.pone.0230164 Text en © 2021 Haque et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Haque, Md Nazmul; Sharmin, Sadia; Ali, Amin Ahsan; Sajib, Abu Ashfaqur; Shoyaib, Mohammad Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data
title	Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data
title_sort	use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494339/ https://www.ncbi.nlm.nih.gov/pubmed/34613963 http://dx.doi.org/10.1371/journal.pone.0230164
work_keys_str_mv	AT haquemdnazmul useofrelevancyandcomplementaryinformationfordiscriminatorygeneselectionfromhighdimensionalgeneexpressiondata AT sharminsadia useofrelevancyandcomplementaryinformationfordiscriminatorygeneselectionfromhighdimensionalgeneexpressiondata AT aliaminahsan useofrelevancyandcomplementaryinformationfordiscriminatorygeneselectionfromhighdimensionalgeneexpressiondata AT sajibabuashfaqur useofrelevancyandcomplementaryinformationfordiscriminatorygeneselectionfromhighdimensionalgeneexpressiondata AT shoyaibmohammad useofrelevancyandcomplementaryinformationfordiscriminatorygeneselectionfromhighdimensionalgeneexpressiondata
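
The description above names mutual information (MI) with a finite-sample correction as the selection criterion but does not say which correction or discretization MGS uses. The following is only a minimal sketch of the general idea, under two loudly stated assumptions: the Miller-Madow bias term stands in for the correction, and fixed equal-width binning stands in for the dynamic discretization. The function names `entropy_mm`, `corrected_mi`, and `rank_genes` are hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy_mm(codes):
    """Plug-in entropy in nats plus the Miller-Madow term (K - 1) / (2N).

    ASSUMPTION: Miller-Madow is used here only as a familiar example of a
    finite-sample correction; the record does not name the one used by MGS.
    """
    _, counts = np.unique(codes, return_counts=True)
    n = counts.sum()
    p = counts / n
    return float(-(p * np.log(p)).sum() + (len(counts) - 1) / (2 * n))

def corrected_mi(gene, classes, bins=3):
    """Estimate MI between one discretized gene and the class labels."""
    gene = np.asarray(gene, dtype=float).ravel()
    classes = np.asarray(classes).ravel()
    # ASSUMPTION: equal-width bins, an illustrative stand-in for the
    # dynamic discretization mentioned in the description.
    edges = np.linspace(gene.min(), gene.max(), bins + 1)[1:-1]
    g = np.digitize(gene, edges)
    _, c = np.unique(classes, return_inverse=True)
    # Encode each (bin, class) pair as a single integer for the joint entropy.
    joint = g * (c.max() + 1) + c
    # I(G; C) = H(G) + H(C) - H(G, C), each term bias-corrected.
    return entropy_mm(g) + entropy_mm(c) - entropy_mm(joint)

def rank_genes(X, y, bins=3):
    """Return gene indices sorted by decreasing MI estimate, and the scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([corrected_mi(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(-scores), scores

# Toy data: gene 0 separates the two classes, gene 1 is constant.
y = np.array([0] * 10 + [1] * 10)
X = np.column_stack([y * 10.0, np.ones(20)])
order, scores = rank_genes(X, y)
print(order, scores)
```

On the toy matrix the class-separating gene is ranked first and the constant gene receives a score of zero. The sketch scores each gene independently, so it covers relevancy only; the complementary-information step and the MGS(f) and MGS(rf) rankings described above are not modeled.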
