An embedded method for gene identification problems involving unwanted data heterogeneity

BACKGROUND: Modern applications such as bioinformatics collect data in various ways, which can easily result in heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not hold for these applications. Some existing stati...


Bibliographic Details
Main Author:	Lu, Meng
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805328/ https://www.ncbi.nlm.nih.gov/pubmed/31639059 http://dx.doi.org/10.1186/s40246-019-0228-0

author	Lu, Meng
collection	PubMed
description	BACKGROUND: Modern applications such as bioinformatics collect data in various ways, which can easily result in heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not hold for these applications. Some existing statistical models capable of handling unwanted variation were developed for gene identification involving heterogeneous data, but they lack model predictability and suffer from variable redundancy. RESULTS: By accounting effectively for the unwanted heterogeneity, our method shows its superiority over several state-of-the-art methods, as validated by experimental results in both unsupervised and supervised gene identification problems. Moreover, we applied our method to a pan-cancer study, where it can identify the most discriminative genes that best distinguish different cancer types. CONCLUSIONS: This article provides an alternative gene identification method that can account for unwanted data heterogeneity. It is a promising method for providing new insights into complex cancer biology and clues for understanding tumorigenesis and tumor progression.
format	Online Article Text
id	pubmed-6805328
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6805328 2019-10-24 Lu, Meng Hum Genomics Research BioMed Central 2019-10-22 /pmc/articles/PMC6805328/ /pubmed/31639059 http://dx.doi.org/10.1186/s40246-019-0228-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	An embedded method for gene identification problems involving unwanted data heterogeneity
topic	Research
