An embedded method for gene identification problems involving unwanted data heterogeneity

Bibliographic Details
Main author: Lu, Meng
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805328/
https://www.ncbi.nlm.nih.gov/pubmed/31639059
http://dx.doi.org/10.1186/s40246-019-0228-0
author Lu, Meng
collection PubMed
description BACKGROUND: Modern applications such as bioinformatics collect data in various ways, which can easily result in heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not suit these applications. Some existing statistical models capable of handling unwanted variation were developed for gene identification involving heterogeneous data, but they lack model predictability and suffer from variable redundancy. RESULTS: By accounting for the unwanted heterogeneity effectively, our method has shown its superiority over several state-of-the-art methods, as validated by experimental results in both unsupervised and supervised gene identification problems. Moreover, we applied our method to a pan-cancer study, where it can identify the most discriminative genes that best distinguish different cancer types. CONCLUSIONS: This article provides an alternative gene identification method that can account for unwanted data heterogeneity. It is a promising method for providing new insights into complex cancer biology and clues for understanding tumorigenesis and tumor progression.
format Online
Article
Text
id pubmed-6805328
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6805328 2019-10-24 An embedded method for gene identification problems involving unwanted data heterogeneity Lu, Meng Hum Genomics Research BioMed Central 2019-10-22 /pmc/articles/PMC6805328/ /pubmed/31639059 http://dx.doi.org/10.1186/s40246-019-0228-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An embedded method for gene identification problems involving unwanted data heterogeneity
topic Research