An embedded method for gene identification problems involving unwanted data heterogeneity
Main author: | Lu, Meng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805328/ https://www.ncbi.nlm.nih.gov/pubmed/31639059 http://dx.doi.org/10.1186/s40246-019-0228-0 |
author | Lu, Meng |
---|---|
collection | PubMed |
description | BACKGROUND: Modern applications such as bioinformatics, which collect data in various ways, can easily produce heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not hold for these applications. Some existing statistical models capable of accounting for unwanted variation were developed for gene identification involving heterogeneous data, but they lack model predictability and suffer from variable redundancy. RESULTS: By effectively accounting for the unwanted heterogeneity, our method showed superior performance over several state-of-the-art methods, as validated by experimental results on both unsupervised and supervised gene identification problems. Moreover, we applied our method to a pan-cancer study, where our method can identify the most discriminative genes, i.e., those that best distinguish different cancer types. CONCLUSIONS: This article provides an alternative gene identification method that can account for unwanted data heterogeneity. It is a promising method for providing new insights into complex cancer biology and clues for understanding tumorigenesis and tumor progression. |
format | Online Article Text |
id | pubmed-6805328 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6805328 2019-10-24 An embedded method for gene identification problems involving unwanted data heterogeneity Lu, Meng Hum Genomics Research BACKGROUND: Modern applications such as bioinformatics, which collect data in various ways, can easily produce heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not hold for these applications. Some existing statistical models capable of accounting for unwanted variation were developed for gene identification involving heterogeneous data, but they lack model predictability and suffer from variable redundancy. RESULTS: By effectively accounting for the unwanted heterogeneity, our method showed superior performance over several state-of-the-art methods, as validated by experimental results on both unsupervised and supervised gene identification problems. Moreover, we applied our method to a pan-cancer study, where our method can identify the most discriminative genes, i.e., those that best distinguish different cancer types. CONCLUSIONS: This article provides an alternative gene identification method that can account for unwanted data heterogeneity. It is a promising method for providing new insights into complex cancer biology and clues for understanding tumorigenesis and tumor progression. BioMed Central 2019-10-22 /pmc/articles/PMC6805328/ /pubmed/31639059 http://dx.doi.org/10.1186/s40246-019-0228-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | An embedded method for gene identification problems involving unwanted data heterogeneity |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805328/ https://www.ncbi.nlm.nih.gov/pubmed/31639059 http://dx.doi.org/10.1186/s40246-019-0228-0 |