An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data
High dimensionality and noise have made it difficult to detect related biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective in identifying mislabeled samples in high-dimensional data with mislabeling errors. However, the algorithm commonly used...
Main Authors: | Sun, Hongwei; Wang, Jiu; Zhang, Zhongwen; Hu, Naibao; Wang, Tong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8716222/ https://www.ncbi.nlm.nih.gov/pubmed/34976114 http://dx.doi.org/10.1155/2021/9436582 |
_version_ | 1784624277168848896 |
---|---|
author | Sun, Hongwei Wang, Jiu Zhang, Zhongwen Hu, Naibao Wang, Tong |
author_facet | Sun, Hongwei Wang, Jiu Zhang, Zhongwen Hu, Naibao Wang, Tong |
author_sort | Sun, Hongwei |
collection | PubMed |
description | High dimensionality and noise have made it difficult to detect related biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective in identifying mislabeled samples in high-dimensional data with mislabeling errors. However, the algorithm commonly used in these studies is the concentration step (C-step), and the C-step algorithm, when applied to robust penalized regression, does not guarantee that the criterion function improves at each iteration, because the regularization parameters change during the iteration. This makes the C-step algorithm run very slowly, especially when dealing with high-dimensional omics data. To address this, the AR-Cstep algorithm (C-step combined with an acceptance-rejection scheme) is proposed. In simulation experiments, the AR-Cstep algorithm converged faster (the average computation time was only 2% of that of the C-step algorithm) and was more accurate in terms of variable selection and outlier identification than the C-step algorithm. The two algorithms were further compared on triple-negative breast cancer (TNBC) RNA-seq data. AR-Cstep can solve the problem of the C-step not converging and ensures that the iterations move in a direction that improves the criterion function. As an improvement of the C-step algorithm, the AR-Cstep algorithm can be extended to other robust models with regularization parameters. |
format | Online Article Text |
id | pubmed-8716222 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8716222 2021-12-30 An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data Sun, Hongwei Wang, Jiu Zhang, Zhongwen Hu, Naibao Wang, Tong Comput Math Methods Med Research Article Hindawi 2021-12-22 /pmc/articles/PMC8716222/ /pubmed/34976114 http://dx.doi.org/10.1155/2021/9436582 Text en Copyright © 2021 Hongwei Sun et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8716222/ https://www.ncbi.nlm.nih.gov/pubmed/34976114 http://dx.doi.org/10.1155/2021/9436582 |
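
The description above says that a C-step can fail to improve the criterion when the regularization parameter changes between iterations, and that AR-Cstep adds an acceptance-rejection scheme. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes a toy one-slope ridge model with a trimmed squared-error criterion in place of the penalized trimmed likelihood, and it assumes that "rejection" means discarding a candidate subset that does not lower the criterion. The names `fit_slope`, `criterion`, `ar_cstep`, and `lam_for_iter` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def fit_slope(x: Sequence[float], y: Sequence[float],
              subset: Sequence[int], lam: float) -> float:
    """Closed-form ridge fit of y ~ beta * x using only the rows in subset."""
    sxy = sum(x[i] * y[i] for i in subset)
    sxx = sum(x[i] ** 2 for i in subset)
    return sxy / (sxx + lam)


def criterion(x: Sequence[float], y: Sequence[float],
              subset: Sequence[int], beta: float, lam: float) -> float:
    """Trimmed squared error on the subset plus the ridge penalty."""
    rss = sum((y[i] - beta * x[i]) ** 2 for i in subset)
    return rss + lam * beta ** 2


def ar_cstep(x: Sequence[float], y: Sequence[float], h: int,
             lam_for_iter: Callable[[int], float],
             max_iter: int = 50) -> Tuple[List[int], float, List[float]]:
    """C-step iterations where a candidate is kept only if it improves the criterion.

    lam_for_iter(it) stands in for a penalty that is re-tuned at every
    iteration (an assumption made for illustration).
    """
    n = len(x)
    subset = list(range(h))               # naive starting subset (assumption)
    lam = lam_for_iter(0)
    beta = fit_slope(x, y, subset, lam)
    best = criterion(x, y, subset, beta, lam)
    history = [best]                      # accepted criterion values only
    for it in range(1, max_iter + 1):
        # C-step: keep the h observations with the smallest squared residuals
        order = sorted(range(n), key=lambda i: (y[i] - beta * x[i]) ** 2)
        cand = sorted(order[:h])
        lam_new = lam_for_iter(it)
        beta_new = fit_slope(x, y, cand, lam_new)
        crit_new = criterion(x, y, cand, beta_new, lam_new)
        if crit_new < best:
            # Accept: the candidate strictly lowers the criterion
            subset, beta, best = cand, beta_new, crit_new
            history.append(best)
        else:
            # Reject: keep the previous solution and stop
            break
    return subset, beta, history
```

Because only improving candidates are accepted, the recorded criterion values decrease strictly even when the penalty returned by `lam_for_iter` varies, which is the property the description attributes to AR-Cstep. Observations left out of the returned subset are the ones this sketch would flag as potential outliers.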