
An Efficient Algorithm for the Detection of Outliers in Mislabeled Omics Data

High dimensionality and noise make it difficult to detect relevant biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective at identifying mislabeled samples in high-dimensional data with labeling errors. However, the algorithm commonly used...


Bibliographic Details
Main authors: Sun, Hongwei; Wang, Jiu; Zhang, Zhongwen; Hu, Naibao; Wang, Tong
Format: Online Article Text
Language: English
Published: Hindawi 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8716222/
https://www.ncbi.nlm.nih.gov/pubmed/34976114
http://dx.doi.org/10.1155/2021/9436582
collection PubMed
description High dimensionality and noise make it difficult to detect relevant biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective at identifying mislabeled samples in high-dimensional data with labeling errors. However, the algorithm commonly used in these studies is the concentration step (C-step), and when the C-step algorithm is applied to robust penalized regression, it does not guarantee that the criterion function improves from one iteration to the next, because the regularization parameters change during the iterations. As a result, the C-step algorithm runs very slowly, especially on high-dimensional omics data. The AR-Cstep algorithm (C-step combined with an acceptance-rejection scheme) is proposed. In simulation experiments, the AR-Cstep algorithm converged faster (its average computation time was only 2% of that of the C-step algorithm) and was more accurate than the C-step algorithm in variable selection and outlier identification. The two algorithms were further compared on triple-negative breast cancer (TNBC) RNA-seq data. AR-Cstep resolves the C-step's failure to converge and ensures that each iteration moves in a direction that improves the criterion function. As an improvement of the C-step algorithm, AR-Cstep can be extended to other robust models with regularization parameters.
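The abstract only outlines the method. As a toy illustration of the idea, and not the authors' AR-Cstep, the Python sketch below wraps C-steps for a trimmed, ridge-penalized regression with one predictor in a simple accept-only-if-improved rule. The ridge model, the function names, and the `choose_lambda` rule (a hypothetical stand-in for re-tuning the regularization parameter, e.g. by cross-validation, on each new subset) are all assumptions; the paper works with penalized trimmed likelihood models on omics data.

```python
from typing import List, Sequence, Tuple


def ridge_fit(x: Sequence[float], y: Sequence[float], idx: Sequence[int], lam: float) -> float:
    """Closed-form 1-D ridge slope (no intercept) fitted on the subset idx."""
    sxy = sum(x[i] * y[i] for i in idx)
    sxx = sum(x[i] * x[i] for i in idx)
    return sxy / (sxx + lam)


def criterion(x, y, idx, beta: float, lam: float) -> float:
    """Penalized trimmed objective: squared residuals on idx plus a ridge penalty."""
    return sum((y[i] - beta * x[i]) ** 2 for i in idx) + lam * beta * beta


def choose_lambda(y: Sequence[float], idx: Sequence[int]) -> float:
    """HYPOTHETICAL tuning rule: the penalty depends on the current subset,
    so it changes between iterations (as a cross-validated one would)."""
    return 0.5 * sum(y[i] ** 2 for i in idx) / len(idx)


def ar_cstep(x, y, h: int, init_idx, max_iter: int = 50) -> Tuple[List[int], float, float, float]:
    """Toy C-step loop with an accept-if-improved rule (an assumption, not the paper's code)."""
    idx = sorted(init_idx)
    lam = choose_lambda(y, idx)
    beta = ridge_fit(x, y, idx, lam)
    crit = criterion(x, y, idx, beta, lam)
    for _ in range(max_iter):
        # C-step: keep the h observations with the smallest squared residuals.
        order = sorted(range(len(x)), key=lambda i: (y[i] - beta * x[i]) ** 2)
        new_idx = sorted(order[:h])
        # The regularization parameter is re-tuned on the candidate subset.
        new_lam = choose_lambda(y, new_idx)
        new_beta = ridge_fit(x, y, new_idx, new_lam)
        new_crit = criterion(x, y, new_idx, new_beta, new_lam)
        # Acceptance-rejection: move only if the criterion strictly improves.
        if new_idx == idx or new_crit >= crit:
            break  # reject the candidate and keep the current fit
        idx, lam, beta, crit = new_idx, new_lam, new_beta, new_crit
    return idx, beta, lam, crit


# Usage: points 3 and 7 are corrupted; the initial subset contains both.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, -20, 10, 12, 14, -20, 18, 20]
idx, beta, lam, crit = ar_cstep(x, y, h=8, init_idx=range(8))
outliers = [i for i in range(len(x)) if i not in idx]
print(outliers)  # [3, 7]
```

Rejecting any candidate that does not lower the criterion makes the accepted criterion values strictly decreasing, which is the property the abstract says the plain C-step loses once the regularization parameter changes between iterations.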
id pubmed-8716222
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8716222 (2021-12-30). Comput Math Methods Med, Research Article. Hindawi, 2021-12-22. Copyright © 2021 Hongwei Sun et al.
This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.