
An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data

The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of...


Bibliographic Details
Main authors: George, Nysia I., Bowyer, John F., Crabtree, Nathaniel M., Chang, Ching-Wei
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4454687/
https://www.ncbi.nlm.nih.gov/pubmed/26039068
http://dx.doi.org/10.1371/journal.pone.0125224
_version_ 1782374636491636736
author George, Nysia I.
Bowyer, John F.
Crabtree, Nathaniel M.
Chang, Ching-Wei
author_sort George, Nysia I.
collection PubMed
description The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of the main objectives in these research efforts is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. To reach that goal, strengthening the accuracy of outlier detection is an important precursor. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. Therefore, we propose a univariate algorithm that utilizes a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data and implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.
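The description above names the general idea (hold one observation out, model the remaining data, and ask how improbable the held-out count is) but this record does not give the algorithm's details. The following is only an illustrative sketch of that idea, not the authors' method. Everything specific in it is an assumption made for the example: the function names `tail_probability` and `iterative_loo_outliers`, the method-of-moments negative binomial fit (with a Poisson fallback when the remaining data are not overdispersed), the two-sided tail probability as the deviation measure, and the `alpha` and `min_keep` parameters.

```python
import math
from statistics import mean, variance


def _log_pmf(k, m, v):
    """Log-probability of count k under a moment-matched count model (assumed)."""
    if m == 0:
        # All remaining counts are zero: point mass at zero.
        return 0.0 if k == 0 else -math.inf
    if v > m:
        # Overdispersed: negative binomial with size r and success probability p.
        r = m * m / (v - m)
        p = r / (r + m)
        return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log1p(-p))
    # Not overdispersed: fall back to a Poisson with mean m.
    return k * math.log(m) - m - math.lgamma(k + 1)


def tail_probability(x, rest):
    """Smaller of P(X <= x) and P(X >= x) under the model fitted to `rest`."""
    m, v = float(mean(rest)), float(variance(rest))
    lower = sum(math.exp(_log_pmf(k, m, v)) for k in range(x + 1))
    upper = 1.0 - lower + math.exp(_log_pmf(x, m, v))
    # Clamp to [0, 1] to absorb floating-point round-off.
    return max(0.0, min(lower, upper, 1.0))


def iterative_loo_outliers(counts, alpha=0.01, min_keep=3):
    """Repeatedly drop the least probable held-out count until none is below alpha."""
    remaining = list(enumerate(counts))
    flagged = []
    while len(remaining) > min_keep:
        scored = []
        for i, (idx, x) in enumerate(remaining):
            # Leave observation i out and fit the model to the others.
            rest = [y for j, (_, y) in enumerate(remaining) if j != i]
            scored.append((tail_probability(x, rest), idx))
        prob, idx = min(scored)
        if prob >= alpha:
            break  # nothing left that looks improbable
        flagged.append(idx)
        remaining = [(j, y) for j, y in remaining if j != idx]
    return sorted(flagged)


print(iterative_loo_outliers([10, 12, 9, 11, 13, 10, 500]))  # [6]
```

Removing only the single most extreme observation per pass, then refitting, is one way to keep a large outlier from inflating the fitted variance and masking others; whether the published algorithm iterates in this way is not stated in this record.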
format Online
Article
Text
id pubmed-4454687
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4454687 2015-06-09 An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data George, Nysia I. Bowyer, John F. Crabtree, Nathaniel M. Chang, Ching-Wei PLoS One Research Article Public Library of Science 2015-06-03 /pmc/articles/PMC4454687/ /pubmed/26039068 http://dx.doi.org/10.1371/journal.pone.0125224 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4454687/
https://www.ncbi.nlm.nih.gov/pubmed/26039068
http://dx.doi.org/10.1371/journal.pone.0125224