Modeling the next generation sequencing sample processing pipeline for the purposes of classification

Bibliographic Details
Main authors: Ghaffari, Noushin; Yousefi, Mohammadmahdi R; Johnson, Charles D; Ivanov, Ivan; Dougherty, Edward R
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3819514/
https://www.ncbi.nlm.nih.gov/pubmed/24118904
http://dx.doi.org/10.1186/1471-2105-14-307
_version_ 1782289998821720064
author Ghaffari, Noushin
Yousefi, Mohammadmahdi R
Johnson, Charles D
Ivanov, Ivan
Dougherty, Edward R
author_sort Ghaffari, Noushin
collection PubMed
description BACKGROUND: A key goal of systems biology and translational genomics is to utilize high-throughput measurements of cellular states to develop expression-based classifiers for discriminating among different phenotypes. Recent developments in Next Generation Sequencing (NGS) technologies can facilitate classifier design by providing expression measurements for tens of thousands of genes simultaneously via the abundance of their mRNA transcripts. Because NGS technologies result in a nonlinear transformation of the actual expression distributions, their application can result in data that are less discriminative than would be the actual expression levels themselves, were they directly observable. RESULTS: Using state-of-the-art distributional modeling for the NGS processing pipeline, this paper studies how that pipeline, via the resulting nonlinear transformation, affects classification and feature selection. The effects of different factors are considered, and NGS-based classification is compared to SAGE-based classification and to classification directly on the raw expression data, which are represented by a very high-dimensional model previously developed for gene expression. As expected, the nonlinear transformation resulting from NGS processing diminishes classification accuracy; however, owing to a larger number of reads, NGS-based classification outperforms SAGE-based classification. CONCLUSIONS: Having high numbers of reads can mitigate the degradation in classification performance resulting from the effects of NGS technologies. Hence, when performing an RNA-Seq analysis, using the highest possible coverage of the genome is recommended for the purposes of classification.
format Online
Article
Text
id pubmed-3819514
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3819514 2013-11-11 Modeling the next generation sequencing sample processing pipeline for the purposes of classification Ghaffari, Noushin Yousefi, Mohammadmahdi R Johnson, Charles D Ivanov, Ivan Dougherty, Edward R BMC Bioinformatics Research Article
BioMed Central 2013-10-11 /pmc/articles/PMC3819514/ /pubmed/24118904 http://dx.doi.org/10.1186/1471-2105-14-307 Text en Copyright © 2013 Ghaffari et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Modeling the next generation sequencing sample processing pipeline for the purposes of classification
topic Research Article