Modeling the next generation sequencing sample processing pipeline for the purposes of classification
BACKGROUND: A key goal of systems biology and translational genomics is to utilize high-throughput measurements of cellular states to develop expression-based classifiers for discriminating among different phenotypes. Recent developments of Next Generation Sequencing (NGS) technologies can facilitat...
Main Authors: | Ghaffari, Noushin; Yousefi, Mohammadmahdi R; Johnson, Charles D; Ivanov, Ivan; Dougherty, Edward R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3819514/ https://www.ncbi.nlm.nih.gov/pubmed/24118904 http://dx.doi.org/10.1186/1471-2105-14-307 |
_version_ | 1782289998821720064 |
---|---|
author | Ghaffari, Noushin Yousefi, Mohammadmahdi R Johnson, Charles D Ivanov, Ivan Dougherty, Edward R |
author_facet | Ghaffari, Noushin Yousefi, Mohammadmahdi R Johnson, Charles D Ivanov, Ivan Dougherty, Edward R |
author_sort | Ghaffari, Noushin |
collection | PubMed |
description | BACKGROUND: A key goal of systems biology and translational genomics is to utilize high-throughput measurements of cellular states to develop expression-based classifiers for discriminating among different phenotypes. Recent developments of Next Generation Sequencing (NGS) technologies can facilitate classifier design by providing expression measurements for tens of thousands of genes simultaneously via the abundance of their mRNA transcripts. Because NGS technologies result in a nonlinear transformation of the actual expression distributions, their application can result in data that are less discriminative than would be the actual expression levels themselves, were they directly observable. RESULTS: Using state-of-the-art distributional modeling for the NGS processing pipeline, this paper studies how that pipeline, via the resulting nonlinear transformation, affects classification and feature selection. The effects of different factors are considered and NGS-based classification is compared to SAGE-based classification and classification directly on the raw expression data, which is represented by a very high-dimensional model previously developed for gene expression. As expected, the nonlinear transformation resulting from NGS processing diminishes classification accuracy; however, owing to a larger number of reads, NGS-based classification outperforms SAGE-based classification. CONCLUSIONS: Having high numbers of reads can mitigate the degradation in classification performance resulting from the effects of NGS technologies. Hence, when performing an RNA-Seq analysis, using the highest possible coverage of the genome is recommended for the purposes of classification. |
format | Online Article Text |
id | pubmed-3819514 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38195142013-11-11 Modeling the next generation sequencing sample processing pipeline for the purposes of classification Ghaffari, Noushin Yousefi, Mohammadmahdi R Johnson, Charles D Ivanov, Ivan Dougherty, Edward R BMC Bioinformatics Research Article BACKGROUND: A key goal of systems biology and translational genomics is to utilize high-throughput measurements of cellular states to develop expression-based classifiers for discriminating among different phenotypes. Recent developments of Next Generation Sequencing (NGS) technologies can facilitate classifier design by providing expression measurements for tens of thousands of genes simultaneously via the abundance of their mRNA transcripts. Because NGS technologies result in a nonlinear transformation of the actual expression distributions, their application can result in data that are less discriminative than would be the actual expression levels themselves, were they directly observable. RESULTS: Using state-of-the-art distributional modeling for the NGS processing pipeline, this paper studies how that pipeline, via the resulting nonlinear transformation, affects classification and feature selection. The effects of different factors are considered and NGS-based classification is compared to SAGE-based classification and classification directly on the raw expression data, which is represented by a very high-dimensional model previously developed for gene expression. As expected, the nonlinear transformation resulting from NGS processing diminishes classification accuracy; however, owing to a larger number of reads, NGS-based classification outperforms SAGE-based classification. CONCLUSIONS: Having high numbers of reads can mitigate the degradation in classification performance resulting from the effects of NGS technologies. Hence, when performing a RNA-Seq analysis, using the highest possible coverage of the genome is recommended for the purposes of classification. 
BioMed Central 2013-10-11 /pmc/articles/PMC3819514/ /pubmed/24118904 http://dx.doi.org/10.1186/1471-2105-14-307 Text en Copyright © 2013 Ghaffari et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Ghaffari, Noushin Yousefi, Mohammadmahdi R Johnson, Charles D Ivanov, Ivan Dougherty, Edward R Modeling the next generation sequencing sample processing pipeline for the purposes of classification |
title | Modeling the next generation sequencing sample processing pipeline for the purposes of classification |
title_full | Modeling the next generation sequencing sample processing pipeline for the purposes of classification |
title_fullStr | Modeling the next generation sequencing sample processing pipeline for the purposes of classification |
title_full_unstemmed | Modeling the next generation sequencing sample processing pipeline for the purposes of classification |
title_short | Modeling the next generation sequencing sample processing pipeline for the purposes of classification |
title_sort | modeling the next generation sequencing sample processing pipeline for the purposes of classification |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3819514/ https://www.ncbi.nlm.nih.gov/pubmed/24118904 http://dx.doi.org/10.1186/1471-2105-14-307 |
work_keys_str_mv | AT ghaffarinoushin modelingthenextgenerationsequencingsampleprocessingpipelineforthepurposesofclassification AT yousefimohammadmahdir modelingthenextgenerationsequencingsampleprocessingpipelineforthepurposesofclassification AT johnsoncharlesd modelingthenextgenerationsequencingsampleprocessingpipelineforthepurposesofclassification AT ivanovivan modelingthenextgenerationsequencingsampleprocessingpipelineforthepurposesofclassification AT doughertyedwardr modelingthenextgenerationsequencingsampleprocessingpipelineforthepurposesofclassification |