A Simple Guideline to Assess the Characteristics of RNA-Seq Data
Next-generation sequencing (NGS) techniques have been used to generate various molecular maps including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data syst...
Main authors: | Son, Keunhong; Yu, Sungryul; Shin, Wonseok; Han, Kyudong; Kang, Keunsoo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6241233/ https://www.ncbi.nlm.nih.gov/pubmed/30519573 http://dx.doi.org/10.1155/2018/2906292 |
_version_ | 1783371755625119744 |
---|---|
author | Son, Keunhong Yu, Sungryul Shin, Wonseok Han, Kyudong Kang, Keunsoo |
author_sort | Son, Keunhong |
collection | PubMed |
description | Next-generation sequencing (NGS) techniques have been used to generate various molecular maps, including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to systematically assess the characteristics of RNA-seq data. In this study, we provide a simple method that can intuitively evaluate RNA-seq data using two different principal component analysis (PCA) plots. The gene expression PCA plot provides insights into the association between samples, while the transcript integrity number (TIN) score plot provides a quality map of the given RNA-seq data. With this approach, we found that RNA-seq datasets deposited in public repositories often contain a few low-quality samples that can lead to misinterpretations. The effect of sampling errors on differentially expressed gene (DEG) analysis was evaluated with ten RNA-seq samples from invasive ductal carcinoma tissues and three RNA-seq samples from adjacent normal tissues taken from a Korean breast cancer patient. The evaluation demonstrated that sampling errors, in which the selected samples do not represent a given population, can lead to different interpretations in DEG analysis. Therefore, the proposed approach can be used to avoid sampling errors prior to RNA-seq data analysis. |
format | Online Article Text |
id | pubmed-6241233 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-62412332018-12-05 A Simple Guideline to Assess the Characteristics of RNA-Seq Data Son, Keunhong Yu, Sungryul Shin, Wonseok Han, Kyudong Kang, Keunsoo Biomed Res Int Research Article Next-generation sequencing (NGS) techniques have been used to generate various molecular maps including genomes, epigenomes, and transcriptomes. Transcriptomes from a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data systematically. In this study, we provide a simple method that can intuitively evaluate RNA-seq data using two different principal component analysis (PCA) plots. The gene expression PCA plot provides insights into the association between samples, while the transcript integrity number (TIN) score plot provides a quality map of given RNA-seq data. With this approach, we found that RNA-seq datasets deposited in public repositories often contain a few low-quality RNA-seq data that can lead to misinterpretations. The effect of sampling errors for differentially expressed gene (DEG) analysis was evaluated with ten RNA-seq data from invasive ductal carcinoma tissues and three RNA-seq data from adjacent normal tissues taken from a Korean breast cancer patient. The evaluation demonstrated that sampling errors, which select samples that do not represent a given population, can lead to different interpretations when conducting the DEG analysis. Therefore, the proposed approach can be used to avoid sampling errors prior to RNA-seq data analysis. Hindawi 2018-11-04 /pmc/articles/PMC6241233/ /pubmed/30519573 http://dx.doi.org/10.1155/2018/2906292 Text en Copyright © 2018 Keunhong Son et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A Simple Guideline to Assess the Characteristics of RNA-Seq Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6241233/ https://www.ncbi.nlm.nih.gov/pubmed/30519573 http://dx.doi.org/10.1155/2018/2906292 |
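
The two-plot idea in the description (one PCA on gene expression, one PCA on TIN scores) can be illustrated with a minimal sketch. This is not the authors' pipeline, which this record does not describe: the data below are synthetic, the helper `pca_scores` is a hypothetical name, the sample counts are arbitrary, and the TIN matrix is assumed to have been computed elsewhere.

```python
import numpy as np


def pca_scores(matrix, n_components=2):
    """Project samples (rows) onto the top principal components via SVD.

    Returns the per-sample scores and the fraction of variance explained
    by each returned component.
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each feature (gene or transcript)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in data (assumption): 3 "normal" and 5 "tumor" samples,
# 200 genes, values on a log-expression scale.
rng = np.random.default_rng(0)
n_genes = 200
normal = rng.normal(5.0, 0.3, size=(3, n_genes))
tumor = rng.normal(5.0, 0.3, size=(5, n_genes))
tumor[:, :50] += 2.0  # first 50 genes are shifted in the tumor group
expr = np.vstack([normal, tumor])

# Synthetic per-transcript TIN scores (assumption): the last sample is degraded.
tin = rng.normal(75.0, 3.0, size=(8, n_genes))
tin[7] -= 30.0

expr_scores, expr_var = pca_scores(expr)  # association between samples
tin_scores, tin_var = pca_scores(tin)     # quality map of the same samples

# Flag the sample that lies farthest from the others on the TIN plot's PC1.
outlier = int(np.argmax(np.abs(tin_scores[:, 0])))
print("expression PC1:", np.round(expr_scores[:, 0], 2))
print("TIN PC1 outlier index:", outlier)
```

Plotting the first two columns of `expr_scores` and `tin_scores` side by side gives the pair of views the description refers to: a sample that clusters with its group on the expression plot but sits apart on the TIN plot is a candidate for exclusion before DEG analysis.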