A Simple Guideline to Assess the Characteristics of RNA-Seq Data

Bibliographic Details
Main Authors: Son, Keunhong; Yu, Sungryul; Shin, Wonseok; Han, Kyudong; Kang, Keunsoo
Format: Online, Article, Text
Language: English
Published: Hindawi, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6241233/
https://www.ncbi.nlm.nih.gov/pubmed/30519573
http://dx.doi.org/10.1155/2018/2906292
collection PubMed
description Next-generation sequencing (NGS) techniques have been used to generate various molecular maps, including genomes, epigenomes, and transcriptomes. The transcriptome of a given cell population can be profiled via RNA-seq. However, there is no simple way to assess the characteristics of RNA-seq data systematically. In this study, we provide a simple method that intuitively evaluates RNA-seq data using two different principal component analysis (PCA) plots. The gene expression PCA plot provides insights into the association between samples, while the transcript integrity number (TIN) score plot provides a quality map of the given RNA-seq data. With this approach, we found that RNA-seq datasets deposited in public repositories often contain a few low-quality samples that can lead to misinterpretation. The effect of sampling errors on differentially expressed gene (DEG) analysis was evaluated with ten RNA-seq datasets from invasive ductal carcinoma tissues and three RNA-seq datasets from adjacent normal tissues taken from a Korean breast cancer patient. The evaluation demonstrated that sampling errors, that is, selecting samples that do not represent a given population, can lead to different interpretations of the DEG analysis. Therefore, the proposed approach can be used to avoid sampling errors prior to RNA-seq data analysis.
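The description above pairs two PCA views: one on gene expression (how samples relate) and one on TIN scores (per-sample RNA quality). The following is a minimal sketch of that idea, not the authors' code. It assumes a genes × samples count matrix transformed as log2(count + 1) and a transcripts × samples matrix of TIN scores on a 0 to 100 scale that has already been computed upstream; the function names `pca_scores`, `expression_pca`, and `tin_pca` are hypothetical, and the demo numbers are synthetic, not the study's data.

```python
import numpy as np


def pca_scores(matrix, n_components=2):
    """PCA over samples via SVD of the centered data.

    `matrix` is features x samples (genes x samples, or transcripts x samples).
    Returns (scores, explained): scores has shape (n_samples, n_components) and
    holds each sample's PC coordinates; explained is the fraction of total
    variance captured by each returned PC.
    """
    x = np.asarray(matrix, dtype=float).T  # samples x features
    x = x - x.mean(axis=0)                 # center each feature across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance = s ** 2
    return scores, variance[:n_components] / variance.sum()


def expression_pca(counts, n_components=2):
    """Gene expression PCA (assumed transform): log2(count + 1), then PCA."""
    log_expr = np.log2(np.asarray(counts, dtype=float) + 1.0)
    return pca_scores(log_expr, n_components)


def tin_pca(tin, n_components=2):
    """TIN score PCA: PCA over samples of a transcripts x samples TIN matrix."""
    return pca_scores(tin, n_components)


# ---- Synthetic demo (made-up numbers, NOT the study's data) ----
rng = np.random.default_rng(0)
n_tumor, n_normal = 10, 3                  # mirrors a 10 tumor + 3 normal design
n_samples = n_tumor + n_normal

# Expression: 500 genes; the first 50 are ~8x (2**3) higher in "tumor" samples.
log2_means = np.repeat(rng.normal(5.0, 1.0, size=(500, 1)), n_samples, axis=1)
log2_means[:50, :n_tumor] += 3.0
counts = rng.poisson(2.0 ** log2_means)
expr_scores, expr_explained = expression_pca(counts)

# TIN: 300 transcripts; sample index 4 is simulated as degraded (TIN ~30 lower).
tin = rng.uniform(50, 90, size=(300, 1)) + rng.normal(0, 5, size=(300, n_samples))
tin[:, 4] -= 30
tin = np.clip(tin, 0, 100)
tin_scores, tin_explained = tin_pca(tin)

print("expression PC1:", np.round(expr_scores[:, 0], 1))
print("TIN PC1:", np.round(tin_scores[:, 0], 1))
```

In this synthetic run the tumor and normal groups separate along expression PC1, while the simulated degraded sample stands apart on TIN PC1. With real data, a sample isolated on the TIN plot would be a candidate for closer inspection before it is included in a DEG comparison.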
id pubmed-6241233
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Biomed Res Int, Research Article; Hindawi, 2018-11-04 (PubMed record pubmed-6241233, 2018-12-05)
rights Copyright © 2018 Keunhong Son et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article