Development of Strategies for SNP Detection in RNA-Seq Data: Application to Lymphoblastoid Cell Lines and Evaluation Using 1000 Genomes Data

Next-generation RNA sequencing (RNA-seq) maps and analyzes transcriptomes and generates data on sequence variation in expressed genes. There are few reported studies on analysis strategies to maximize the yield of quality RNA-seq SNP data. We evaluated the performance of different SNP-calling method...

Bibliographic Details
Main authors: Quinn, Emma M., Cormican, Paul, Kenny, Elaine M., Hill, Matthew, Anney, Richard, Gill, Michael, Corvin, Aiden P., Morris, Derek W.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3608647/
https://www.ncbi.nlm.nih.gov/pubmed/23555596
http://dx.doi.org/10.1371/journal.pone.0058815
author Quinn, Emma M.
Cormican, Paul
Kenny, Elaine M.
Hill, Matthew
Anney, Richard
Gill, Michael
Corvin, Aiden P.
Morris, Derek W.
collection PubMed
description Next-generation RNA sequencing (RNA-seq) maps and analyzes transcriptomes and generates data on sequence variation in expressed genes. Few studies have reported analysis strategies that maximize the yield of quality RNA-seq SNP data. We evaluated the performance of different SNP-calling methods following alignment to both genome and transcriptome by applying them to RNA-seq data from a HapMap lymphoblastoid cell line sample and comparing the results with sequence variation data from 1000 Genomes. We determined that the best method to achieve high specificity and sensitivity, and the greatest number of SNP calls, is to remove duplicate sequence reads after alignment to the genome and to call SNPs using SAMtools. The accuracy of SNP calls depends on the available sequence coverage. In terms of specificity, 89% of RNA-seq SNP calls were true variants where coverage was >10X. In terms of sensitivity, at >10X coverage 92% of all expected SNPs in expressed exons could be detected. Overall, the results indicate that RNA-seq SNP data are a very useful by-product of sequence-based transcriptome analysis. If RNA-seq is applied to disease tissue samples, and assuming that genes carrying mutations relevant to disease biology are expressed, a very high proportion of these mutations can be detected.
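The two figures quoted in the description can be read as simple set ratios over variant sites. The following is a minimal sketch, not the authors' code: the function name `evaluate_calls`, the `(chromosome, position)` site representation, and the toy data are all assumptions made for illustration. It treats "specificity" the way the abstract uses the word (the share of RNA-seq calls that are also in the reference set) and "sensitivity" as the share of expected reference sites that were called, both restricted to sites with coverage above a threshold. It also assumes the reference set has already been limited to expressed exons and compares sites only, not genotypes.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); hypothetical representation


def evaluate_calls(
    called: Set[Site],
    reference: Set[Site],
    coverage: Dict[Site, int],
    min_coverage: int = 10,
) -> Tuple[float, float]:
    """Return (specificity, sensitivity) for sites with coverage > min_coverage.

    specificity: fraction of covered RNA-seq calls that are in the reference set.
    sensitivity: fraction of covered reference sites that were called.
    Sites missing from `coverage` are treated as having zero coverage.
    """
    covered_calls = {s for s in called if coverage.get(s, 0) > min_coverage}
    covered_expected = {s for s in reference if coverage.get(s, 0) > min_coverage}

    true_calls = covered_calls & reference
    detected = covered_expected & called

    specificity = len(true_calls) / len(covered_calls) if covered_calls else float("nan")
    sensitivity = len(detected) / len(covered_expected) if covered_expected else float("nan")
    return specificity, sensitivity


if __name__ == "__main__":
    # Toy data, invented purely to exercise the function.
    called = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)}
    reference = {("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 75), ("chr3", 10)}
    coverage = {
        ("chr1", 100): 25,
        ("chr1", 200): 12,
        ("chr1", 300): 40,
        ("chr2", 50): 30,
        ("chr2", 75): 4,
        ("chr3", 10): 20,
    }
    spec, sens = evaluate_calls(called, reference, coverage)
    print(f"specificity={spec:.2f} sensitivity={sens:.2f}")
```

In practice the call set would come from a variant caller's output and the coverage values from the aligned reads; raising `min_coverage` trades the number of evaluated sites for more reliable calls, which is the dependence on coverage that the description mentions.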
format Online
Article
Text
id pubmed-3608647
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3608647 2013-04-03 PLoS One Research Article
Public Library of Science 2013-03-26 /pmc/articles/PMC3608647/ /pubmed/23555596 http://dx.doi.org/10.1371/journal.pone.0058815 Text en © 2013 Quinn et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Development of Strategies for SNP Detection in RNA-Seq Data: Application to Lymphoblastoid Cell Lines and Evaluation Using 1000 Genomes Data
topic Research Article