A high-throughput SNP discovery strategy for RNA-seq data

Bibliographic Details
Main Authors:	Zhao, Yun; Wang, Ke; Wang, Wen-li; Yin, Ting-ting; Dong, Wei-qi; Xu, Chang-jie
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6391812/ https://www.ncbi.nlm.nih.gov/pubmed/30813897 http://dx.doi.org/10.1186/s12864-019-5533-4

author	Zhao, Yun; Wang, Ke; Wang, Wen-li; Yin, Ting-ting; Dong, Wei-qi; Xu, Chang-jie
collection	PubMed
description	BACKGROUND: Single nucleotide polymorphisms (SNPs) have been applied as important molecular markers in genetics and breeding studies. The rapid advance of next generation sequencing (NGS) provides a high-throughput means of SNP discovery. However, SNP development is limited by the availability of reliable SNP discovery methods. In particular, the optimum assembler and SNP caller for accurate SNP prediction from next generation sequencing data are not known. RESULTS: Herein we performed SNP prediction based on RNA-seq data of peach and mandarin peel tissue, with a comprehensive comparison of two paired-end read lengths (125 bp and 150 bp), five assemblers (Trinity, IDBA, oases, SOAPdenovo, Trans-abyss) and two SNP callers (GATK and GBS). The predicted SNPs were compared with authentic SNPs identified via PCR amplification followed by gene cloning and sequencing. A total of 40 and 240 authentic SNPs were present in five anthocyanin biosynthesis related genes in peach and in nine carotenogenic genes in mandarin, respectively. Putative SNPs predicted from the same RNA-seq data with different strategies gave quite divergent results. The rate of false positive SNPs was significantly lower with 150 bp paired-end reads than with 125 bp reads. Trinity was superior to the other four assemblers, and GATK was substantially superior to GBS owing to its low rate of missing authentic SNPs. The combination of the Trinity assembler, the GATK SNP caller, and 150 bp paired-end reads performed best in SNP discovery, with 100% accuracy in both peach and mandarin. This strategy was then applied to characterize SNPs in the peach and mandarin transcriptomes. 
CONCLUSIONS: By comparing authentic SNPs obtained by a PCR cloning strategy with putative SNPs predicted from different combinations of five assemblers, two SNP callers, and two paired-end read lengths, we provide a reliable and efficient strategy, Trinity-GATK with 150 bp paired-end reads, for SNP discovery from RNA-seq data. This strategy discovered SNPs with 100% accuracy in the peach and mandarin cases and might be applicable to a wide range of plants and other organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5533-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6391812
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6391812 2019-03-11 A high-throughput SNP discovery strategy for RNA-seq data Zhao, Yun Wang, Ke Wang, Wen-li Yin, Ting-ting Dong, Wei-qi Xu, Chang-jie BMC Genomics Methodology Article BioMed Central 2019-02-27 /pmc/articles/PMC6391812/ /pubmed/30813897 http://dx.doi.org/10.1186/s12864-019-5533-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A high-throughput SNP discovery strategy for RNA-seq data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6391812/ https://www.ncbi.nlm.nih.gov/pubmed/30813897 http://dx.doi.org/10.1186/s12864-019-5533-4