
RIG: Recalibration and Interrelation of Genomic Sequence Data with the GATK


Bibliographic Details
Main authors: McCormick, Ryan F., Truong, Sandra K., Mullet, John E.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4390580/
https://www.ncbi.nlm.nih.gov/pubmed/25681258
http://dx.doi.org/10.1534/g3.115.017012
author McCormick, Ryan F.
Truong, Sandra K.
Mullet, John E.
collection PubMed
description Recent advances in variant calling made available in the Genome Analysis Toolkit (GATK) enable the use of validated single-nucleotide polymorphisms and indels to improve variant calling. However, large collections of variants for this purpose often are unavailable to research communities. We introduce a workflow to generate reliable collections of single-nucleotide polymorphisms and indels by leveraging available genomic resources to inform variant calling using the GATK. The workflow is demonstrated for the crop plant Sorghum bicolor by (i) generating an initial set of variants using reduced representation sequence data from an experimental cross and association panels, (ii) using the initial variants to inform variant calling from whole-genome sequence data of resequenced individuals, and (iii) using variants identified from whole-genome sequence data for recalibration of the reduced representation sequence data. The reliability of variants called with the workflow is verified by comparison with genetically mappable variants from an independent sorghum experimental cross. Comparison with a recent sorghum resequencing study shows that the workflow identifies an additional 1.62 million high-confidence variants from the same sequence data. Finally, the workflow’s performance is validated using Arabidopsis sequence data, yielding variant call sets with 95% sensitivity and 99% positive predictive value. The Recalibration and Interrelation of genomic sequence data with the GATK (RIG) workflow enables the GATK to accurately identify genetic variation in organisms lacking validated variant resources.
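The description above reports call sets with 95% sensitivity and 99% positive predictive value. As a minimal sketch of how those two metrics are conventionally computed from a call set and a truth set, the following Python may help; it is not the authors' evaluation code, the function name `sensitivity_and_ppv` is hypothetical, and the toy variants are invented purely for illustration.

```python
from typing import Iterable, Tuple

# A variant is keyed here by (chromosome, position, ref allele, alt allele).
# This keying is an assumption made for the sketch, not taken from the paper.
Variant = Tuple[str, int, str, str]


def sensitivity_and_ppv(called: Iterable[Variant],
                        truth: Iterable[Variant]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for a call set."""
    called_set, truth_set = set(called), set(truth)
    true_pos = len(called_set & truth_set)    # called and in the truth set
    false_pos = len(called_set - truth_set)   # called but not in the truth set
    false_neg = len(truth_set - called_set)   # in the truth set but missed
    sensitivity = (true_pos / (true_pos + false_neg)
                   if truth_set else float("nan"))
    ppv = (true_pos / (true_pos + false_pos)
           if called_set else float("nan"))
    return sensitivity, ppv


# Invented toy data: 4 true variants, 4 calls, 3 of them correct.
toy_truth = [("Chr1", 100, "A", "G"), ("Chr1", 250, "C", "T"),
             ("Chr2", 40, "G", "A"), ("Chr2", 900, "T", "C")]
toy_calls = [("Chr1", 100, "A", "G"), ("Chr1", 250, "C", "T"),
             ("Chr2", 40, "G", "A"), ("Chr3", 7, "A", "T")]

toy_sensitivity, toy_ppv = sensitivity_and_ppv(toy_calls, toy_truth)
print(f"sensitivity={toy_sensitivity:.2f} ppv={toy_ppv:.2f}")
```

Sensitivity is the fraction of true variants that were recovered, and positive predictive value is the fraction of calls that are true; matching variants by exact position and alleles is a simplification that ignores differences in how indels may be represented.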
format Online
Article
Text
id pubmed-4390580
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-4390580 2015-04-10 RIG: Recalibration and Interrelation of Genomic Sequence Data with the GATK McCormick, Ryan F. Truong, Sandra K. Mullet, John E. G3 (Bethesda) Investigations Genetics Society of America 2015-02-13 /pmc/articles/PMC4390580/ /pubmed/25681258 http://dx.doi.org/10.1534/g3.115.017012 Text en Copyright © 2015 McCormick et al.
http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title RIG: Recalibration and Interrelation of Genomic Sequence Data with the GATK
topic Investigations