A framework for variation discovery and genotyping using next-generation DNA sequencing data
Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to...
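Main authors: | DePristo, M.A.; Banks, E.; Poplin, R.E.; Garimella, K.V.; Maguire, J.R.; Hartl, C.; Philippakis, A.A.; del Angel, G.; Rivas, M.A.; Hanna, M.; McKenna, A.; Fennell, T.J.; Kernytsky, A.M.; Sivachenko, A.Y.; Cibulskis, K.; Gabriel, S.B.; Altshuler, D.; Daly, M.J. |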
---|---|
Format: | Text |
Language: | English |
Published: | 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083463/ https://www.ncbi.nlm.nih.gov/pubmed/21478889 http://dx.doi.org/10.1038/ng.806 |
_version_ | 1782202405635489792 |
---|---|
author | DePristo, M.A.; Banks, E.; Poplin, R.E.; Garimella, K.V.; Maguire, J.R.; Hartl, C.; Philippakis, A.A.; del Angel, G.; Rivas, M.A.; Hanna, M.; McKenna, A.; Fennell, T.J.; Kernytsky, A.M.; Sivachenko, A.Y.; Cibulskis, K.; Gabriel, S.B.; Altshuler, D.; Daly, M.J. |
author_facet | DePristo, M.A.; Banks, E.; Poplin, R.E.; Garimella, K.V.; Maguire, J.R.; Hartl, C.; Philippakis, A.A.; del Angel, G.; Rivas, M.A.; Hanna, M.; McKenna, A.; Fennell, T.J.; Kernytsky, A.M.; Sivachenko, A.Y.; Cibulskis, K.; Gabriel, S.B.; Altshuler, D.; Daly, M.J. |
author_sort | DePristo, M.A. |
collection | PubMed |
description | Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets. |
format | Text |
id | pubmed-3083463 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
record_format | MEDLINE/PubMed |
spelling | pubmed-30834632011-11-01 A framework for variation discovery and genotyping using next-generation DNA sequencing data DePristo, M.A. Banks, E. Poplin, R.E. Garimella, K.V. Maguire, J.R. Hartl, C. Philippakis, A.A. del Angel, G. Rivas, M.A Hanna, M. McKenna, A. Fennell, T.J. Kernytsky, A.M. Sivachenko, A.Y. Cibulskis, K. Gabriel, S.B. Altshuler, D. Daly, M.J. Nat Genet Article Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets. 2011-04-10 2011-05 /pmc/articles/PMC3083463/ /pubmed/21478889 http://dx.doi.org/10.1038/ng.806 Text en Users may view, print, copy, download and text and data- mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
spellingShingle | Article DePristo, M.A. Banks, E. Poplin, R.E. Garimella, K.V. Maguire, J.R. Hartl, C. Philippakis, A.A. del Angel, G. Rivas, M.A Hanna, M. McKenna, A. Fennell, T.J. Kernytsky, A.M. Sivachenko, A.Y. Cibulskis, K. Gabriel, S.B. Altshuler, D. Daly, M.J. A framework for variation discovery and genotyping using next-generation DNA sequencing data |
title | A framework for variation discovery and genotyping using next-generation DNA sequencing data |
title_full | A framework for variation discovery and genotyping using next-generation DNA sequencing data |
title_fullStr | A framework for variation discovery and genotyping using next-generation DNA sequencing data |
title_full_unstemmed | A framework for variation discovery and genotyping using next-generation DNA sequencing data |
title_short | A framework for variation discovery and genotyping using next-generation DNA sequencing data |
title_sort | framework for variation discovery and genotyping using next-generation dna sequencing data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083463/ https://www.ncbi.nlm.nih.gov/pubmed/21478889 http://dx.doi.org/10.1038/ng.806 |