Variation graph toolkit improves read mapping by representing genetic variation in the reference
Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs a...
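Main authors: | Garrison, Erik; Sirén, Jouni; Novak, Adam M.; Hickey, Glenn; Eizenga, Jordan M.; Dawson, Eric T.; Jones, William; Garg, Shilpa; Markello, Charles; Lin, Michael F.; Paten, Benedict; Durbin, Richard |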
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6126949/ https://www.ncbi.nlm.nih.gov/pubmed/30125266 http://dx.doi.org/10.1038/nbt.4227 |
_version_ | 1783353392729423872 |
---|---|
author | Garrison, Erik Sirén, Jouni Novak, Adam M. Hickey, Glenn Eizenga, Jordan M. Dawson, Eric T. Jones, William Garg, Shilpa Markello, Charles Lin, Michael F. Paten, Benedict Durbin, Richard |
author_facet | Garrison, Erik Sirén, Jouni Novak, Adam M. Hickey, Glenn Eizenga, Jordan M. Dawson, Eric T. Jones, William Garg, Shilpa Markello, Charles Lin, Michael F. Paten, Benedict Durbin, Richard |
author_sort | Garrison, Erik |
collection | PubMed |
description | Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large scale structural variation such as inversions and duplications(1). Previous graph genome software implementations(2–4) have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and utilizing these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalised compressed suffix arrays(5), with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at gigabase scale, or at the topological complexity of de novo assemblies. |
format | Online Article Text |
id | pubmed-6126949 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
record_format | MEDLINE/PubMed |
spelling | pubmed-6126949 2019-02-20 Variation graph toolkit improves read mapping by representing genetic variation in the reference Garrison, Erik Sirén, Jouni Novak, Adam M. Hickey, Glenn Eizenga, Jordan M. Dawson, Eric T. Jones, William Garg, Shilpa Markello, Charles Lin, Michael F. Paten, Benedict Durbin, Richard Nat Biotechnol Article Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large scale structural variation such as inversions and duplications(1). Previous graph genome software implementations(2–4) have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and utilizing these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalised compressed suffix arrays(5), with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at gigabase scale, or at the topological complexity of de novo assemblies. 2018-08-20 2018-10 /pmc/articles/PMC6126949/ /pubmed/30125266 http://dx.doi.org/10.1038/nbt.4227 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
title | Variation graph toolkit improves read mapping by representing genetic variation in the reference |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6126949/ https://www.ncbi.nlm.nih.gov/pubmed/30125266 http://dx.doi.org/10.1038/nbt.4227 |
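
The description above defines variation graphs as bidirected DNA sequence graphs. As a rough illustration only, the following Python sketch models a tiny graph with one single-base variant and spells out the sequences of walks through it, on either strand. The class `VariationGraph`, its methods, and the example sequences are hypothetical names chosen for this sketch; they are not vg's API or data model, and the sketch does not cover indexing or read mapping.

```python
from collections import defaultdict

# Map each base to its complement (assumes an A/C/G/T-only alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


class VariationGraph:
    """Toy bidirected sequence graph (hypothetical, for illustration).

    A traversal is a pair (node_id, is_reverse): the node read on its
    forward strand (False) or as its reverse complement (True).
    """

    def __init__(self):
        self.sequences = {}            # node id -> forward-strand sequence
        self.edges = defaultdict(set)  # traversal -> set of next traversals

    def add_node(self, node_id, sequence):
        self.sequences[node_id] = sequence

    def add_edge(self, src, dst):
        """Connect traversal src to traversal dst, in both strand directions."""
        self.edges[src].add(dst)
        (s_id, s_rev), (d_id, d_rev) = src, dst
        # Walking the same edge on the opposite strand flips both orientations.
        self.edges[(d_id, not d_rev)].add((s_id, not s_rev))

    def is_walk(self, walk):
        """True if each consecutive pair of traversals is joined by an edge."""
        return all(b in self.edges.get(a, ()) for a, b in zip(walk, walk[1:]))

    def spell(self, walk):
        """Concatenate node sequences along a walk, honoring orientation."""
        parts = []
        for node_id, is_reverse in walk:
            seq = self.sequences[node_id]
            parts.append(reverse_complement(seq) if is_reverse else seq)
        return "".join(parts)


def build_example():
    """Build a four-node graph in which nodes 2 and 3 are alternate alleles."""
    g = VariationGraph()
    g.add_node(1, "GATT")
    g.add_node(2, "A")   # one allele
    g.add_node(3, "C")   # the other allele
    g.add_node(4, "ACA")
    for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
        g.add_edge((src, False), (dst, False))
    return g


if __name__ == "__main__":
    g = build_example()
    for walk in ([(1, False), (2, False), (4, False)],
                 [(1, False), (3, False), (4, False)],
                 [(4, True), (2, True), (1, True)]):
        print(walk, g.is_walk(walk), g.spell(walk))
```

Both alleles share nodes 1 and 4, so the variant adds one short node, not a second copy of the flanking sequence. Storing each edge in both orientations is what lets the same graph be read on the reverse strand, and it would also allow an inversion to be expressed as an edge into a reversed traversal.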