Variation graph toolkit improves read mapping by representing genetic variation in the reference

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs a...

Bibliographic Details
Main Authors: Garrison, Erik, Sirén, Jouni, Novak, Adam M., Hickey, Glenn, Eizenga, Jordan M., Dawson, Eric T., Jones, William, Garg, Shilpa, Markello, Charles, Lin, Michael F., Paten, Benedict, Durbin, Richard
Format: Online Article Text
Language: English
Published: 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6126949/
https://www.ncbi.nlm.nih.gov/pubmed/30125266
http://dx.doi.org/10.1038/nbt.4227
author Garrison, Erik
Sirén, Jouni
Novak, Adam M.
Hickey, Glenn
Eizenga, Jordan M.
Dawson, Eric T.
Jones, William
Garg, Shilpa
Markello, Charles
Lin, Michael F.
Paten, Benedict
Durbin, Richard
author_sort Garrison, Erik
collection PubMed
description Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large scale structural variation such as inversions and duplications(1). Previous graph genome software implementations(2–4) have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and utilizing these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalised compressed suffix arrays(5), with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at gigabase scale, or at the topological complexity of de novo assemblies.
format Online
Article
Text
id pubmed-6126949
institution National Center for Biotechnology Information
language English
publishDate 2018
record_format MEDLINE/PubMed
spelling pubmed-61269492019-02-20 Variation graph toolkit improves read mapping by representing genetic variation in the reference Garrison, Erik Sirén, Jouni Novak, Adam M. Hickey, Glenn Eizenga, Jordan M. Dawson, Eric T. Jones, William Garg, Shilpa Markello, Charles Lin, Michael F. Paten, Benedict Durbin, Richard Nat Biotechnol Article Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large scale structural variation such as inversions and duplications(1). Previous graph genome software implementations(2–4) have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and utilizing these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalised compressed suffix arrays(5), with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at gigabase scale, or at the topological complexity of de novo assemblies. 2018-08-20 2018-10 /pmc/articles/PMC6126949/ /pubmed/30125266 http://dx.doi.org/10.1038/nbt.4227 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:http://www.nature.com/authors/editorial_policies/license.html#terms
title Variation graph toolkit improves read mapping by representing genetic variation in the reference
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6126949/
https://www.ncbi.nlm.nih.gov/pubmed/30125266
http://dx.doi.org/10.1038/nbt.4227