
Distance indexing and seed clustering in sequence graphs

MOTIVATION: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear ge...


Bibliographic Details
Main Authors:	Chang, Xian; Eizenga, Jordan; Novak, Adam M; Sirén, Jouni; Paten, Benedict
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Genomic Variation Analysis
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355256/ https://www.ncbi.nlm.nih.gov/pubmed/32657356 http://dx.doi.org/10.1093/bioinformatics/btaa446

_version_	1783558238322556928
author	Chang, Xian Eizenga, Jordan Novak, Adam M Sirén, Jouni Paten, Benedict
author_facet	Chang, Xian Eizenga, Jordan Novak, Adam M Sirén, Jouni Paten, Benedict
author_sort	Chang, Xian
collection	PubMed
description	MOTIVATION: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping. RESULTS: We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs. AVAILABILITY AND IMPLEMENTATION: Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
format	Online Article Text
id	pubmed-7355256
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-73552562020-07-16 Distance indexing and seed clustering in sequence graphs Chang, Xian Eizenga, Jordan Novak, Adam M Sirén, Jouni Paten, Benedict Bioinformatics Genomic Variation Analysis MOTIVATION: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping. RESULTS: We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs. AVAILABILITY AND IMPLEMENTATION: Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg. Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355256/ /pubmed/32657356 http://dx.doi.org/10.1093/bioinformatics/btaa446 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Distance indexing and seed clustering in sequence graphs
topic	Genomic Variation Analysis
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355256/ https://www.ncbi.nlm.nih.gov/pubmed/32657356 http://dx.doi.org/10.1093/bioinformatics/btaa446
