
Coordinates and intervals in graph-based reference genomes

BACKGROUND: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor bi...


Bibliographic Details
Main Authors: Rand, Knut D., Grytten, Ivar, Nederbragt, Alexander J., Storvik, Geir O., Glad, Ingrid K., Sandve, Geir K.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437615/
https://www.ncbi.nlm.nih.gov/pubmed/28521770
http://dx.doi.org/10.1186/s12859-017-1678-9
author Rand, Knut D.
Grytten, Ivar
Nederbragt, Alexander J.
Storvik, Geir O.
Glad, Ingrid K.
Sandve, Geir K.
collection PubMed
description BACKGROUND: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. RESULTS: We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. CONCLUSION: More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph. An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1678-9) contains supplementary material, which is available to authorized users.
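To make the abstract's idea of an offset-based coordinate concrete, here is a minimal sketch in Python. It assumes a position is a pair (region path ID, offset) and an interval is a start position, an exclusive end position, and the ordered region paths it passes through. All class names, function names, and the toy graph are hypothetical illustrations; they are not the API of the offsetbasedgraph package and none of the numbers come from GRCh38.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Position:
    """Offset-based coordinate: a region path ID plus a 0-based offset into it."""
    region_path: str
    offset: int


@dataclass(frozen=True)
class GraphInterval:
    """Interval on a graph: start, exclusive end, and the region paths visited in order."""
    start: Position
    end: Position
    region_paths: Tuple[str, ...]

    def length(self, region_lengths: Dict[str, int]) -> int:
        # Sum every visited region path in full, then trim the bases before
        # the start offset and the bases at or after the end offset.
        total = sum(region_lengths[rp] for rp in self.region_paths)
        trailing = region_lengths[self.region_paths[-1]] - self.end.offset
        return total - self.start.offset - trailing


def is_valid(interval: GraphInterval,
             region_lengths: Dict[str, int],
             edges: Dict[str, Set[str]]) -> bool:
    """Check that the interval follows existing edges and stays within bounds."""
    rps = interval.region_paths
    if not rps:
        return False
    if rps[0] != interval.start.region_path or rps[-1] != interval.end.region_path:
        return False
    # Every consecutive pair of region paths must be joined by an edge.
    if any(b not in edges.get(a, set()) for a, b in zip(rps, rps[1:])):
        return False
    if not 0 <= interval.start.offset < region_lengths[rps[0]]:
        return False
    if not 0 < interval.end.offset <= region_lengths[rps[-1]]:
        return False
    # Within a single region path the interval must be non-empty.
    return len(rps) > 1 or interval.start.offset < interval.end.offset


# Hypothetical toy graph: a main path with one alternative locus ("alt")
# that replaces the middle segment "main_2".
REGION_LENGTHS = {"main_1": 100, "main_2": 50, "alt": 40, "main_3": 80}
EDGES = {
    "main_1": {"main_2", "alt"},
    "main_2": {"main_3"},
    "alt": {"main_3"},
}

# The same "gene" described along the alternative locus and along the main path.
gene_via_alt = GraphInterval(
    Position("main_1", 90), Position("main_3", 10), ("main_1", "alt", "main_3")
)
gene_via_main = GraphInterval(
    Position("main_1", 90), Position("main_3", 10), ("main_1", "main_2", "main_3")
)

print(is_valid(gene_via_alt, REGION_LENGTHS, EDGES))   # True
print(gene_via_alt.length(REGION_LENGTHS))             # 60
print(gene_via_main.length(REGION_LENGTHS))            # 70
```

Listing the region paths explicitly is what distinguishes the two intervals in this sketch: both share the same start and end positions, yet they cover different sequence because one passes through the alternative locus.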
format Online
Article
Text
id pubmed-5437615
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5437615 2017-05-22 Coordinates and intervals in graph-based reference genomes Rand, Knut D. Grytten, Ivar Nederbragt, Alexander J. Storvik, Geir O. Glad, Ingrid K. Sandve, Geir K. BMC Bioinformatics Research Article
BioMed Central 2017-05-18 /pmc/articles/PMC5437615/ /pubmed/28521770 http://dx.doi.org/10.1186/s12859-017-1678-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Coordinates and intervals in graph-based reference genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437615/
https://www.ncbi.nlm.nih.gov/pubmed/28521770
http://dx.doi.org/10.1186/s12859-017-1678-9