GenGraph: a python module for the simple generation and manipulation of genome graphs

Bibliographic Details
Main Authors: Ambler, Jon Mitchell; Mulaudzi, Shandukani; Mulder, Nicola
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894214/
https://www.ncbi.nlm.nih.gov/pubmed/31653197
http://dx.doi.org/10.1186/s12859-019-3115-8
author Ambler, Jon Mitchell
Mulaudzi, Shandukani
Mulder, Nicola
collection PubMed
description BACKGROUND: As sequencing technology improves, the concept of a single reference genome is becoming increasingly restrictive. In the case of Mycobacterium tuberculosis, one must often choose between using a genome that is closely related to the isolate and one that is annotated in detail. One promising solution to this problem is the graph-based representation of collections of genomes as a single genome graph. Though a handful of tools can already create genome graphs and have demonstrated the advantages of this new paradigm, there is still a need for flexible tools that researchers can use to overcome challenges in genomics studies.
RESULTS: We present GenGraph, a Python toolkit and accompanying modules that use existing multiple sequence alignment tools to create genome graphs. Python is one of the most popular coding languages for the biological sciences, and by providing these tools, GenGraph makes it easier to experiment with genome graphs and to develop new tools that utilise them. The conceptual model used is highly intuitive, and as far as possible the graph structure represents the biological relationship between the genomes. This design means that users will quickly be able to start creating genome graphs and using them in their own projects. We outline the methods used in the generation of the graphs, and give some examples of how the created graphs may be used. GenGraph utilises existing file formats and methods in the generation of these graphs, allowing graphs to be visualised and imported with widely used applications, including Cytoscape, R, and JavaScript.
CONCLUSIONS: GenGraph provides a set of tools for generating graph-based representations of sets of sequences with a simple conceptual model, written in the widely used coding language Python, and publicly available on GitHub.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3115-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6894214
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6894214 2019-12-11 GenGraph: a python module for the simple generation and manipulation of genome graphs. BMC Bioinformatics (Software). BioMed Central 2019-10-25 /pmc/articles/PMC6894214/ /pubmed/31653197 http://dx.doi.org/10.1186/s12859-019-3115-8 Text en © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GenGraph: a python module for the simple generation and manipulation of genome graphs
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894214/
https://www.ncbi.nlm.nih.gov/pubmed/31653197
http://dx.doi.org/10.1186/s12859-019-3115-8