Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs

MOTIVATION: With the current pace at which reference genomes are being produced, the availability of tools that can reliably and efficiently generate genome assembly summary statistics has become critical. Additionally, with the emergence of new algorithms and data types, tools that can improve the...

Bibliographic Details
Main authors: Formenti, Giulio; Abueg, Linelle; Brajuka, Angelo; Brajuka, Nadolina; Gallardo-Alba, Cristóbal; Giani, Alice; Fedrigo, Olivier; Jarvis, Erich D
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Applications Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9438950/
https://www.ncbi.nlm.nih.gov/pubmed/35799367
http://dx.doi.org/10.1093/bioinformatics/btac460
_version_ 1784781941324644352
author Formenti, Giulio
Abueg, Linelle
Brajuka, Angelo
Brajuka, Nadolina
Gallardo-Alba, Cristóbal
Giani, Alice
Fedrigo, Olivier
Jarvis, Erich D
collection PubMed
description MOTIVATION: With the current pace at which reference genomes are being produced, the availability of tools that can reliably and efficiently generate genome assembly summary statistics has become critical. Additionally, with the emergence of new algorithms and data types, tools that can improve the quality of existing assemblies through automated and manual curation are required. RESULTS: We sought to address both these needs by developing gfastats, as part of the Vertebrate Genomes Project (VGP) effort to generate high-quality reference genomes at scale. Gfastats is a standalone tool to compute assembly summary statistics and manipulate assembly sequences in FASTA, FASTQ or GFA [.gz] format. Gfastats stores assembly sequences internally in a GFA-like format. This feature allows gfastats to seamlessly convert FAST* to and from GFA [.gz] files. Gfastats can also build an assembly graph that can in turn be used to manipulate the underlying sequences following instructions provided by the user, while simultaneously generating key metrics for the new sequences. AVAILABILITY AND IMPLEMENTATION: Gfastats is implemented in C++. Precompiled releases (Linux, macOS, Windows) and commented source code for gfastats are available under the MIT licence at https://github.com/vgl-hub/gfastats. Examples of how to run gfastats are provided in the GitHub repository. Gfastats is also available in Bioconda, in Galaxy (https://assembly.usegalaxy.eu) and as a MultiQC module (https://github.com/ewels/MultiQC). An automated test workflow is available to ensure consistency of software updates. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9438950
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9438950 2022-09-06 Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Formenti, Giulio; Abueg, Linelle; Brajuka, Angelo; Brajuka, Nadolina; Gallardo-Alba, Cristóbal; Giani, Alice; Fedrigo, Olivier; Jarvis, Erich D. Bioinformatics. Applications Note.
Oxford University Press 2022-07-07 /pmc/articles/PMC9438950/ /pubmed/35799367 http://dx.doi.org/10.1093/bioinformatics/btac460 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9438950/
https://www.ncbi.nlm.nih.gov/pubmed/35799367
http://dx.doi.org/10.1093/bioinformatics/btac460
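
The description above says that gfastats holds sequences in a GFA-like form, which lets it convert FASTA to GFA and report summary statistics. The Python sketch below illustrates that idea only; it is not gfastats code (gfastats is written in C++) and does not reproduce its command-line interface. The function names `parse_fasta`, `fasta_to_gfa` and `n50` are hypothetical, and the output assumes plain GFA 1.0 segment (`S`) lines with an optional `LN` length tag.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, preserving record order."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name;
            # fall back to a generated name if the header is empty (assumption).
            tokens = line[1:].split()
            name = tokens[0] if tokens else f"seq{len(parts) + 1}"
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def fasta_to_gfa(records: Dict[str, str]) -> str:
    """Emit one GFA 1.0 segment (S) line per sequence, after a header line."""
    lines = ["H\tVN:Z:1.0"]
    for name, seq in records.items():
        lines.append(f"S\t{name}\t{seq}\tLN:i:{len(seq)}")
    return "\n".join(lines) + "\n"


def n50(lengths: List[int]) -> int:
    """Length of the sequence at which the running total of lengths, taken
    from longest to shortest, first reaches half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    fasta = ">ctg1 example contig\nACGT\nAC\n>ctg2\nGGG\n"
    records = parse_fasta(fasta)
    print(fasta_to_gfa(records), end="")
    print("N50:", n50([len(s) for s in records.values()]))
```

Keeping each sequence as a named segment is what makes the conversion direct in both directions: a FASTA record maps to one `S` line, and statistics such as N50 can be computed from the segment lengths alone.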