Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review

The recent decrease in the cost and time needed to sequence and assemble complete genomes created an increased demand for data storage. As a consequence, several strategies for assembled biological data compression were created. Vertical compression tools implement strategies that take advantage of the high...

Bibliographic Details
Main Authors: Kredens, Kelvin V., Martins, Juliano V., Dordal, Osmar B., Ferrandin, Mauri, Herai, Roberto H., Scalabrin, Edson E., Ávila, Bráulio C.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250429/
https://www.ncbi.nlm.nih.gov/pubmed/32453750
http://dx.doi.org/10.1371/journal.pone.0232942
_version_ 1783538761897869312
author Kredens, Kelvin V.
Martins, Juliano V.
Dordal, Osmar B.
Ferrandin, Mauri
Herai, Roberto H.
Scalabrin, Edson E.
Ávila, Bráulio C.
author_facet Kredens, Kelvin V.
Martins, Juliano V.
Dordal, Osmar B.
Ferrandin, Mauri
Herai, Roberto H.
Scalabrin, Edson E.
Ávila, Bráulio C.
author_sort Kredens, Kelvin V.
collection PubMed
description The recent decrease in the cost and time needed to sequence and assemble complete genomes created an increased demand for data storage. As a consequence, several strategies for assembled biological data compression were created. Vertical compression tools implement strategies that take advantage of the high level of similarity between multiple assembled genomic sequences to achieve better compression results. However, current reviews on vertical compression do not compare the execution flow of each tool, which consists of preprocessing, transformation, and data encoding phases. We performed a systematic literature review to identify and compare existing tools for vertical compression of assembled genomic sequences. The review was centered on PubMed and Scopus, in which 45726 distinct papers were considered. Next, 32 papers were selected according to the following criteria: to present a lossless vertical compression tool; to use the information contained in other sequences for the compression; to be able to manipulate genomic sequences in FASTA format; and to require no prior knowledge. Although we extracted compression performance results, they were not compared because the tools did not use a standardized evaluation protocol. Thus, we conclude that the field lacks a defined evaluation protocol to be applied by every tool.
format Online
Article
Text
id pubmed-7250429
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-72504292020-06-08 Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review Kredens, Kelvin V. Martins, Juliano V. Dordal, Osmar B. Ferrandin, Mauri Herai, Roberto H. Scalabrin, Edson E. Ávila, Bráulio C. PLoS One Research Article The recent decrease in the cost and time needed to sequence and assemble complete genomes created an increased demand for data storage. As a consequence, several strategies for assembled biological data compression were created. Vertical compression tools implement strategies that take advantage of the high level of similarity between multiple assembled genomic sequences to achieve better compression results. However, current reviews on vertical compression do not compare the execution flow of each tool, which consists of preprocessing, transformation, and data encoding phases. We performed a systematic literature review to identify and compare existing tools for vertical compression of assembled genomic sequences. The review was centered on PubMed and Scopus, in which 45726 distinct papers were considered. Next, 32 papers were selected according to the following criteria: to present a lossless vertical compression tool; to use the information contained in other sequences for the compression; to be able to manipulate genomic sequences in FASTA format; and to require no prior knowledge. Although we extracted compression performance results, they were not compared because the tools did not use a standardized evaluation protocol. Thus, we conclude that the field lacks a defined evaluation protocol to be applied by every tool.
Public Library of Science 2020-05-26 /pmc/articles/PMC7250429/ /pubmed/32453750 http://dx.doi.org/10.1371/journal.pone.0232942 Text en © 2020 Kredens et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Kredens, Kelvin V.
Martins, Juliano V.
Dordal, Osmar B.
Ferrandin, Mauri
Herai, Roberto H.
Scalabrin, Edson E.
Ávila, Bráulio C.
Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review
title Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review
title_full Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review
title_fullStr Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review
title_full_unstemmed Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review
title_short Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review
title_sort vertical lossless genomic data compression tools for assembled genomes: a systematic literature review
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250429/
https://www.ncbi.nlm.nih.gov/pubmed/32453750
http://dx.doi.org/10.1371/journal.pone.0232942
work_keys_str_mv AT kredenskelvinv verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview
AT martinsjulianov verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview
AT dordalosmarb verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview
AT ferrandinmauri verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview
AT herairobertoh verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview
AT scalabrinedsone verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview
AT avilabraulioc verticallosslessgenomicdatacompressiontoolsforassembledgenomesasystematicliteraturereview