Algorithms designed for compressed-gene-data transformation among gene banks with different references

BACKGROUND: Falling gene sequencing costs and growing demand from emerging technologies such as precision medicine and deep learning in genomics have led to an explosion of gene data. How to store, transmit and analyze these data has become a hot topic in current research. Now the...

Bibliographic Details
Main Authors: Luo, Qiuming, Guo, Chao, Zhang, Yi Jun, Cai, Ye, Liu, Gang
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006589/
https://www.ncbi.nlm.nih.gov/pubmed/29914357
http://dx.doi.org/10.1186/s12859-018-2230-2
_version_ 1783332866367684608
author Luo, Qiuming
Guo, Chao
Zhang, Yi Jun
Cai, Ye
Liu, Gang
author_sort Luo, Qiuming
collection PubMed
description BACKGROUND: Falling gene sequencing costs and growing demand from emerging technologies such as precision medicine and deep learning in genomics have led to an explosion of gene data. How to store, transmit and analyze these data has become a hot topic in current research. Reference-based compression algorithms are now widely used because of their high compression ratio. A major problem is that data from different gene banks cannot be merged directly or shared efficiently, because they are usually compressed with different references. The traditional workflow is decompression-and-recompression, which is simplistic and time-consuming and should be improved and sped up. RESULTS: In this paper, we focus on this problem and propose a set of transformation algorithms to address it. We 1) analyze several compression algorithms to find their similarities and differences, 2) propose a naïve method named TDM for data transformation between different gene banks, and 3) optimize TDM to obtain two further methods, TPI and TGI. Experimental results show that the three proposed algorithms are an order of magnitude faster than the traditional decompression-and-recompression workflow. CONCLUSIONS: First, all three proposed algorithms perform well in terms of time. Second, each has its own advantages for different datasets or situations. TDM and TPI are more suitable for small-scale gene data transformation, while TGI is more suitable for large-scale gene data transformation.
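To make the baseline in the abstract concrete, the following is a minimal sketch of reference-based compression and of the traditional decompression-and-recompression workflow. It is not the authors' TDM, TPI or TGI method, which this record does not describe. The toy model is an assumption: sequences and references have equal length and differ only by substitutions, and the names `compress`, `decompress` and `recompress_via_decompression` are hypothetical.

```python
from typing import List, Tuple

# A compressed record in this toy model: (position, base) pairs marking
# where the sequence differs from its reference. Assumption: substitutions
# only, no insertions or deletions.
Diff = List[Tuple[int, str]]


def compress(seq: str, ref: str) -> Diff:
    """Store only the positions where seq differs from ref."""
    if len(seq) != len(ref):
        raise ValueError("this toy model assumes equal-length sequences")
    return [(i, b) for i, (b, r) in enumerate(zip(seq, ref)) if b != r]


def decompress(diffs: Diff, ref: str) -> str:
    """Rebuild the full sequence by applying the differences to ref."""
    bases = list(ref)
    for pos, base in diffs:
        bases[pos] = base
    return "".join(bases)


def recompress_via_decompression(diffs: Diff, ref_a: str, ref_b: str) -> Diff:
    """Baseline workflow: fully decompress against ref_a, then compress
    the restored sequence against ref_b."""
    return compress(decompress(diffs, ref_a), ref_b)


if __name__ == "__main__":
    ref_a = "ACGTACGT"   # reference used by the first (hypothetical) gene bank
    ref_b = "ACGAACGT"   # reference used by the second (hypothetical) gene bank
    seq = "ACGTACGA"     # the sequence being stored

    stored_a = compress(seq, ref_a)
    stored_b = recompress_via_decompression(stored_a, ref_a, ref_b)
    print(stored_a, stored_b)
```

The baseline touches every base of the restored sequence even when the two references are nearly identical, which is the cost the abstract says the proposed transformation algorithms reduce.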
format Online
Article
Text
id pubmed-6006589
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6006589 2018-06-26 Algorithms designed for compressed-gene-data transformation among gene banks with different references Luo, Qiuming; Guo, Chao; Zhang, Yi Jun; Cai, Ye; Liu, Gang BMC Bioinformatics Methodology Article
BioMed Central 2018-06-18 /pmc/articles/PMC6006589/ /pubmed/29914357 http://dx.doi.org/10.1186/s12859-018-2230-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Algorithms designed for compressed-gene-data transformation among gene banks with different references
topic Methodology Article