A novel compression tool for efficient storage of genome resequencing data
Main authors: | Wang, Congmao; Zhang, Dabing |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074166/ https://www.ncbi.nlm.nih.gov/pubmed/21266471 http://dx.doi.org/10.1093/nar/gkr009 |
author | Wang, Congmao; Zhang, Dabing |
---|---|
collection | PubMed |
description | With the advent of DNA sequencing technologies, more and more reference genome sequences are available for many organisms. Analyzing sequence variation and understanding its biological importance are becoming a major research aim. However, how to store and process the huge amount of eukaryotic genome data, such as those of the human, mouse and rice, has become a challenge to biologists. Currently available bioinformatics tools used to compress genome sequence data have some limitations, such as the requirement of the reference single nucleotide polymorphisms (SNPs) map and information on deletions and insertions. Here, we present a novel compression tool for storing and analyzing Genome ReSequencing data, named GRS. GRS is able to process the genome sequence data without the use of the reference SNPs and other sequence variation information and automatically rebuild the individual genome sequence data using the reference genome sequence. When its performance was tested on the first Korean personal genome sequence data set, GRS was able to achieve ∼159-fold compression, reducing the size of the data from 2986.8 to 18.8 MB. While being tested against the sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS. |
id | pubmed-3074166 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-3074166 2011-04-12 A novel compression tool for efficient storage of genome resequencing data Wang, Congmao Zhang, Dabing Nucleic Acids Res Methods Online Oxford University Press 2011-04 2011-01-25 /pmc/articles/PMC3074166/ /pubmed/21266471 http://dx.doi.org/10.1093/nar/gkr009 Text en © The Author(s) 2011. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A novel compression tool for efficient storage of genome resequencing data |
topic | Methods Online |
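The abstract describes GRS as storing an individual genome relative to a reference genome and rebuilding the individual sequence from that reference, without needing a SNP map or indel annotations. This record does not describe the actual GRS encoding. The sketch below is therefore a hypothetical illustration of generic reference-based (differential) encoding, not the authors' method. It uses Python's standard-library `difflib.SequenceMatcher` to find the stretches shared with the reference. The helper names `encode_against_reference`, `rebuild_from_reference` and `fold_compression` are illustrative assumptions. The sketch also checks the "∼159-fold" figure quoted for the Korean genome (2986.8 MB to 18.8 MB).

```python
from difflib import SequenceMatcher

# Hypothetical helpers illustrating generic reference-based (differential)
# encoding. This is an assumption for illustration, NOT the published GRS algorithm.


def encode_against_reference(reference: str, individual: str) -> list:
    """Describe `individual` as copy/insert operations relative to `reference`.

    ("copy", start, length) reuses reference[start:start + length];
    ("insert", bases) stores bases that differ from the reference
    (substitutions such as SNPs, and inserted bases).
    """
    ops = []
    matcher = SequenceMatcher(None, reference, individual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", individual[j1:j2]))
        # tag == "delete": reference bases absent from the individual genome
        # are simply not copied, so no operation needs to be stored.
    return ops


def rebuild_from_reference(reference: str, ops: list) -> str:
    """Rebuild the individual sequence from the reference plus the stored ops."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


def fold_compression(original_size: float, compressed_size: float) -> float:
    """Compression factor: original size divided by compressed size (same units)."""
    return original_size / compressed_size


if __name__ == "__main__":
    reference = "ACGTACGTACGTTTGA"
    individual = "ACGTACCTACGTTTTGA"  # toy data: one substitution (G->C) and one inserted T
    ops = encode_against_reference(reference, individual)
    print(ops)
    assert rebuild_from_reference(reference, ops) == individual
    # Korean genome figures from the abstract: 2986.8 MB -> 18.8 MB
    print(round(fold_compression(2986.8, 18.8)))  # 159
```

`SequenceMatcher` was chosen only because it ships with Python. It is far too slow for chromosome-length sequences, so a practical tool would need a scalable alignment or indexing step and a compact binary encoding of the operations.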