Stepwise large genome assembly approach: a case of Siberian larch (Larix sibirica Ledeb)

BACKGROUND: De novo assembly of large genomes, such as those of conifers (~12–30 Gbp), which also consist of ~80% repetitive DNA, is a very complex and computationally intensive endeavor. One of the main problems in assembling such genomes lies in the computing limitations of nucleotide sequence assembly...

Bibliographic Details
Main authors: Kuzmin, Dmitry A., Feranchuk, Sergey I., Sharov, Vadim V., Cybin, Alexander N., Makolov, Stepan V., Putintseva, Yuliya A., Oreshkova, Natalya V., Krutovsky, Konstantin V.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6362582/
https://www.ncbi.nlm.nih.gov/pubmed/30717661
http://dx.doi.org/10.1186/s12859-018-2570-y
author Kuzmin, Dmitry A.
Feranchuk, Sergey I.
Sharov, Vadim V.
Cybin, Alexander N.
Makolov, Stepan V.
Putintseva, Yuliya A.
Oreshkova, Natalya V.
Krutovsky, Konstantin V.
collection PubMed
description BACKGROUND: De novo assembly of large genomes, such as those of conifers (~12–30 Gbp), which also consist of ~80% repetitive DNA, is a very complex and computationally intensive endeavor. One of the main problems in assembling such genomes lies in the computing limitations of nucleotide sequence assembly programs (DNA assemblers). As a rule, modern assemblers are designed to assemble genomes no longer than the human genome (3.24 Gbp). Most assemblers cannot handle the amount of input sequence data required to provide the coverage needed for a high-quality assembly. RESULTS: This paper presents an original stepwise method of de novo assembly by parts (sets), which makes it possible to bypass the limitations of modern assemblers associated with the huge amount of data being processed. The results of numerical assembly experiments conducted using the model plant Arabidopsis thaliana, Prunus persica (peach), and four of the most popular assemblers (ABySS, SOAPdenovo, SPAdes, and CLC Assembly Cell) showed the validity and effectiveness of the proposed stepwise assembly method. CONCLUSION: Using the new stepwise de novo assembly method presented in the paper, the genome of Siberian larch, Larix sibirica Ledeb. (12.34 Gbp), was completely assembled de novo by the CLC Assembly Cell assembler. It is the first genome assembly for a larch species and joins only five other conifer genomes that have been sequenced and assembled: Picea abies, Picea glauca, Pinus taeda, Pinus lambertiana, and Pseudotsuga menziesii var. menziesii. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2570-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6362582
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6362582 2019-02-14 BMC Bioinformatics Research BioMed Central 2019-02-05 /pmc/articles/PMC6362582/ /pubmed/30717661 http://dx.doi.org/10.1186/s12859-018-2570-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Stepwise large genome assembly approach: a case of Siberian larch (Larix sibirica Ledeb)
topic Research