De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application
BACKGROUND: Many organisms, in particular bacteria, contain repetitive DNA fragments called tandem repeats. These structures are restored by DNA assemblers by mapping paired-end tags to unitigs, estimating the distance between them and filling the gap with the specified DNA motif, which could be rep...
Main Authors: | Kuśmirek, Wiktor; Nowak, Robert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052550/ https://www.ncbi.nlm.nih.gov/pubmed/30021513 http://dx.doi.org/10.1186/s12859-018-2281-4 |
_version_ | 1783340677328797696 |
---|---|
author | Kuśmirek, Wiktor Nowak, Robert |
author_sort | Kuśmirek, Wiktor |
collection | PubMed |
description | BACKGROUND: Many organisms, in particular bacteria, contain repetitive DNA fragments called tandem repeats. DNA assemblers restore these structures by mapping paired-end tags to unitigs, estimating the distance between them, and filling the gap with the specified DNA motif, which may be repeated many times. However, some tandem repeats are longer than the distance between the paired-end tags. RESULTS: We present a new algorithm for de novo DNA assembly that uses the relative frequency of reads to properly restore tandem repeats. The main advantage of the presented algorithm is that long tandem repeats, which are much longer than the maximum read length and the insert size of paired-end tags, can be properly restored. Moreover, repetitive DNA regions covered only by single-read sequencing data can also be restored. Other existing de novo DNA assemblers fail in such cases. The presented application is composed of several steps, including: (i) building the de Bruijn graph, (ii) correcting the de Bruijn graph, (iii) normalizing edge weights, and (iv) generating the output set of DNA sequences. We tested our approach on real data sets of bacterial organisms. CONCLUSIONS: A software library, a console application, and a web application were developed. The web application uses a client-server architecture in which a web browser communicates with the end user, and the algorithms are implemented in C++ and Python. The presented approach enables proper reconstruction of tandem repeats that are longer than the insert size of paired-end tags. The application is freely available to all users under the GNU Library or Lesser General Public License version 3.0 (LGPLv3). |
format | Online Article Text |
id | pubmed-6052550 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6052550 2018-07-20 De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application Kuśmirek, Wiktor Nowak, Robert BMC Bioinformatics Software BioMed Central 2018-07-18 /pmc/articles/PMC6052550/ /pubmed/30021513 http://dx.doi.org/10.1186/s12859-018-2281-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application |
title_sort | de novo assembly of bacterial genomes with repetitive dna regions by dnaasm application |
topic | Software |
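The abstract in this record lists building a de Bruijn graph and normalizing edge weights among the assembly steps, and says the relative frequency of reads is used to restore tandem repeats. The Python sketch below illustrates that general idea only; it is not the dnaasm implementation. The function names `build_de_bruijn` and `normalize_weights`, the median-based normalization, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def build_de_bruijn(reads, k):
    """Count k-mers; each k-mer is an edge from its (k-1)-mer prefix to its suffix."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def normalize_weights(edges):
    """Estimate how many times each edge is traversed in the genome.

    Assumption for this sketch: most edges are unique, so the median
    count approximates single-copy coverage.
    """
    base = median(edges.values())
    return {edge: max(1, round(count / base)) for edge, count in edges.items()}


# Hypothetical toy input: a CAG motif repeated five times between unique
# flanks, with the whole sequence read three times (uniform 3x coverage).
genome = "AATTG" + "CAG" * 5 + "TTCGA"
edges = build_de_bruijn([genome] * 3, k=4)
copies = normalize_weights(edges)
```

In this toy case the edges inside the repeat receive normalized weights above one while the flanking edges stay at one, which is the signal a traversal step could use to decide how many times to walk the repeat cycle. Real sequencing data has uneven coverage and errors, so a plain median is only a starting point.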