Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer

BACKGROUND: The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers) are typi...

Bibliographic Details
Main Authors: Peterlongo, Pierre; Chikhi, Rayan
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514201/
https://www.ncbi.nlm.nih.gov/pubmed/22443449
http://dx.doi.org/10.1186/1471-2105-13-48
_version_ 1782251987776045056
author Peterlongo, Pierre
Chikhi, Rayan
author_facet Peterlongo, Pierre
Chikhi, Rayan
author_sort Peterlongo, Pierre
collection PubMed
description BACKGROUND: The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers) are typically employed to process such data. However, these methods require large memory resources and computation time. Many basic biological questions could be answered targeting specific information in the reads, thus avoiding complete assembly. RESULTS: We present Mapsembler, an iterative micro and targeted assembler which processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest that can be constructed from reads and builds a short assembly around it, either as a plain sequence or as a graph, showing contextual structure. We introduce new algorithms to retrieve approximate occurrences of a sequence from reads and construct an extension graph. Among other results presented in this paper, Mapsembler enabled to retrieve previously described human breast cancer candidate fusion genes, and to detect new ones not previously known. CONCLUSIONS: Mapsembler is the first software that enables de novo discovery around a region of interest of repeats, SNPs, exon skipping, gene fusion, as well as other structural events, directly from raw sequencing reads. As indexing is localized, the memory footprint of Mapsembler is negligible. Mapsembler is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/mapsembler/.
format Online
Article
Text
id pubmed-3514201
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3514201 2012-12-06 Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer Peterlongo, Pierre Chikhi, Rayan BMC Bioinformatics Research Article BioMed Central 2012-03-23 /pmc/articles/PMC3514201/ /pubmed/22443449 http://dx.doi.org/10.1186/1471-2105-13-48 Text en Copyright ©2012 Peterlongo and Chikhi; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Peterlongo, Pierre
Chikhi, Rayan
Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
title Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
title_full Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
title_fullStr Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
title_full_unstemmed Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
title_short Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
title_sort mapsembler, targeted and micro assembly of large ngs datasets on a desktop computer
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514201/
https://www.ncbi.nlm.nih.gov/pubmed/22443449
http://dx.doi.org/10.1186/1471-2105-13-48
work_keys_str_mv AT peterlongopierre mapsemblertargetedandmicroassemblyoflargengsdatasetsonadesktopcomputer
AT chikhirayan mapsemblertargetedandmicroassemblyoflargengsdatasetsonadesktopcomputer