MUMmer4: A fast and versatile genome alignment system

The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads...

Bibliographic Details
Main Authors: Marçais, Guillaume; Delcher, Arthur L.; Phillippy, Adam M.; Coston, Rachel; Salzberg, Steven L.; Zimin, Aleksey
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802927/
https://www.ncbi.nlm.nih.gov/pubmed/29373581
http://dx.doi.org/10.1371/journal.pcbi.1005944
author Marçais, Guillaume
Delcher, Arthur L.
Phillippy, Adam M.
Coston, Rachel
Salzberg, Steven L.
Zimin, Aleksey
collection PubMed
description The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
format Online
Article
Text
id pubmed-5802927
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5802927 2018-02-23 MUMmer4: A fast and versatile genome alignment system Marçais, Guillaume Delcher, Arthur L. Phillippy, Adam M. Coston, Rachel Salzberg, Steven L. Zimin, Aleksey PLoS Comput Biol Research Article The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
Public Library of Science 2018-01-26 /pmc/articles/PMC5802927/ /pubmed/29373581 http://dx.doi.org/10.1371/journal.pcbi.1005944 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title MUMmer4: A fast and versatile genome alignment system
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802927/
https://www.ncbi.nlm.nih.gov/pubmed/29373581
http://dx.doi.org/10.1371/journal.pcbi.1005944
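
Note on the size limit quoted in the description: the abstract contrasts a 32-bit suffix tree with a 48-bit suffix array and cites a theoretical input limit of 141 Tbp. The short sketch below only checks that arithmetic. The function name and the unit constants are illustrative, not part of MUMmer. The record does not say why the limit corresponds to 2^47 rather than 2^48 positions, so the sketch just shows that 2^47 matches the quoted figure.

```python
# Illustrative arithmetic only; nothing here calls MUMmer.

GBP = 10 ** 9   # base pairs per gigabase pair
TBP = 10 ** 12  # base pairs per terabase pair


def addressable_positions(bits: int) -> int:
    """Number of distinct positions an unsigned index of this width can address."""
    return 2 ** bits


full_32 = addressable_positions(32)
full_48 = addressable_positions(48)
half_48 = addressable_positions(47)  # half of the 48-bit range

print(f"32-bit index: {full_32 / GBP:.2f} Gbp")
print(f"48-bit index: {full_48 / TBP:.1f} Tbp")
print(f"47-bit range: {half_48 / TBP:.1f} Tbp")
```

The 47-bit figure rounds to the 141 Tbp stated in the abstract, while a 32-bit index tops out at a few gigabase pairs, which is on the order of a single mammalian genome.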