
Misassembly detection using paired-end sequence reads and optical mapping data

Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical...


Bibliographic Details
Main authors: Muggli, Martin D., Puglisi, Simon J., Ronen, Roy, Boucher, Christina
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542784/
https://www.ncbi.nlm.nih.gov/pubmed/26072512
http://dx.doi.org/10.1093/bioinformatics/btv262
_version_ 1782386562514812928
author Muggli, Martin D.
Puglisi, Simon J.
Ronen, Roy
Boucher, Christina
author_facet Muggli, Martin D.
Puglisi, Simon J.
Ronen, Roy
Boucher, Christina
author_sort Muggli, Martin D.
collection PubMed
description Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data. We apply our method to various assemblies of the loblolly pine, Francisella tularensis, rice and budgerigar genomes. We generated and used simulated optical mapping data for loblolly pine and F. tularensis and used real optical mapping data for rice and budgerigar. Results: Our results demonstrate that we detect more than 54% of extensively misassembled contigs and more than 60% of locally misassembled contigs in assemblies of F. tularensis and between 31% and 100% of extensively misassembled contigs and between 57% and 73% of locally misassembled contigs in assemblies of loblolly pine. Using the real optical mapping data, we correctly identified 75% of extensively misassembled contigs and 100% of locally misassembled contigs in rice, and 77% of extensively misassembled contigs and 80% of locally misassembled contigs in budgerigar. Availability and implementation: misSEQuel can be used as a post-processing step in combination with any genome assembler and is freely available at http://www.cs.colostate.edu/seq/. Contact: muggli@cs.colostate.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4542784
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4542784 2015-08-25 Misassembly detection using paired-end sequence reads and optical mapping data Muggli, Martin D. Puglisi, Simon J. Ronen, Roy Boucher, Christina Bioinformatics Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data. We apply our method to various assemblies of the loblolly pine, Francisella tularensis, rice and budgerigar genomes. We generated and used simulated optical mapping data for loblolly pine and F. tularensis and used real optical mapping data for rice and budgerigar. Results: Our results demonstrate that we detect more than 54% of extensively misassembled contigs and more than 60% of locally misassembled contigs in assemblies of F. tularensis and between 31% and 100% of extensively misassembled contigs and between 57% and 73% of locally misassembled contigs in assemblies of loblolly pine. Using the real optical mapping data, we correctly identified 75% of extensively misassembled contigs and 100% of locally misassembled contigs in rice, and 77% of extensively misassembled contigs and 80% of locally misassembled contigs in budgerigar. Availability and implementation: misSEQuel can be used as a post-processing step in combination with any genome assembler and is freely available at http://www.cs.colostate.edu/seq/. Contact: muggli@cs.colostate.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4542784/ /pubmed/26072512 http://dx.doi.org/10.1093/bioinformatics/btv262 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
Muggli, Martin D.
Puglisi, Simon J.
Ronen, Roy
Boucher, Christina
Misassembly detection using paired-end sequence reads and optical mapping data
title Misassembly detection using paired-end sequence reads and optical mapping data
topic Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542784/
https://www.ncbi.nlm.nih.gov/pubmed/26072512
http://dx.doi.org/10.1093/bioinformatics/btv262