Error correcting optical mapping data

Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome. Recently it has been used for scaffolding contigs and for assembly validation for large-scale sequencing projects, including the...

Bibliographic Details
Main authors: Mukherjee, Kingshuk; Washimkar, Darshan; Muggli, Martin D; Salmela, Leena; Boucher, Christina
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007263/
https://www.ncbi.nlm.nih.gov/pubmed/29846578
http://dx.doi.org/10.1093/gigascience/giy061
author Mukherjee, Kingshuk
Washimkar, Darshan
Muggli, Martin D
Salmela, Leena
Boucher, Christina
collection PubMed
description Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome. Recently it has been used for scaffolding contigs and for assembly validation for large-scale sequencing projects, including the maize, goat, and Amborella genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data are numerical and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the Escherichia coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Last, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous and covers a larger fraction of the genome.
format Online
Article
Text
id pubmed-6007263
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6007263 2018-06-25 Error correcting optical mapping data Mukherjee, Kingshuk Washimkar, Darshan Muggli, Martin D Salmela, Leena Boucher, Christina Gigascience Technical Note Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome. Recently it has been used for scaffolding contigs and for assembly validation for large-scale sequencing projects, including the maize, goat, and Amborella genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data are numerical and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the Escherichia coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Last, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous and covers a larger fraction of the genome. Oxford University Press 2018-05-25 /pmc/articles/PMC6007263/ /pubmed/29846578 http://dx.doi.org/10.1093/gigascience/giy061 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Error correcting optical mapping data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007263/
https://www.ncbi.nlm.nih.gov/pubmed/29846578
http://dx.doi.org/10.1093/gigascience/giy061