Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art

Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, rea...

Bibliographic Details
Main authors: Chu, Justin, Mohamadi, Hamid, Warren, René L, Yang, Chen, Birol, Inanç
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408847/
https://www.ncbi.nlm.nih.gov/pubmed/28003261
http://dx.doi.org/10.1093/bioinformatics/btw811
_version_ 1783232377922781184
author Chu, Justin
Mohamadi, Hamid
Warren, René L
Yang, Chen
Birol, Inanç
author_facet Chu, Justin
Mohamadi, Hamid
Warren, René L
Yang, Chen
Birol, Inanç
author_sort Chu, Justin
collection PubMed
description Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high error rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. To supplement our review of the algorithms, we benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and sensitivity of each. In the versions of the tools tested, we observed that Minimap is the most computationally efficient, specific and sensitive method on the ONT datasets tested; whereas GraphMap and DALIGNER are the most specific and sensitive methods on the tested PB datasets. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput. Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5408847
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-54088472017-05-03 Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art Chu, Justin Mohamadi, Hamid Warren, René L Yang, Chen Birol, Inanç Bioinformatics Review Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high error rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. To supplement our review of the algorithms, we benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and sensitivity of each. In the versions of the tools tested, we observed that Minimap is the most computationally efficient, specific and sensitive method on the ONT datasets tested; whereas GraphMap and DALIGNER are the most specific and sensitive methods on the tested PB datasets. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput. Supplementary information: Supplementary data are available at Bioinformatics online.
Oxford University Press 2017-04-15 2016-12-21 /pmc/articles/PMC5408847/ /pubmed/28003261 http://dx.doi.org/10.1093/bioinformatics/btw811 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Review
Chu, Justin
Mohamadi, Hamid
Warren, René L
Yang, Chen
Birol, Inanç
Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
title Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
title_full Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
title_fullStr Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
title_full_unstemmed Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
title_short Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
title_sort innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408847/
https://www.ncbi.nlm.nih.gov/pubmed/28003261
http://dx.doi.org/10.1093/bioinformatics/btw811
work_keys_str_mv AT chujustin innovationsandchallengesindetectinglongreadoverlapsanevaluationofthestateoftheart
AT mohamadihamid innovationsandchallengesindetectinglongreadoverlapsanevaluationofthestateoftheart
AT warrenrenel innovationsandchallengesindetectinglongreadoverlapsanevaluationofthestateoftheart
AT yangchen innovationsandchallengesindetectinglongreadoverlapsanevaluationofthestateoftheart
AT birolinanc innovationsandchallengesindetectinglongreadoverlapsanevaluationofthestateoftheart