Hercules: a profile HMM-based hybrid error correction algorithm for long reads

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the sh...

Bibliographic Details
Main Authors:	Firtina, Can; Bar-Joseph, Ziv; Alkan, Can; Cicek, A Ercument
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Methods Online
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6265270/ https://www.ncbi.nlm.nih.gov/pubmed/30124947 http://dx.doi.org/10.1093/nar/gky724

_version_	1783375605295742976
author	Firtina, Can Bar-Joseph, Ziv Alkan, Can Cicek, A Ercument
collection	PubMed
description	Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph or alignment based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.
format	Online Article Text
id	pubmed-6265270
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6265270 2018-12-04 Nucleic Acids Res Methods Online Oxford University Press 2018-11-30 2018-08-16 /pmc/articles/PMC6265270/ /pubmed/30124947 http://dx.doi.org/10.1093/nar/gky724 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Hercules: a profile HMM-based hybrid error correction algorithm for long reads
topic	Methods Online
