
TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain


Bibliographic Details
Main authors: Gao, Yan; Liu, Bo; Wang, Yadong; Xing, Yi
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612900/
https://www.ncbi.nlm.nih.gov/pubmed/31510677
http://dx.doi.org/10.1093/bioinformatics/btz376
author Gao, Yan
Liu, Bo
Wang, Yadong
Xing, Yi
collection PubMed
description MOTIVATION: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing technologies can produce long-reads up to tens of kilobases, but with high error rates. In order to reduce sequencing error, Rolling Circle Amplification (RCA) has been used to improve library preparation by amplifying circularized template molecules. Linear products of the RCA contain multiple tandem copies of the template molecule. By integrating additional in silico processing steps, these tandem sequences can be collapsed into a consensus sequence with a higher accuracy than the original raw reads. Existing pipelines using alignment-based methods to discover the tandem repeat patterns from the long-reads are either inefficient or lack sensitivity.
RESULTS: We present a novel tandem repeat detection and consensus calling tool, TideHunter, to efficiently discover tandem repeat patterns and generate high-quality consensus sequences from amplified tandemly repeated long-read sequencing data. TideHunter works with noisy long-reads (PacBio and ONT) at error rates of up to 20% and does not have any limitation of the maximal repeat pattern size. We benchmarked TideHunter using simulated and real datasets with varying error rates and repeat pattern sizes. TideHunter is tens of times faster than state-of-the-art methods and has a higher sensitivity and accuracy.
AVAILABILITY AND IMPLEMENTATION: TideHunter is written in C, it is open source and is available at https://github.com/yangao07/TideHunter
format Online
Article
Text
id pubmed-6612900
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612900 2019-07-12 TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain Gao, Yan Liu, Bo Wang, Yadong Xing, Yi Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612900/ /pubmed/31510677 http://dx.doi.org/10.1093/bioinformatics/btz376 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain
topic Ismb/Eccb 2019 Conference Proceedings