TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain
Main authors: | Gao, Yan; Liu, Bo; Wang, Yadong; Xing, Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612900/ https://www.ncbi.nlm.nih.gov/pubmed/31510677 http://dx.doi.org/10.1093/bioinformatics/btz376 |
author | Gao, Yan; Liu, Bo; Wang, Yadong; Xing, Yi |
collection | PubMed |
description | MOTIVATION: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing technologies can produce long reads of up to tens of kilobases, but with high error rates. To reduce sequencing error, Rolling Circle Amplification (RCA) has been used to improve library preparation by amplifying circularized template molecules. Linear products of RCA contain multiple tandem copies of the template molecule. With additional in silico processing steps, these tandem copies can be collapsed into a consensus sequence that is more accurate than the original raw reads. Existing pipelines that use alignment-based methods to discover tandem repeat patterns in long reads are either inefficient or lack sensitivity. RESULTS: We present TideHunter, a novel tandem repeat detection and consensus calling tool that efficiently discovers tandem repeat patterns and generates high-quality consensus sequences from amplified, tandemly repeated long-read sequencing data. TideHunter works with noisy long reads (PacBio and ONT) at error rates of up to 20% and imposes no limit on the maximal repeat pattern size. We benchmarked TideHunter on simulated and real datasets with varying error rates and repeat pattern sizes. TideHunter is tens of times faster than state-of-the-art methods and achieves higher sensitivity and accuracy. AVAILABILITY AND IMPLEMENTATION: TideHunter is written in C, is open source and is available at https://github.com/yangao07/TideHunter |
format | Online Article Text |
id | pubmed-6612900 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6612900 2019-07-12. Bioinformatics, Ismb/Eccb 2019 Conference Proceedings. Oxford University Press, 2019-07, 2019-07-05. Text, en. © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain |
topic | Ismb/Eccb 2019 Conference Proceedings |
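The abstract describes two steps: finding the repeat pattern in an RCA read and collapsing its tandem copies into a consensus. The Python sketch that follows is a toy illustration of those two ideas under strong simplifying assumptions. It is not TideHunter's algorithm or code (TideHunter is written in C; see the repository linked in the description field): it votes on distances between repeated k-mer seeds instead of chaining them, and it assumes substitution-only errors, a read that begins at a copy boundary, and a template with no internally repeated k-mers. The function names `estimate_period` and `consensus`, the k-mer size and the example template are all hypothetical.

```python
from collections import Counter
from typing import Optional


def estimate_period(read: str, k: int = 5) -> Optional[int]:
    """Guess the tandem-repeat period of `read` (toy seed-voting heuristic).

    Every k-mer is treated as a seed. Each time a k-mer reappears, the
    distance to its previous occurrence is recorded as one vote for a
    candidate period; the most frequent distance is returned. Returns None
    when no k-mer occurs twice.
    """
    last_seen: dict = {}        # k-mer -> start of its most recent occurrence
    votes: Counter = Counter()  # candidate period -> number of votes
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            votes[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def consensus(read: str, period: int) -> str:
    """Collapse full-length copies of `period` bases by per-column majority vote.

    Assumes substitution-only errors and a read that starts at a copy
    boundary; a trailing partial copy is ignored.
    """
    copies = [read[i:i + period] for i in range(0, len(read) - period + 1, period)]
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


if __name__ == "__main__":
    template = "ACGTTGCAAGTC"  # hypothetical 12-base template
    copies = [template] * 4
    # Simulate sequencing noise: one substitution in three of the four copies.
    copies[1] = copies[1][:2] + "A" + copies[1][3:]    # G -> A at offset 2
    copies[2] = copies[2][:7] + "T" + copies[2][8:]    # A -> T at offset 7
    copies[3] = copies[3][:10] + "G" + copies[3][11:]  # T -> G at offset 10
    read = "".join(copies)

    period = estimate_period(read)
    print(period)                   # 12
    print(consensus(read, period))  # ACGTTGCAAGTC
```

Because each column holds at most one error among four copies, the majority vote recovers the template even though three of the copies are wrong. Real RCA reads also carry insertions and deletions, copies of unequal length and an arbitrary starting offset, so a practical tool has to align the copies (for example with a multiple or partial-order alignment) rather than slice them at fixed offsets.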