
Hybrid de novo tandem repeat detection using short and long reads

BACKGROUND: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where...


Bibliographic Details
Main Authors: Fertin, Guillaume, Jean, Géraldine, Radulescu, Andreea, Rusu, Irena
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582210/
https://www.ncbi.nlm.nih.gov/pubmed/26399998
http://dx.doi.org/10.1186/1755-8794-8-S3-S5
author Fertin, Guillaume
Jean, Géraldine
Radulescu, Andreea
Rusu, Irena
author_facet Fertin, Guillaume
Jean, Géraldine
Radulescu, Andreea
Rusu, Irena
author_sort Fertin, Guillaume
collection PubMed
description BACKGROUND: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads produced by second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads produced by third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which can reach 16%. METHODS: In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. RESULTS: MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. CONCLUSIONS: Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.
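The two-stage idea in the METHODS summary (patterns from a de Bruijn graph of short reads, then validation against long reads) can be illustrated with a toy sketch. This is not MixTaR's implementation: the function names, the cycle-enumeration strategy, the probe of three pattern copies, and the use of 16% as an error tolerance are all assumptions made for illustration, and the local greedy assembly step is omitted. The sketch relies on the general fact that a tandem repeat shows up as a cycle in a de Bruijn graph.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def canonical_rotation(s):
    """Smallest rotation of s, so that ACG, CGA and GAC count as one pattern."""
    return min(s[i:] + s[:i] for i in range(len(s)))


def find_cycle_patterns(graph, max_len):
    """Spell the sequence of every simple cycle with at most max_len nodes."""
    patterns = set()
    for start in list(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt == start:
                    # Each node on the cycle contributes its last character.
                    pattern = "".join(n[-1] for n in path)
                    patterns.add(canonical_rotation(pattern))
                elif nxt not in path and len(path) < max_len:
                    stack.append((nxt, path + [nxt]))
    return patterns


def best_edit_distance(probe, text):
    """Smallest edit distance between probe and any substring of text."""
    prev = [0] * (len(text) + 1)  # a match may start anywhere in text
    for i, pc in enumerate(probe, 1):
        cur = [i]
        for j, tc in enumerate(text, 1):
            cur.append(min(prev[j] + 1,                # skip a probe character
                           cur[j - 1] + 1,             # skip a text character
                           prev[j - 1] + (pc != tc)))  # match or substitute
        prev = cur
    return min(prev)  # a match may end anywhere in text


def validated_by_long_reads(pattern, long_reads, copies=3, max_error_rate=0.16):
    """Assumed criterion: some long read contains `copies` consecutive copies
    of the pattern within the tolerated number of edits."""
    probe = pattern * copies
    allowed = int(max_error_rate * len(probe))
    return any(best_edit_distance(probe, r) <= allowed for r in long_reads)


if __name__ == "__main__":
    genome = "TT" + "ACG" * 4 + "GA"                     # toy sequence
    short_reads = [genome[i:i + 8] for i in range(len(genome) - 7)]
    long_reads = ["TTACGACGTCGACGGA"]                     # one substitution error
    graph = build_de_bruijn(short_reads, k=4)
    for pattern in sorted(find_cycle_patterns(graph, max_len=10)):
        print(pattern, validated_by_long_reads(pattern, long_reads))
```

Exhaustive cycle enumeration is only practical on toy inputs like this one; it is used here to keep the example short, not because the abstract describes it.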
format Online
Article
Text
id pubmed-4582210
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4582210 2015-09-28 Hybrid de novo tandem repeat detection using short and long reads Fertin, Guillaume Jean, Géraldine Radulescu, Andreea Rusu, Irena BMC Med Genomics Research BioMed Central 2015-09-23 /pmc/articles/PMC4582210/ /pubmed/26399998 http://dx.doi.org/10.1186/1755-8794-8-S3-S5 Text en Copyright © 2015 Fertin et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Fertin, Guillaume
Jean, Géraldine
Radulescu, Andreea
Rusu, Irena
Hybrid de novo tandem repeat detection using short and long reads
title Hybrid de novo tandem repeat detection using short and long reads
title_full Hybrid de novo tandem repeat detection using short and long reads
title_fullStr Hybrid de novo tandem repeat detection using short and long reads
title_full_unstemmed Hybrid de novo tandem repeat detection using short and long reads
title_short Hybrid de novo tandem repeat detection using short and long reads
title_sort hybrid de novo tandem repeat detection using short and long reads
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582210/
https://www.ncbi.nlm.nih.gov/pubmed/26399998
http://dx.doi.org/10.1186/1755-8794-8-S3-S5