Hybrid de novo tandem repeat detection using short and long reads
BACKGROUND: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on genetic backgrounds of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high quality results. However, in the case of a de novo context, where...
Main authors: | Fertin, Guillaume; Jean, Géraldine; Radulescu, Andreea; Rusu, Irena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582210/ https://www.ncbi.nlm.nih.gov/pubmed/26399998 http://dx.doi.org/10.1186/1755-8794-8-S3-S5 |
_version_ | 1782391666137628672 |
---|---|
author | Fertin, Guillaume Jean, Géraldine Radulescu, Andreea Rusu, Irena |
author_facet | Fertin, Guillaume Jean, Géraldine Radulescu, Andreea Rusu, Irena |
author_sort | Fertin, Guillaume |
collection | PubMed |
description | BACKGROUND: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which can reach 16%. METHODS: In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. RESULTS: MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. CONCLUSIONS: Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval. |
format | Online Article Text |
id | pubmed-4582210 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-45822102015-09-28 Hybrid de novo tandem repeat detection using short and long reads Fertin, Guillaume Jean, Géraldine Radulescu, Andreea Rusu, Irena BMC Med Genomics Research BACKGROUND: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on genetic backgrounds of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high quality results. However, in the case of a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with the second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with the third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain on the read length came with a significant increase of the error rate. The main objective of nowadays studies on long reads is to handle the high error rate up to 16%. METHODS: In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high-quality of short reads and the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. RESULTS: MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of number of tandem repeats detected and the length of their patterns. CONCLUSIONS: Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval. BioMed Central 2015-09-23 /pmc/articles/PMC4582210/ /pubmed/26399998 http://dx.doi.org/10.1186/1755-8794-8-S3-S5 Text en Copyright © 2015 Fertin et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Fertin, Guillaume Jean, Géraldine Radulescu, Andreea Rusu, Irena Hybrid de novo tandem repeat detection using short and long reads |
title | Hybrid de novo tandem repeat detection using short and long reads |
title_full | Hybrid de novo tandem repeat detection using short and long reads |
title_fullStr | Hybrid de novo tandem repeat detection using short and long reads |
title_full_unstemmed | Hybrid de novo tandem repeat detection using short and long reads |
title_short | Hybrid de novo tandem repeat detection using short and long reads |
title_sort | hybrid de novo tandem repeat detection using short and long reads |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582210/ https://www.ncbi.nlm.nih.gov/pubmed/26399998 http://dx.doi.org/10.1186/1755-8794-8-S3-S5 |
work_keys_str_mv | AT fertinguillaume hybriddenovotandemrepeatdetectionusingshortandlongreads AT jeangeraldine hybriddenovotandemrepeatdetectionusingshortandlongreads AT radulescuandreea hybriddenovotandemrepeatdetectionusingshortandlongreads AT rusuirena hybriddenovotandemrepeatdetectionusingshortandlongreads |
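
The abstract above describes detecting tandem repeat patterns in a de Bruijn graph built from short reads and then validating them against long reads. The Python sketch below only illustrates that general idea and is not MixTaR's implementation: the function names (`build_de_bruijn`, `find_cycle_pattern`, `supported_by_long_read`), the choice of `k`, and the exact-match validation are all assumptions made for illustration. In particular, real long reads contain errors, so a practical validator would need approximate matching, and the cycle walk here gives up at any branching node.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_cycle_pattern(graph, start, max_len):
    """Walk unambiguous edges from `start`.

    Returns the string spelled by the walk if it comes back to `start`
    within `max_len` steps, otherwise None. A cycle in the graph is the
    signature of a repeated pattern (hypothetical simplification: any
    node with zero or several successors aborts the walk).
    """
    node = start
    pattern = []
    for _ in range(max_len):
        successors = graph.get(node, ())
        if len(successors) != 1:
            return None
        (nxt,) = successors
        pattern.append(nxt[-1])
        node = nxt
        if node == start:
            return "".join(pattern)
    return None


def supported_by_long_read(pattern, long_read, min_copies):
    """Exact-match check that `pattern` occurs `min_copies` times in a row.

    Assumption: error-free long read. Real data would need alignment
    with mismatches and indels instead of substring search.
    """
    return pattern * min_copies in long_read


if __name__ == "__main__":
    short_reads = ["ACGACGAC", "GACGACGA"]
    graph = build_de_bruijn(short_reads, k=4)
    pattern = find_cycle_pattern(graph, "ACG", max_len=10)
    print(pattern)
    print(supported_by_long_read(pattern, "TTACGACGACGTT", min_copies=3))
```

Starting the walk from a different node of the same cycle yields a rotation of the same pattern, so a fuller version would normalise patterns (for example to their lexicographically smallest rotation) before comparing them.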