TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data
BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challeng...
Main Authors: | Bolognini, Davide, Magi, Alberto, Benes, Vladimir, Korbel, Jan O, Rausch, Tobias |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Technical Note |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7539535/ https://www.ncbi.nlm.nih.gov/pubmed/33034633 http://dx.doi.org/10.1093/gigascience/giaa101 |
_version_ | 1783591074222047232 |
---|---|
author | Bolognini, Davide Magi, Alberto Benes, Vladimir Korbel, Jan O Rausch, Tobias |
author_sort | Bolognini, Davide |
collection | PubMed |
description | BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes. |
format | Online Article Text |
id | pubmed-7539535 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7539535 2020-10-13 TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data Bolognini, Davide Magi, Alberto Benes, Vladimir Korbel, Jan O Rausch, Tobias Gigascience Technical Note Oxford University Press 2020-10-07 /pmc/articles/PMC7539535/ /pubmed/33034633 http://dx.doi.org/10.1093/gigascience/giaa101 Text en © The Author(s) 2020. Published by Oxford University Press GigaScience. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data |
topic | Technical Note |