
TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data

Bibliographic Details
Main authors: Bolognini, Davide; Magi, Alberto; Benes, Vladimir; Korbel, Jan O; Rausch, Tobias
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7539535/
https://www.ncbi.nlm.nih.gov/pubmed/33034633
http://dx.doi.org/10.1093/gigascience/giaa101
author Bolognini, Davide
Magi, Alberto
Benes, Vladimir
Korbel, Jan O
Rausch, Tobias
collection PubMed
description BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains challenging. Traditional mapping-based approaches using short-read data are severely limited in the size and type of tandem repeats they can resolve, while third-generation (long-read) sequencing technologies exhibit substantially higher sequencing error rates than short-read platforms, which complicates repeat resolution.
RESULTS: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations, and it resolves repeat multiplicity (copy number) and period size in a haplotype-specific manner. The tool also includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees.
CONCLUSIONS: On synthetic data, TRiCoLOR shows improved sensitivity and specificity compared with alternative tools. On real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting that it is suitable for identifying tandem repeat variation in personal genomes.
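The abstract above names two operations that can be illustrated in a few lines: finding a repeat's period and multiplicity without a known motif, and checking Mendelian consistency of repeat alleles in a parent-child trio. The Python sketch below is a simplified illustration under stated assumptions, not TRiCoLOR's own algorithm. Period detection here uses a plain self-comparison scan (comparing each base with the base p positions later) with a mismatch-rate threshold so that isolated sequencing errors are tolerated, and the trio check treats each genotype as a pair of repeat copy numbers. The function names `estimate_period` and `mendelian_consistent` and all of their parameters are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def estimate_period(
    seq: str, max_period: int = 10, max_mismatch_rate: float = 0.2
) -> Optional[Tuple[str, int, float]]:
    """Return (motif, period, copy number) for the smallest period p at which
    seq[i] and seq[i + p] disagree in at most max_mismatch_rate of positions.

    Hypothetical helper: a simplified self-comparison scan, not TRiCoLOR's method.
    Returns None if no period up to max_period (and up to half the sequence
    length, so at least two copies) passes the threshold.
    """
    seq = seq.upper()
    for p in range(1, min(max_period, len(seq) // 2) + 1):
        comparisons = len(seq) - p
        mismatches = sum(seq[i] != seq[i + p] for i in range(comparisons))
        if mismatches / comparisons <= max_mismatch_rate:
            # Majority vote per phase, so an isolated sequencing error in the
            # first copy does not end up in the reported motif.
            motif = "".join(
                Counter(seq[j::p]).most_common(1)[0][0] for j in range(p)
            )
            # Copy number (multiplicity) is the span length divided by the
            # period; it can be fractional when the last copy is partial.
            return motif, p, len(seq) / p
    return None


def mendelian_consistent(
    child: Sequence[float],
    mother: Sequence[float],
    father: Sequence[float],
    tolerance: float = 0.0,
) -> bool:
    """Check whether a child's two repeat alleles (copy numbers) can be
    explained by one allele from each parent, allowing a hypothetical
    +/- tolerance for length-measurement noise."""

    def inherited(allele: float, parent: Sequence[float]) -> bool:
        return any(abs(allele - p) <= tolerance for p in parent)

    a, b = child
    # Try both assignments: a from the mother and b from the father, or the reverse.
    return (inherited(a, mother) and inherited(b, father)) or (
        inherited(b, mother) and inherited(a, father)
    )
```

For example, `estimate_period("CAGCAGCTGCAGCAG")` returns `("CAG", 3, 5.0)` despite the substitution in the third copy, because the per-phase majority vote recovers the motif, and `mendelian_consistent((5, 12), (5, 7), (12, 12))` returns `True`. Haplotype-specific resolution, as described in the abstract, is not modeled in this sketch.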
format Online
Article
Text
id pubmed-7539535
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7539535 2020-10-13. Journal: Gigascience; article type: Technical Note. Oxford University Press, 2020-10-07. Text, en. © The Author(s) 2020. Published by Oxford University Press GigaScience.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data
topic Technical Note