Genome-wide detection of short tandem repeat expansions by long-read sequencing

BACKGROUND: A short tandem repeat (STR), or “microsatellite”, is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing...

Bibliographic Details
Main Authors: Liu, Qian; Tong, Yao; Wang, Kai
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768641/
https://www.ncbi.nlm.nih.gov/pubmed/33371889
http://dx.doi.org/10.1186/s12859-020-03876-w
author Liu, Qian
Tong, Yao
Wang, Kai
collection PubMed
description BACKGROUND: A short tandem repeat (STR), or “microsatellite”, is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing coupled with bioinformatics tools enables the estimation of repeat counts for STRs. However, with the exception of a few well-known disease-relevant STRs, the normal ranges of repeat counts for most STRs in human populations are not well known, which prevents the prioritization of STRs that may be associated with human diseases. RESULTS: In this study, we extend the computational tool RepeatHMM to infer normal ranges of 432,604 STRs using 21 long-read sequencing datasets on human genomes, and build a genomic-scale database called RepeatHMM-DB with normal repeat ranges for these STRs. Evaluation on 13 well-known repeats shows that the inferred repeat ranges provide good estimates of the repeat ranges reported in the literature from population-scale studies. This database, together with a repeat expansion estimation tool such as RepeatHMM, enables genomic-scale scanning of repeat regions in newly sequenced genomes to identify disease-relevant repeat expansions. As a case study of using RepeatHMM-DB, we evaluate the CAG repeats of ATXN3 for 20 patients with spinocerebellar ataxia type 3 (SCA3) and 5 unaffected individuals, and correctly classify each individual. CONCLUSIONS: In summary, RepeatHMM-DB can facilitate prioritization and identification of disease-relevant STRs from whole-genome long-read sequencing data on patients with undiagnosed diseases. RepeatHMM-DB is incorporated into RepeatHMM and is available at https://github.com/WGLab/RepeatHMM.
format Online
Article
Text
id pubmed-7768641
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7768641 2020-12-29 Genome-wide detection of short tandem repeat expansions by long-read sequencing Liu, Qian Tong, Yao Wang, Kai BMC Bioinformatics Research
BioMed Central 2020-12-28 /pmc/articles/PMC7768641/ /pubmed/33371889 http://dx.doi.org/10.1186/s12859-020-03876-w Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Genome-wide detection of short tandem repeat expansions by long-read sequencing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768641/
https://www.ncbi.nlm.nih.gov/pubmed/33371889
http://dx.doi.org/10.1186/s12859-020-03876-w