Genome-wide detection of short tandem repeat expansions by long-read sequencing
BACKGROUND: A short tandem repeat (STR), or “microsatellite”, is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing...
Main Authors: | Liu, Qian; Tong, Yao; Wang, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768641/ https://www.ncbi.nlm.nih.gov/pubmed/33371889 http://dx.doi.org/10.1186/s12859-020-03876-w |
_version_ | 1783629199878127616 |
---|---|
author | Liu, Qian Tong, Yao Wang, Kai |
author_facet | Liu, Qian Tong, Yao Wang, Kai |
author_sort | Liu, Qian |
collection | PubMed |
description | BACKGROUND: A short tandem repeat (STR), or “microsatellite”, is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing coupled with bioinformatics tools enables the estimation of repeat counts for STRs. However, with the exception of a few well-known disease-relevant STRs, the normal ranges of repeat counts for most STRs in human populations are not well known, which prevents the prioritization of STRs that may be associated with human diseases. RESULTS: In this study, we extend the computational tool RepeatHMM to infer normal ranges of 432,604 STRs using 21 long-read sequencing datasets on human genomes, and build a genomic-scale database called RepeatHMM-DB with normal repeat ranges for these STRs. Evaluation on 13 well-known repeats shows that the inferred repeat ranges provide good estimates of the repeat ranges reported in the literature from population-scale studies. This database, together with a repeat expansion estimation tool such as RepeatHMM, enables genomic-scale scanning of repeat regions in newly sequenced genomes to identify disease-relevant repeat expansions. As a case study of using RepeatHMM-DB, we evaluate the CAG repeats of ATXN3 for 20 patients with spinocerebellar ataxia type 3 (SCA3) and 5 unaffected individuals, and correctly classify each individual. CONCLUSIONS: In summary, RepeatHMM-DB can facilitate prioritization and identification of disease-relevant STRs from whole-genome long-read sequencing data on patients with undiagnosed diseases. RepeatHMM-DB is incorporated into RepeatHMM and is available at https://github.com/WGLab/RepeatHMM. |
format | Online Article Text |
id | pubmed-7768641 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7768641 2020-12-29 Genome-wide detection of short tandem repeat expansions by long-read sequencing Liu, Qian Tong, Yao Wang, Kai BMC Bioinformatics Research
BioMed Central 2020-12-28 /pmc/articles/PMC7768641/ /pubmed/33371889 http://dx.doi.org/10.1186/s12859-020-03876-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Liu, Qian Tong, Yao Wang, Kai Genome-wide detection of short tandem repeat expansions by long-read sequencing |
title | Genome-wide detection of short tandem repeat expansions by long-read sequencing |
title_full | Genome-wide detection of short tandem repeat expansions by long-read sequencing |
title_fullStr | Genome-wide detection of short tandem repeat expansions by long-read sequencing |
title_full_unstemmed | Genome-wide detection of short tandem repeat expansions by long-read sequencing |
title_short | Genome-wide detection of short tandem repeat expansions by long-read sequencing |
title_sort | genome-wide detection of short tandem repeat expansions by long-read sequencing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768641/ https://www.ncbi.nlm.nih.gov/pubmed/33371889 http://dx.doi.org/10.1186/s12859-020-03876-w |
work_keys_str_mv | AT liuqian genomewidedetectionofshorttandemrepeatexpansionsbylongreadsequencing AT tongyao genomewidedetectionofshorttandemrepeatexpansionsbylongreadsequencing AT wangkai genomewidedetectionofshorttandemrepeatexpansionsbylongreadsequencing |