Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the gen...
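Main Authors: | Tang, Haibao; Kirkness, Ewen F.; Lippert, Christoph; Biggs, William H.; Fabani, Martin; Guzman, Ernesto; Ramakrishnan, Smriti; Lavrenko, Victor; Kakaradov, Boyko; Hou, Claire; Hicks, Barry; Heckerman, David; Och, Franz J.; Caskey, C. Thomas; Venter, J. Craig; Telenti, Amalio |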
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673627/ https://www.ncbi.nlm.nih.gov/pubmed/29100084 http://dx.doi.org/10.1016/j.ajhg.2017.09.013 |
_version_ | 1783276608635797504 |
---|---|
author | Tang, Haibao Kirkness, Ewen F. Lippert, Christoph Biggs, William H. Fabani, Martin Guzman, Ernesto Ramakrishnan, Smriti Lavrenko, Victor Kakaradov, Boyko Hou, Claire Hicks, Barry Heckerman, David Och, Franz J. Caskey, C. Thomas Venter, J. Craig Telenti, Amalio |
author_facet | Tang, Haibao Kirkness, Ewen F. Lippert, Christoph Biggs, William H. Fabani, Martin Guzman, Ernesto Ramakrishnan, Smriti Lavrenko, Victor Kakaradov, Boyko Hou, Claire Hicks, Barry Heckerman, David Och, Franz J. Caskey, C. Thomas Venter, J. Craig Telenti, Amalio |
author_sort | Tang, Haibao |
collection | PubMed |
description | Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases. |
format | Online Article Text |
id | pubmed-5673627 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-5673627 2018-05-02 Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes Tang, Haibao Kirkness, Ewen F. Lippert, Christoph Biggs, William H. Fabani, Martin Guzman, Ernesto Ramakrishnan, Smriti Lavrenko, Victor Kakaradov, Boyko Hou, Claire Hicks, Barry Heckerman, David Och, Franz J. Caskey, C. Thomas Venter, J. Craig Telenti, Amalio Am J Hum Genet Article Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases. Elsevier 2017-11-02 2017-11-02 /pmc/articles/PMC5673627/ /pubmed/29100084 http://dx.doi.org/10.1016/j.ajhg.2017.09.013 Text en © 2017 The Author(s) http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Tang, Haibao Kirkness, Ewen F. Lippert, Christoph Biggs, William H. Fabani, Martin Guzman, Ernesto Ramakrishnan, Smriti Lavrenko, Victor Kakaradov, Boyko Hou, Claire Hicks, Barry Heckerman, David Och, Franz J. Caskey, C. Thomas Venter, J. Craig Telenti, Amalio Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes |
title | Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes |
title_full | Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes |
title_fullStr | Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes |
title_full_unstemmed | Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes |
title_short | Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes |
title_sort | profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673627/ https://www.ncbi.nlm.nih.gov/pubmed/29100084 http://dx.doi.org/10.1016/j.ajhg.2017.09.013 |
work_keys_str_mv | AT tanghaibao profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT kirknessewenf profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT lippertchristoph profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT biggswilliamh profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT fabanimartin profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT guzmanernesto profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT ramakrishnansmriti profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT lavrenkovictor profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT kakaradovboyko profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT houclaire profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT hicksbarry profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT heckermandavid profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT ochfranzj profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT caskeycthomas profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT venterjcraig profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes AT telentiamalio profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes |
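
The description notes that many disease risk cutoffs lie close to or beyond the short read length of 100 to 150 bases. The arithmetic behind that statement can be sketched as follows. This is only an illustration and not TREDPARSE's method: the function names, the example motif and repeat counts, and the 10-base flank on each side are all hypothetical choices made for this sketch.

```python
def tract_length_bp(motif: str, repeat_count: int) -> int:
    """Length in bases of a perfect repeat tract (motif repeated repeat_count times)."""
    return len(motif) * repeat_count


def fully_spanned(motif: str, repeat_count: int, read_length: int, flank: int = 10) -> bool:
    """True if one read could cover the whole tract plus `flank` bases on each side.

    The flank requirement is an assumption of this sketch: a read needs some
    unique sequence on both sides to anchor the repeat.
    """
    return tract_length_bp(motif, repeat_count) + 2 * flank <= read_length


if __name__ == "__main__":
    # Hypothetical trinucleotide repeat at three illustrative sizes.
    for count in (20, 40, 60):
        for read_length in (100, 150):
            spanned = fully_spanned("CAG", count, read_length)
            print(f"{count} repeats ({tract_length_bp('CAG', count)} bp), "
                  f"{read_length}-base read: spanned={spanned}")
```

Under these assumptions a 40-unit trinucleotide tract (120 bp) no longer fits inside a 100-base read, which is why additional evidence beyond a single spanning read is needed at larger allele sizes.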