Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes

Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the gen...

Bibliographic Details
Main authors: Tang, Haibao, Kirkness, Ewen F., Lippert, Christoph, Biggs, William H., Fabani, Martin, Guzman, Ernesto, Ramakrishnan, Smriti, Lavrenko, Victor, Kakaradov, Boyko, Hou, Claire, Hicks, Barry, Heckerman, David, Och, Franz J., Caskey, C. Thomas, Venter, J. Craig, Telenti, Amalio
Format: Online Article Text
Language: English
Published: Elsevier 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673627/
https://www.ncbi.nlm.nih.gov/pubmed/29100084
http://dx.doi.org/10.1016/j.ajhg.2017.09.013
_version_ 1783276608635797504
author Tang, Haibao
Kirkness, Ewen F.
Lippert, Christoph
Biggs, William H.
Fabani, Martin
Guzman, Ernesto
Ramakrishnan, Smriti
Lavrenko, Victor
Kakaradov, Boyko
Hou, Claire
Hicks, Barry
Heckerman, David
Och, Franz J.
Caskey, C. Thomas
Venter, J. Craig
Telenti, Amalio
author_facet Tang, Haibao
Kirkness, Ewen F.
Lippert, Christoph
Biggs, William H.
Fabani, Martin
Guzman, Ernesto
Ramakrishnan, Smriti
Lavrenko, Victor
Kakaradov, Boyko
Hou, Claire
Hicks, Barry
Heckerman, David
Och, Franz J.
Caskey, C. Thomas
Venter, J. Craig
Telenti, Amalio
author_sort Tang, Haibao
collection PubMed
description Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.
format Online
Article
Text
id pubmed-5673627
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-5673627 2018-05-02 Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes Tang, Haibao Kirkness, Ewen F. Lippert, Christoph Biggs, William H. Fabani, Martin Guzman, Ernesto Ramakrishnan, Smriti Lavrenko, Victor Kakaradov, Boyko Hou, Claire Hicks, Barry Heckerman, David Och, Franz J. Caskey, C. Thomas Venter, J. Craig Telenti, Amalio Am J Hum Genet Article Elsevier 2017-11-02 2017-11-02 /pmc/articles/PMC5673627/ /pubmed/29100084 http://dx.doi.org/10.1016/j.ajhg.2017.09.013 Text en © 2017 The Author(s) http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Tang, Haibao
Kirkness, Ewen F.
Lippert, Christoph
Biggs, William H.
Fabani, Martin
Guzman, Ernesto
Ramakrishnan, Smriti
Lavrenko, Victor
Kakaradov, Boyko
Hou, Claire
Hicks, Barry
Heckerman, David
Och, Franz J.
Caskey, C. Thomas
Venter, J. Craig
Telenti, Amalio
Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
title Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
title_full Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
title_fullStr Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
title_full_unstemmed Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
title_short Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
title_sort profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673627/
https://www.ncbi.nlm.nih.gov/pubmed/29100084
http://dx.doi.org/10.1016/j.ajhg.2017.09.013
work_keys_str_mv AT tanghaibao profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT kirknessewenf profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT lippertchristoph profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT biggswilliamh profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT fabanimartin profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT guzmanernesto profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT ramakrishnansmriti profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT lavrenkovictor profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT kakaradovboyko profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT houclaire profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT hicksbarry profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT heckermandavid profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT ochfranzj profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT caskeycthomas profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT venterjcraig profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes
AT telentiamalio profilingofshorttandemrepeatdiseaseallelesin12632humanwholegenomes