Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions

Bibliographic Details
Main authors: Rajan-Babu, Indhu-Shree; Peng, Junran J.; Chiu, Readman; Li, Chenkai; Mohajeri, Arezoo; Dolzhenko, Egor; Eberle, Michael A.; Birol, Inanc; Friedman, Jan M.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351082/
https://www.ncbi.nlm.nih.gov/pubmed/34372915
http://dx.doi.org/10.1186/s13073-021-00932-9
author Rajan-Babu, Indhu-Shree
Peng, Junran J.
Chiu, Readman
Li, Chenkai
Mohajeri, Arezoo
Dolzhenko, Egor
Eberle, Michael A.
Birol, Inanc
Friedman, Jan M.
author_sort Rajan-Babu, Indhu-Shree
collection PubMed
description BACKGROUND: Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility.
METHODS: We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders.
RESULTS: We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings.
CONCLUSIONS: We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13073-021-00932-9.
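The reported F1-score can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and is not taken from the authors' pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the ExpansionHunter + STRetch + exSTRa
# ensemble on BWA-aligned genomes.
reported_precision = 0.82
reported_recall = 1.00

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.901, which rounds to the reported 90%
```

With perfect recall, F1 is limited only by precision, which is why the score sits between the two reported values.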
format Online
Article
Text
id pubmed-8351082
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8351082 2021-08-09 Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions Rajan-Babu, Indhu-Shree Peng, Junran J. Chiu, Readman Li, Chenkai Mohajeri, Arezoo Dolzhenko, Egor Eberle, Michael A. Birol, Inanc Friedman, Jan M. Genome Med Research BioMed Central 2021-08-09 /pmc/articles/PMC8351082/ /pubmed/34372915 http://dx.doi.org/10.1186/s13073-021-00932-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351082/
https://www.ncbi.nlm.nih.gov/pubmed/34372915
http://dx.doi.org/10.1186/s13073-021-00932-9