bamSliceR: cross-cohort variant and allelic bias analysis for rare variants and rare diseases

Rare diseases and conditions create unique challenges for genetic epidemiologists precisely because cases and samples are scarce. In recent years, whole-genome and whole-transcriptome sequencing (WGS/WTS) have eased the study of rare genetic variants. Paired WGS and WTS data are ideal, but logistica...

Bibliographic Details
Main Authors: Huang, Yizhou Peter, Harmon, Lauren, Gardner, Eve, Ma, Xiaotu, Harsh, Josiah, Xue, Zhaoyu, Wen, Hong, Ramos, Marcel, Davis, Sean, Triche, Timothy J.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516001/
https://www.ncbi.nlm.nih.gov/pubmed/37745420
http://dx.doi.org/10.1101/2023.09.15.558026
_version_ 1785109053469360128
author Huang, Yizhou Peter
Harmon, Lauren
Gardner, Eve
Ma, Xiaotu
Harsh, Josiah
Xue, Zhaoyu
Wen, Hong
Ramos, Marcel
Davis, Sean
Triche, Timothy J.
author_sort Huang, Yizhou Peter
collection PubMed
description Rare diseases and conditions create unique challenges for genetic epidemiologists precisely because cases and samples are scarce. In recent years, whole-genome and whole-transcriptome sequencing (WGS/WTS) have eased the study of rare genetic variants. Paired WGS and WTS data are ideal, but logistical and financial constraints often preclude generating paired WGS and WTS data. Thus, many databases contain a patchwork of specimens with either WGS or WTS data, but only a minority of samples have both. The NCI Genomic Data Commons facilitates controlled access to genomic and transcriptomic data for thousands of subjects, many with unpaired sequencing results. Local reanalysis of expressed variants across whole transcriptomes requires significant data storage, compute, and expertise. We developed the bamSliceR package to facilitate swift transition from aligned sequence reads to expressed variant characterization. bamSliceR leverages the NCI Genomic Data Commons API to query genomic sub-regions of aligned sequence reads from specimens identified through the robust Bioconductor ecosystem. We demonstrate how population-scale targeted genomic analysis can be completed using orders of magnitude fewer resources in this fashion, with minimal compute burden. We demonstrate pilot results from bamSliceR for the TARGET pediatric AML and BEAT-AML projects, where identification of rare but recurrent somatic variants directly yields biologically testable hypotheses. bamSliceR and its documentation are freely available on GitHub at https://github.com/trichelab/bamSliceR.
format Online
Article
Text
id pubmed-10516001
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10516001 2023-09-23 bamSliceR: cross-cohort variant and allelic bias analysis for rare variants and rare diseases Huang, Yizhou Peter Harmon, Lauren Gardner, Eve Ma, Xiaotu Harsh, Josiah Xue, Zhaoyu Wen, Hong Ramos, Marcel Davis, Sean Triche, Timothy J. bioRxiv Article Rare diseases and conditions create unique challenges for genetic epidemiologists precisely because cases and samples are scarce. In recent years, whole-genome and whole-transcriptome sequencing (WGS/WTS) have eased the study of rare genetic variants. Paired WGS and WTS data are ideal, but logistical and financial constraints often preclude generating paired WGS and WTS data. Thus, many databases contain a patchwork of specimens with either WGS or WTS data, but only a minority of samples have both. The NCI Genomic Data Commons facilitates controlled access to genomic and transcriptomic data for thousands of subjects, many with unpaired sequencing results. Local reanalysis of expressed variants across whole transcriptomes requires significant data storage, compute, and expertise. We developed the bamSliceR package to facilitate swift transition from aligned sequence reads to expressed variant characterization. bamSliceR leverages the NCI Genomic Data Commons API to query genomic sub-regions of aligned sequence reads from specimens identified through the robust Bioconductor ecosystem. We demonstrate how population-scale targeted genomic analysis can be completed using orders of magnitude fewer resources in this fashion, with minimal compute burden. We demonstrate pilot results from bamSliceR for the TARGET pediatric AML and BEAT-AML projects, where identification of rare but recurrent somatic variants directly yields biologically testable hypotheses. bamSliceR and its documentation are freely available on GitHub at https://github.com/trichelab/bamSliceR.
Cold Spring Harbor Laboratory 2023-09-17 /pmc/articles/PMC10516001/ /pubmed/37745420 http://dx.doi.org/10.1101/2023.09.15.558026 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title bamSliceR: cross-cohort variant and allelic bias analysis for rare variants and rare diseases
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516001/
https://www.ncbi.nlm.nih.gov/pubmed/37745420
http://dx.doi.org/10.1101/2023.09.15.558026