
GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions

Bibliographic Details
Main authors: Słowiński, Piotr; Li, Muzi; Restrepo, Paula; Alomran, Nawaf; Spurr, Liam F.; Miller, Christian; Tsaneva-Atanasova, Krasimira; Horvath, Anelia
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Bioengineering and Biotechnology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7525018/
https://www.ncbi.nlm.nih.gov/pubmed/33042959
http://dx.doi.org/10.3389/fbioe.2020.01021
collection PubMed
description Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments, which is consistent with other approaches. To this end, we apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. In addition, they provide conceptual and mechanistic insights into the relations between the structure of VAF distributions and genetic events. The methodology is implemented in a Matlab toolbox that provides a suite of functions for analysis, statistical assessment and visualization of Genome and Transcriptome allele frequency distributions. GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele.
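To make the idea of "variant probability as a parameter of a random process" concrete, here is a minimal Python sketch. It is not the GeTallele toolbox (which is Matlab) and its estimator may differ. It assumes that, at each SNV with read depth N, the variant read count is drawn from Binomial(N, v) or Binomial(N, 1 − v) with equal odds, since which allele is in excess is arbitrary. Because this model is symmetric in v and 1 − v, v is restricted to [0.5, 1). The model, the grid-search maximum-likelihood fit, and all function names (`simulate_alt_counts`, `estimate_variant_probability`, etc.) are hypothetical choices made for illustration.

```python
import math
import random


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def simulate_alt_counts(depths, v, rng):
    """Hypothetical generator of synthetic data: one variant read count per SNV.

    Each SNV draws from Binomial(N, v) or Binomial(N, 1 - v) with equal odds;
    the synthetic VAF of an SNV is then count / depth.
    """
    counts = []
    for n in depths:
        p = v if rng.random() < 0.5 else 1.0 - v  # which allele is in excess is random
        counts.append(sum(1 for _ in range(n) if rng.random() < p))  # n Bernoulli trials
    return counts


def segment_log_likelihood(alt_counts, depths, v):
    """Log-likelihood of one segment under the symmetric two-component binomial mixture."""
    total = 0.0
    for k, n in zip(alt_counts, depths):
        a = log_binom_pmf(k, n, v)
        b = log_binom_pmf(k, n, 1.0 - v)
        m = max(a, b)  # log-sum-exp trick for numerical stability
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total


def estimate_variant_probability(alt_counts, depths, step=0.001):
    """Grid-search maximum-likelihood estimate of v over [0.5, 1 - step]."""
    best_v, best_ll = 0.5, -math.inf
    for i in range(int(round(0.5 / step))):
        v = 0.5 + i * step
        ll = segment_log_likelihood(alt_counts, depths, v)
        if ll > best_ll:
            best_v, best_ll = v, ll
    return best_v


if __name__ == "__main__":
    rng = random.Random(1)
    depths = [60] * 400  # hypothetical segment: 400 SNVs, 60 reads each
    alt = simulate_alt_counts(depths, 0.7, rng)
    # The estimate should come out close to the simulated value of 0.7.
    print(round(estimate_variant_probability(alt, depths), 3))
```

A balanced segment (v near 0.5) corresponds to equal allele representation. Values approaching 1 indicate strong allelic imbalance, such as loss of heterozygosity. Maximum likelihood over a grid is used here only because it is simple and deterministic. A distribution-matching criterion between observed and simulated VAFs would be an equally plausible way to fit the same parameter.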
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Source: Front Bioeng Biotechnol (Bioengineering and Biotechnology), published 2020-09-16.
Copyright © 2020 Słowiński, Li, Restrepo, Alomran, Spurr, Miller, Tsaneva-Atanasova and Horvath. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Bioengineering and Biotechnology