GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions
Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-...
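Main Authors: | Słowiński, Piotr; Li, Muzi; Restrepo, Paula; Alomran, Nawaf; Spurr, Liam F.; Miller, Christian; Tsaneva-Atanasova, Krasimira; Horvath, Anelia |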
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Bioengineering and Biotechnology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7525018/ https://www.ncbi.nlm.nih.gov/pubmed/33042959 http://dx.doi.org/10.3389/fbioe.2020.01021 |
_version_ | 1783588655103737856 |
---|---|
author | Słowiński, Piotr; Li, Muzi; Restrepo, Paula; Alomran, Nawaf; Spurr, Liam F.; Miller, Christian; Tsaneva-Atanasova, Krasimira; Horvath, Anelia |
collection | PubMed |
description | Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments which is consistent with other approaches. To this end, we apply the proposed methodology on data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. Finally, they provide conceptual and mechanistic insights into relations between structure of VAF distributions and genetic events. The methodology is implemented in a MATLAB toolbox that provides a suite of functions for analysis, statistical assessment and visualization of Genome and Transcriptome allele frequency distributions. GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele. |
format | Online Article Text |
id | pubmed-7525018 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7525018 2020-10-09 GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions Słowiński, Piotr; Li, Muzi; Restrepo, Paula; Alomran, Nawaf; Spurr, Liam F.; Miller, Christian; Tsaneva-Atanasova, Krasimira; Horvath, Anelia Front Bioeng Biotechnol Bioengineering and Biotechnology Frontiers Media S.A. 2020-09-16 /pmc/articles/PMC7525018/ /pubmed/33042959 http://dx.doi.org/10.3389/fbioe.2020.01021 Text en Copyright © 2020 Słowiński, Li, Restrepo, Alomran, Spurr, Miller, Tsaneva-Atanasova and Horvath. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions |
topic | Bioengineering and Biotechnology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7525018/ https://www.ncbi.nlm.nih.gov/pubmed/33042959 http://dx.doi.org/10.3389/fbioe.2020.01021 |
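
The abstract in this record describes variant probability as a parameter of a random process that generates synthetic VAF samples resembling the observed data. The Python sketch below illustrates that idea only; it is not the GeTallele MATLAB implementation. It assumes a binomial read-sampling model, equal chance of the variant sitting on either homolog, a plain grid search, and a 1-D earth mover's distance between equal-size samples. All function names (`vaf`, `synthetic_vafs`, `sorted_sample_distance`, `estimate_variant_probability`) and parameter values are hypothetical.

```python
import random


def vaf(variant_reads, total_reads):
    """Variant allele frequency at one SNV site: variant reads / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def synthetic_vafs(p, depths, rng):
    """Draw one synthetic VAF per site under an assumed binomial read model."""
    samples = []
    for depth in depths:
        # Assumption: the variant may sit on either homolog, so each site
        # uses p or 1 - p with equal chance.
        q = p if rng.random() < 0.5 else 1.0 - p
        variant_reads = sum(1 for _ in range(depth) if rng.random() < q)
        samples.append(vaf(variant_reads, depth))
    return samples


def sorted_sample_distance(a, b):
    """1-D earth mover's distance between two equal-size samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def estimate_variant_probability(observed, depths, rng, step=0.01):
    """Grid-search p in [0.5, 1.0] whose synthetic VAFs best match observed."""
    best_p, best_d = 0.5, float("inf")
    n_steps = int(round(0.5 / step))
    for i in range(n_steps + 1):
        p = 0.5 + i * step
        d = sorted_sample_distance(observed, synthetic_vafs(p, depths, rng))
        if d < best_d:
            best_p, best_d = p, d
    return best_p


if __name__ == "__main__":
    rng = random.Random(0)
    depths = [60] * 400                           # hypothetical read depths
    observed = synthetic_vafs(0.7, depths, rng)   # stand-in for one segment
    print(estimate_variant_probability(observed, depths, rng))
```

Because each candidate value is scored against a single random draw, the estimate is noisy; averaging over several synthetic draws per candidate would be a natural refinement of this sketch.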