Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
BACKGROUND: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures with stem-loops and quadruplexes...
Main Authors: | Cheloshkina, Kseniia, Poptsova, Maria |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6511154/ https://www.ncbi.nlm.nih.gov/pubmed/31077166 http://dx.doi.org/10.1186/s12885-019-5653-x |
_version_ | 1783417529530580992 |
---|---|
author | Cheloshkina, Kseniia Poptsova, Maria |
author_facet | Cheloshkina, Kseniia Poptsova, Maria |
author_sort | Cheloshkina, Kseniia |
collection | PubMed |
description | BACKGROUND: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures, with stem-loops and quadruplexes being the most prevalent. We aimed to investigate the impact of these two classes of non-B DNA structures on cancer breakpoint hotspots using a machine learning approach. METHODS: We developed a procedure for machine learning model building and evaluation, as the considered data are extremely imbalanced and a reliable estimate of the predictive power was required. We built logistic regression models predicting cancer breakpoint hotspots based on the densities of stem-loops and quadruplexes, jointly and separately. We also tested Random Forest models, varying the resampling scheme (leave-one-out cross-validation, train-test split, 3-fold cross-validation) and the class balancing technique (oversampling, stratification, synthetic minority oversampling). RESULTS: We analyzed 487,425 breakpoints from 2234 samples covering 10 cancer types available from the International Cancer Genome Consortium. We showed that the distributions of breakpoint hotspots in different types of cancer are not correlated, confirming the heterogeneous nature of cancer. The stem-loop-based model best explains the blood, brain, liver, and prostate cancer breakpoint hotspot profiles, while the quadruplex-based model performs better for bone, breast, ovary, pancreatic, and skin cancer. For the overall cancer profile and uterus cancer, the joint model shows the highest performance. For particular datasets the constructed models reach high predictive power using just one predictor, and in the majority of cases the model built on both predictors does not improve performance. CONCLUSION: Despite the heterogeneity in the distribution of breakpoint hotspots across cancer types, our results demonstrate an association between cancer breakpoint hotspots and stem-loops and quadruplexes. For approximately half of the cancer types stem-loops are the most influential factor, while for the others quadruplexes are. This reflects differences in the regulatory potential of stem-loops and quadruplexes at the tissue-specific level, which are yet to be discovered at the genome-wide scale. The analysis demonstrates that the influence of stem-loops and quadruplexes on breakpoint hotspot formation is tissue-specific. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12885-019-5653-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6511154 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-65111542019-05-20 Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation Cheloshkina, Kseniia Poptsova, Maria BMC Cancer Research Article BioMed Central 2019-05-10 /pmc/articles/PMC6511154/ /pubmed/31077166 http://dx.doi.org/10.1186/s12885-019-5653-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Cheloshkina, Kseniia Poptsova, Maria Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
title | Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
title_full | Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
title_fullStr | Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
title_full_unstemmed | Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
title_short | Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
title_sort | tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6511154/ https://www.ncbi.nlm.nih.gov/pubmed/31077166 http://dx.doi.org/10.1186/s12885-019-5653-x |
work_keys_str_mv | AT cheloshkinakseniia tissuespecificimpactofstemloopsandquadruplexesoncancerbreakpointsformation AT poptsovamaria tissuespecificimpactofstemloopsandquadruplexesoncancerbreakpointsformation |
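
The METHODS summary in this record mentions logistic regression on stem-loop and quadruplex densities, evaluated with 3-fold cross-validation, stratification, and oversampling because hotspots are rare. The following is a minimal sketch of that general idea, not the authors' pipeline. Everything in it is an assumption made for illustration: the data are synthetic (30 "hotspot" windows among 600, with invented density shifts), the helper names are hypothetical, and the gradient-descent settings are arbitrary. It uses only the Python standard library.

```python
import math
import random


def stratified_folds(labels, k, rng):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def oversample(X, y, rng):
    """Duplicate random minority-class rows until both classes have equal size."""
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = pos + neg + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def predict_proba(w, b, x):
    """Logistic function of the linear score b + w.x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, columns, k=3, seed=0):
    """Mean ROC AUC over k stratified folds; only training folds are oversampled."""
    rng = random.Random(seed)
    Xs = [[row[c] for c in columns] for row in X]
    aucs = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = oversample([Xs[i] for i in train_idx],
                                [y[i] for i in train_idx], rng)
        w, b = fit_logistic(X_tr, y_tr)
        scores = [predict_proba(w, b, Xs[i]) for i in test_idx]
        aucs.append(roc_auc(scores, [y[i] for i in test_idx]))
    return sum(aucs) / k


def make_synthetic_windows(n_hot=30, n_background=570, seed=0):
    """Invented data: columns are [stem-loop density, quadruplex density]."""
    rng = random.Random(seed)
    y = [1] * n_hot + [0] * n_background
    X = [[rng.gauss(1.5 * label, 1.0), rng.gauss(0.5 * label, 1.0)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_windows()
    for name, cols in [("stem-loops", (0,)), ("quadruplexes", (1,)), ("joint", (0, 1))]:
        print(name, round(cross_validated_auc(X, y, cols), 3))
```

Oversampling is applied inside each training fold only, so duplicated hotspot windows never leak into the held-out fold. Comparing the three `columns` settings mirrors the record's "jointly and separately" comparison, although the numbers printed here say nothing about the real data.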