
Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation

BACKGROUND: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures with stem-loops and quadruplexes...

Bibliographic Details
Main Authors: Cheloshkina, Kseniia, Poptsova, Maria
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6511154/
https://www.ncbi.nlm.nih.gov/pubmed/31077166
http://dx.doi.org/10.1186/s12885-019-5653-x
_version_ 1783417529530580992
author Cheloshkina, Kseniia
Poptsova, Maria
author_facet Cheloshkina, Kseniia
Poptsova, Maria
author_sort Cheloshkina, Kseniia
collection PubMed
description BACKGROUND: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures, with stem-loops and quadruplexes being the most prevalent. We aimed to investigate the impact of these two classes of non-B DNA structures specifically on cancer breakpoint hotspots using a machine learning approach. METHODS: We developed a procedure for machine learning model building and evaluation, as the data considered are extremely imbalanced and a reliable estimate of predictive power was required. We built logistic regression models predicting cancer breakpoint hotspots based on the densities of stem-loops and quadruplexes, jointly and separately. We also tested Random Forest models, varying the resampling scheme (leave-one-out cross-validation, train-test split, 3-fold cross-validation) and the class balancing technique (oversampling, stratification, synthetic minority oversampling). RESULTS: We analyzed 487,425 breakpoints from 2234 samples covering 10 cancer types available from the International Cancer Genome Consortium. We showed that the distributions of breakpoint hotspots in different types of cancer are not correlated, confirming the heterogeneous nature of cancer. The stem-loop-based model best explains the blood, brain, liver, and prostate cancer breakpoint hotspot profiles, while the quadruplex-based model performs better for bone, breast, ovary, pancreatic, and skin cancer. For the overall cancer profile and uterus cancer, the joint model shows the highest performance. For particular datasets the constructed models reach high predictive power using just one predictor, and in the majority of cases the model built on both predictors does not improve performance. CONCLUSION: Despite the heterogeneity in the distribution of breakpoint hotspots across cancer types, our results demonstrate an association between cancer breakpoint hotspots and both stem-loops and quadruplexes. For approximately half of the cancer types stem-loops are the most influential factor, while for the others quadruplexes are. This reflects differences in the regulatory potential of stem-loops and quadruplexes at the tissue-specific level, which are yet to be discovered at the genome-wide scale. The analysis demonstrates that the influence of stem-loops and quadruplexes on breakpoint hotspot formation is tissue-specific. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12885-019-5653-x) contains supplementary material, which is available to authorized users.
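The model comparison described in METHODS (logistic regression on one density, the other, or both, evaluated with stratified 3-fold cross-validation on imbalanced labels) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the function names (`make_synthetic_windows`, `stratified_folds`, `fit_logistic`, `roc_auc`, `cross_validated_auc`) are hypothetical, class weighting stands in for the balancing techniques listed in the abstract, and ROC AUC is assumed as the performance measure.

```python
import math
import random


def make_synthetic_windows(n=600, seed=0):
    """Synthetic genomic windows: [stem-loop density, quadruplex density] and a rare hotspot label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        stem_loops = rng.random()      # hypothetical stem-loop density in [0, 1]
        quadruplexes = rng.random()    # hypothetical quadruplex density in [0, 1]
        # Assumption for illustration only: hotspot odds depend on stem-loops alone.
        p = 1.0 / (1.0 + math.exp(-(6.0 * stem_loops - 6.0)))
        X.append([stem_loops, quadruplexes])
        y.append(1 if rng.random() < p else 0)
    return X, y


def stratified_folds(y, k=3, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_logistic(X, y, epochs=300, lr=0.5):
    """Class-weighted logistic regression fitted by batch gradient descent."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    w_pos, w_neg = len(y) / (2.0 * n_pos), len(y) / (2.0 * n_neg)
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * v for w, v in zip(weights, row))
            err = (1.0 / (1.0 + math.exp(-z)) - label) * (w_pos if label else w_neg)
            for j, v in enumerate(row):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(y) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(y)
    return weights, bias


def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, columns, k=3):
    """Mean held-out ROC AUC of a model that uses only the given feature columns."""
    aucs = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        w, b = fit_logistic([[X[i][c] for c in columns] for i in train],
                            [y[i] for i in train])
        scores = [b + sum(wj * X[i][c] for wj, c in zip(w, columns)) for i in held_out]
        aucs.append(roc_auc(scores, [y[i] for i in held_out]))
    return sum(aucs) / k


X, y = make_synthetic_windows()
models = {"stem-loops": [0], "quadruplexes": [1], "joint": [0, 1]}
results = {name: cross_validated_auc(X, y, cols) for name, cols in models.items()}
for name, auc in results.items():
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Because the synthetic labels depend only on the stem-loop density, the single-predictor stem-loop model should score well above the quadruplex-only model here, and the joint model has little room to improve on it. That mirrors the pattern the abstract reports for some cancer types, but only as a property of this toy data.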
format Online
Article
Text
id pubmed-6511154
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6511154 2019-05-20 Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation Cheloshkina, Kseniia Poptsova, Maria BMC Cancer Research Article BACKGROUND: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures, with stem-loops and quadruplexes being the most prevalent. We aimed to investigate the impact of these two classes of non-B DNA structures specifically on cancer breakpoint hotspots using a machine learning approach. METHODS: We developed a procedure for machine learning model building and evaluation, as the data considered are extremely imbalanced and a reliable estimate of predictive power was required. We built logistic regression models predicting cancer breakpoint hotspots based on the densities of stem-loops and quadruplexes, jointly and separately. We also tested Random Forest models, varying the resampling scheme (leave-one-out cross-validation, train-test split, 3-fold cross-validation) and the class balancing technique (oversampling, stratification, synthetic minority oversampling). RESULTS: We analyzed 487,425 breakpoints from 2234 samples covering 10 cancer types available from the International Cancer Genome Consortium. We showed that the distributions of breakpoint hotspots in different types of cancer are not correlated, confirming the heterogeneous nature of cancer. The stem-loop-based model best explains the blood, brain, liver, and prostate cancer breakpoint hotspot profiles, while the quadruplex-based model performs better for bone, breast, ovary, pancreatic, and skin cancer. For the overall cancer profile and uterus cancer, the joint model shows the highest performance. For particular datasets the constructed models reach high predictive power using just one predictor, and in the majority of cases the model built on both predictors does not improve performance. CONCLUSION: Despite the heterogeneity in the distribution of breakpoint hotspots across cancer types, our results demonstrate an association between cancer breakpoint hotspots and both stem-loops and quadruplexes. For approximately half of the cancer types stem-loops are the most influential factor, while for the others quadruplexes are. This reflects differences in the regulatory potential of stem-loops and quadruplexes at the tissue-specific level, which are yet to be discovered at the genome-wide scale. The analysis demonstrates that the influence of stem-loops and quadruplexes on breakpoint hotspot formation is tissue-specific. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12885-019-5653-x) contains supplementary material, which is available to authorized users. BioMed Central 2019-05-10 /pmc/articles/PMC6511154/ /pubmed/31077166 http://dx.doi.org/10.1186/s12885-019-5653-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Cheloshkina, Kseniia
Poptsova, Maria
Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
title Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
title_full Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
title_fullStr Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
title_full_unstemmed Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
title_short Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
title_sort tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6511154/
https://www.ncbi.nlm.nih.gov/pubmed/31077166
http://dx.doi.org/10.1186/s12885-019-5653-x
work_keys_str_mv AT cheloshkinakseniia tissuespecificimpactofstemloopsandquadruplexesoncancerbreakpointsformation
AT poptsovamaria tissuespecificimpactofstemloopsandquadruplexesoncancerbreakpointsformation