A data science approach for the classification of low-grade and high-grade ovarian serous carcinomas

Bibliographic Details
Main authors:	Lin, Sangdi; Wang, Chen; Zarei, Shabnam; Bell, Debra A.; Kerr, Sarah E.; Runger, George C.; Kocher, Jean-Pierre A.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6258141/ https://www.ncbi.nlm.nih.gov/pubmed/30482155 http://dx.doi.org/10.1186/s12864-018-5177-9

author	Lin, Sangdi; Wang, Chen; Zarei, Shabnam; Bell, Debra A.; Kerr, Sarah E.; Runger, George C.; Kocher, Jean-Pierre A.
collection	PubMed
description	BACKGROUND: Copy number alterations (CNAs) are defined as somatic gains or losses of DNA regions. CNA profiles may provide a fingerprint specific to a tumor type or tumor grade. Low-coverage sequencing for reporting CNAs has recently gained interest since it was successfully translated into clinical applications. Ovarian serous carcinomas can be classified into two largely mutually exclusive grades, low grade and high grade, based on their histologic features. A grade classification based on genomics may provide valuable clues on how best to manage these patients in the clinic. Using ovarian serous carcinomas as a case study, we explore a methodology that combines CNAs reported by low-coverage sequencing with machine learning techniques to stratify tumor biospecimens of different grades. RESULTS: We have developed a data-driven methodology for tumor classification using CNA profiles reported by low-coverage sequencing. The proposed method, called Bag-of-Segments, summarizes CNA profiles into fixed-length features predictive of tumor grade. These features are then processed by machine learning techniques to obtain classification models. High accuracy is obtained for classifying ovarian serous carcinomas into high and low grades in leave-one-out cross-validation experiments. Models that are only weakly influenced by sequencing coverage and sample purity can also be built, which would be of greater relevance for clinical applications. The patterns captured by Bag-of-Segments features correlate with current clinical knowledge: low-grade ovarian tumors are related to aneuploidy events associated with mitotic errors, whereas high-grade ovarian tumors are induced by malfunction of DNA repair genes. CONCLUSIONS: The proposed data-driven method obtains high accuracy under various parametrizations in the ovarian serous carcinoma study, indicating good potential to generalize to other CNA classification problems.
This method could be applied to the more difficult task of classifying ovarian serous carcinomas with ambiguous histology, or those in which low-grade tumor coexists with high-grade tumor. Determining whether such samples are genomically closer to low or high grade may provide important clinical value.
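The abstract only outlines the Bag-of-Segments idea: a variable-length CNA profile is summarized into a fixed-length feature vector, which a classifier then learns from, with performance measured by leave-one-out cross-validation. The Python sketch below illustrates that general pipeline under stated assumptions and is not the authors' implementation. The segment representation, the length bins (`SIZE_EDGES_MB`), the neutral threshold (`GAIN_LOSS_THRESHOLD`), every function name and the nearest-centroid classifier are hypothetical choices made for illustration, and the toy profiles are synthetic rather than data from the study.

```python
"""Hypothetical bag-of-segments style encoding evaluated with leave-one-out CV.

Assumptions (not from the paper): a sample is a list of CNA segments given as
(length_mb, log2_ratio); segments are binned by length and direction into a
fixed vocabulary; a nearest-centroid classifier stands in for the authors'
machine learning models. The toy profiles below are synthetic.
"""
from math import dist

# Hypothetical vocabulary: 3 size classes x 2 directions = 6 fixed features.
SIZE_EDGES_MB = (1.0, 10.0)   # <1 Mb, 1-10 Mb, >=10 Mb
DIRECTIONS = ("loss", "gain")
GAIN_LOSS_THRESHOLD = 0.2     # |log2 ratio| below this counts as copy-neutral


def size_class(length_mb):
    """Index of the length bin (0, 1 or 2) that a segment falls into."""
    return sum(length_mb >= edge for edge in SIZE_EDGES_MB)


def bag_of_segments(segments):
    """Turn a variable-length segment list into a fixed-length, normalized histogram."""
    counts = [0.0] * ((len(SIZE_EDGES_MB) + 1) * len(DIRECTIONS))
    for length_mb, log2_ratio in segments:
        if abs(log2_ratio) < GAIN_LOSS_THRESHOLD:
            continue  # copy-neutral segments contribute no "word"
        direction = 1 if log2_ratio > 0 else 0  # index into DIRECTIONS
        counts[size_class(length_mb) * len(DIRECTIONS) + direction] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest (Euclidean) to x."""
    centroids = {
        label: centroid([v for v, y in zip(train_x, train_y) if y == label])
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and return the accuracy."""
    features = [bag_of_segments(s) for s in samples]
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        correct += nearest_centroid_predict(train_x, train_y, x) == y
    return correct / len(features)


# Synthetic toy profiles (not study data): (segment length in Mb, log2 ratio).
samples = [
    [(50.0, 0.5), (80.0, -0.4), (0.5, 0.0)],
    [(120.0, -0.6), (40.0, 0.3)],
    [(60.0, 0.4), (30.0, 0.5), (15.0, -0.3)],
    [(0.3, 0.8), (2.0, -0.5), (0.7, -0.9), (5.0, 0.4)],
    [(0.2, -0.7), (3.0, 0.6), (0.9, 0.5)],
    [(4.0, -0.4), (0.4, 0.9), (8.0, -0.6)],
]
labels = ["low", "low", "low", "high", "high", "high"]

print(bag_of_segments(samples[0]))              # [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
print(leave_one_out_accuracy(samples, labels))  # 1.0
```

Whatever the number of segments in a profile, each sample maps to the same six features, which is what lets a standard classifier consume it. The perfect accuracy here only reflects how separable the synthetic toy profiles are and says nothing about the results reported in the study; a different model can be tried by replacing `nearest_centroid_predict`.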
format	Online Article Text
id	pubmed-6258141
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6258141 2018-11-29. BMC Genomics, Methodology Article. BioMed Central 2018-11-27. /pmc/articles/PMC6258141/ /pubmed/30482155 http://dx.doi.org/10.1186/s12864-018-5177-9. Text, en. © The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A data science approach for the classification of low-grade and high-grade ovarian serous carcinomas
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6258141/ https://www.ncbi.nlm.nih.gov/pubmed/30482155 http://dx.doi.org/10.1186/s12864-018-5177-9
