
High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer


Bibliographic Details
Main authors: Long, Nguyen Phuoc; Park, Seongoh; Anh, Nguyen Hoang; Nghi, Tran Diem; Yoon, Sang Jun; Park, Jeong Hill; Lim, Johan; Kwon, Sung Won
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6358915/
https://www.ncbi.nlm.nih.gov/pubmed/30642095
http://dx.doi.org/10.3390/ijms20020296
author Long, Nguyen Phuoc
Park, Seongoh
Anh, Nguyen Hoang
Nghi, Tran Diem
Yoon, Sang Jun
Park, Jeong Hill
Lim, Johan
Kwon, Sung Won
collection PubMed
description The advancement of bioinformatics and machine learning has facilitated the discovery and validation of omics-based biomarkers. This study employed a novel approach combining multi-platform transcriptomics and cutting-edge algorithms to introduce novel signatures for accurate diagnosis of colorectal cancer (CRC). Different random forests (RF)-based feature selection methods including the area under the curve (AUC)-RF, Boruta, and Vita were used and the diagnostic performance of the proposed biosignatures was benchmarked using RF, logistic regression, naïve Bayes, and k-nearest neighbors models. All models showed satisfactory performance in which RF appeared to be the best. For instance, regarding the RF model, the following were observed: mean accuracy 0.998 (standard deviation (SD) < 0.003), mean specificity 0.999 (SD < 0.003), and mean sensitivity 0.998 (SD < 0.004). Moreover, proposed biomarker signatures were highly associated with multifaceted hallmarks in cancer. Some biomarkers were found to be enriched in epithelial cell signaling in Helicobacter pylori infection and inflammatory processes. The overexpression of TGFBI and S100A2 was associated with poor disease-free survival while the down-regulation of NR5A2, SLC4A4, and CD177 was linked to worse overall survival of the patients. In conclusion, novel transcriptome signatures to improve the diagnostic accuracy in CRC are introduced for further validations in various clinical settings.
format Online
Article
Text
id pubmed-6358915
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6358915 2019-02-06 Int J Mol Sci Article MDPI 2019-01-12 /pmc/articles/PMC6358915/ /pubmed/30642095 http://dx.doi.org/10.3390/ijms20020296 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer
topic Article