
CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data

Accurate identification of cancer biomarkers and classification of cancer type and subtype from High Throughput Sequencing (HTS) data is a challenging problem because it requires manual processing of raw HTS data from various sequencing platforms, quality control, and normalization, which are both t...


Bibliographic Details
Main authors: Mohammed, Akram; Biegert, Greyson; Adamec, Jiri; Helikar, Tomáš
Format: Online Article Text
Language: English
Published: Impact Journals LLC 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788660/
https://www.ncbi.nlm.nih.gov/pubmed/29416792
http://dx.doi.org/10.18632/oncotarget.23511
author Mohammed, Akram
Biegert, Greyson
Adamec, Jiri
Helikar, Tomáš
author_facet Mohammed, Akram
Biegert, Greyson
Adamec, Jiri
Helikar, Tomáš
author_sort Mohammed, Akram
collection PubMed
description Accurate identification of cancer biomarkers and classification of cancer type and subtype from High Throughput Sequencing (HTS) data is a challenging problem because it requires manual processing of raw HTS data from various sequencing platforms, quality control, and normalization, all of which are tedious and time-consuming. Machine learning techniques for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. To date, considerable research effort has been devoted to cancer biomarker identification and cancer class prediction. However, currently available tools and pipelines lack flexibility in data preprocessing and in running multiple feature selection methods and learning algorithms; researchers therefore need a freely available and easy-to-use program. Here, we propose CancerDiscover, an integrative open-source software pipeline that allows users to automatically and efficiently process large raw high-throughput datasets, normalize them, and select the best-performing features from multiple feature selection algorithms. Additionally, the integrative pipeline lets users apply different feature thresholds to identify cancer biomarkers and build various training models to distinguish different types and subtypes of cancer. The open-source software is available at https://github.com/HelikarLab/CancerDiscover and is free for use under the GPL3 license.
format Online
Article
Text
id pubmed-5788660
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Impact Journals LLC
record_format MEDLINE/PubMed
spelling pubmed-5788660 2018-02-07 CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data Mohammed, Akram Biegert, Greyson Adamec, Jiri Helikar, Tomáš Oncotarget Research Paper Accurate identification of cancer biomarkers and classification of cancer type and subtype from High Throughput Sequencing (HTS) data is a challenging problem because it requires manual processing of raw HTS data from various sequencing platforms, quality control, and normalization, all of which are tedious and time-consuming. Machine learning techniques for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. To date, considerable research effort has been devoted to cancer biomarker identification and cancer class prediction. However, currently available tools and pipelines lack flexibility in data preprocessing and in running multiple feature selection methods and learning algorithms; researchers therefore need a freely available and easy-to-use program. Here, we propose CancerDiscover, an integrative open-source software pipeline that allows users to automatically and efficiently process large raw high-throughput datasets, normalize them, and select the best-performing features from multiple feature selection algorithms. Additionally, the integrative pipeline lets users apply different feature thresholds to identify cancer biomarkers and build various training models to distinguish different types and subtypes of cancer. The open-source software is available at https://github.com/HelikarLab/CancerDiscover and is free for use under the GPL3 license. Impact Journals LLC 2017-12-20 /pmc/articles/PMC5788660/ /pubmed/29416792 http://dx.doi.org/10.18632/oncotarget.23511 Text en Copyright: © 2018 Mohammed et al.
http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 (http://creativecommons.org/licenses/by/3.0/) (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Paper
Mohammed, Akram
Biegert, Greyson
Adamec, Jiri
Helikar, Tomáš
CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data
title CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data
topic Research Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788660/
https://www.ncbi.nlm.nih.gov/pubmed/29416792
http://dx.doi.org/10.18632/oncotarget.23511