
CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data

Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance i...


Bibliographic Details
Main authors: Bressler, Ryan; Kreisberg, Richard B.; Bernard, Brady; Niederhuber, John E.; Vockley, Joseph G.; Shmulevich, Ilya; Knijnenburg, Theo A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4692062/
https://www.ncbi.nlm.nih.gov/pubmed/26679347
http://dx.doi.org/10.1371/journal.pone.0144820
_version_ 1782407229042851840
author Bressler, Ryan
Kreisberg, Richard B.
Bernard, Brady
Niederhuber, John E.
Vockley, Joseph G.
Shmulevich, Ilya
Knijnenburg, Theo A.
author_sort Bressler, Ryan
collection PubMed
description Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times by effective use of the CPU cache, optimizing for different classes of features and efficiently multi-threading. https://github.com/ilyalab/CloudForest.
format Online
Article
Text
id pubmed-4692062
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-46920622015-12-31 CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data Bressler, Ryan Kreisberg, Richard B. Bernard, Brady Niederhuber, John E. Vockley, Joseph G. Shmulevich, Ilya Knijnenburg, Theo A. PLoS One Research Article Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times by effective use of the CPU cache, optimizing for different classes of features and efficiently multi-threading. https://github.com/ilyalab/CloudForest. Public Library of Science 2015-12-17 /pmc/articles/PMC4692062/ /pubmed/26679347 http://dx.doi.org/10.1371/journal.pone.0144820 Text en © 2015 Bressler et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4692062/
https://www.ncbi.nlm.nih.gov/pubmed/26679347
http://dx.doi.org/10.1371/journal.pone.0144820