CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data
Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance i...
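Main authors: | Bressler, Ryan; Kreisberg, Richard B.; Bernard, Brady; Niederhuber, John E.; Vockley, Joseph G.; Shmulevich, Ilya; Knijnenburg, Theo A. |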
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4692062/ https://www.ncbi.nlm.nih.gov/pubmed/26679347 http://dx.doi.org/10.1371/journal.pone.0144820 |
_version_ | 1782407229042851840 |
---|---|
author | Bressler, Ryan Kreisberg, Richard B. Bernard, Brady Niederhuber, John E. Vockley, Joseph G. Shmulevich, Ilya Knijnenburg, Theo A. |
author_facet | Bressler, Ryan Kreisberg, Richard B. Bernard, Brady Niederhuber, John E. Vockley, Joseph G. Shmulevich, Ilya Knijnenburg, Theo A. |
author_sort | Bressler, Ryan |
collection | PubMed |
description | Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high-performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimization for different classes of features, and efficient multi-threading. CloudForest is available at https://github.com/ilyalab/CloudForest. |
format | Online Article Text |
id | pubmed-4692062 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-46920622015-12-31 CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data Bressler, Ryan Kreisberg, Richard B. Bernard, Brady Niederhuber, John E. Vockley, Joseph G. Shmulevich, Ilya Knijnenburg, Theo A. PLoS One Research Article Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times by effective use of the CPU cache, optimizing for different classes of features and efficiently multi-threading. https://github.com/ilyalab/CloudForest. Public Library of Science 2015-12-17 /pmc/articles/PMC4692062/ /pubmed/26679347 http://dx.doi.org/10.1371/journal.pone.0144820 Text en © 2015 Bressler et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Bressler, Ryan Kreisberg, Richard B. Bernard, Brady Niederhuber, John E. Vockley, Joseph G. Shmulevich, Ilya Knijnenburg, Theo A. CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data |
title | CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data |
title_full | CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data |
title_fullStr | CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data |
title_full_unstemmed | CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data |
title_short | CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data |
title_sort | cloudforest: a scalable and efficient random forest implementation for biological data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4692062/ https://www.ncbi.nlm.nih.gov/pubmed/26679347 http://dx.doi.org/10.1371/journal.pone.0144820 |