Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data
High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed the other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments for a parametric statistical tes...
Main authors: | Jo, Jongkwon; Jung, Seungha; Park, Joongyang; Kim, Youngsoon; Kang, Mingon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9714948/ https://www.ncbi.nlm.nih.gov/pubmed/36455001 http://dx.doi.org/10.1371/journal.pone.0278570 |
_version_ | 1784842349730332672 |
---|---|
author | Jo, Jongkwon Jung, Seungha Park, Joongyang Kim, Youngsoon Kang, Mingon |
author_sort | Jo, Jongkwon |
collection | PubMed |
description | High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments on a parametric statistical test for feature selection have impeded the application of Hi-LASSO in practice. In this paper, the Python package and its Spark library are designed to run efficiently in parallel on real-world problems and to provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO’s superior performance through various intensive experiments in practical settings. The packages make Hi-LASSO feature selection efficient and easy to run. Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. The packages can be installed with Python pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark. |
format | Online Article Text |
id | pubmed-9714948 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-97149482022-12-02 Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data Jo, Jongkwon Jung, Seungha Park, Joongyang Kim, Youngsoon Kang, Mingon PLoS One Research Article High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments on a parametric statistical test for feature selection have impeded the application of Hi-LASSO in practice. In this paper, the Python package and its Spark library are designed to run efficiently in parallel on real-world problems and to provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO’s superior performance through various intensive experiments in practical settings. The packages make Hi-LASSO feature selection efficient and easy to run. Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. The packages can be installed with Python pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark. Public Library of Science 2022-12-01 /pmc/articles/PMC9714948/ /pubmed/36455001 http://dx.doi.org/10.1371/journal.pone.0278570 Text en © 2022 Jo et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Jo, Jongkwon Jung, Seungha Park, Joongyang Kim, Youngsoon Kang, Mingon Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data |
title | Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9714948/ https://www.ncbi.nlm.nih.gov/pubmed/36455001 http://dx.doi.org/10.1371/journal.pone.0278570 |