
Hi-LASSO: High-performance Python and Apache Spark packages for feature selection with high-dimensional data

High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed the other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments for a parametric statistical tes...


Bibliographic Details
Main authors: Jo, Jongkwon; Jung, Seungha; Park, Joongyang; Kim, Youngsoon; Kang, Mingon
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9714948/
https://www.ncbi.nlm.nih.gov/pubmed/36455001
http://dx.doi.org/10.1371/journal.pone.0278570
_version_ 1784842349730332672
author Jo, Jongkwon
Jung, Seungha
Park, Joongyang
Kim, Youngsoon
Kang, Mingon
author_sort Jo, Jongkwon
collection PubMed
description High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed the other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments on a parametric statistical test for feature selection have impeded the application of Hi-LASSO in practice. In this paper, the Python package and its Spark library are designed to run efficiently in parallel on real-world problems and to provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO’s superior performance in a range of intensive, practically oriented experiments. With these packages, Hi-LASSO feature selection can be performed efficiently and easily. Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. The packages can be installed with Python's pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark.
format Online
Article
Text
id pubmed-9714948
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9714948 2022-12-02 Hi-LASSO: High-performance Python and Apache Spark packages for feature selection with high-dimensional data Jo, Jongkwon Jung, Seungha Park, Joongyang Kim, Youngsoon Kang, Mingon PLoS One Research Article High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed the other state-of-the-art LASSO methods. However, the substantial cost of bootstrapping and the lack of experiments on a parametric statistical test for feature selection have impeded the application of Hi-LASSO in practice. In this paper, the Python package and its Spark library are designed to run efficiently in parallel on real-world problems and to provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO’s superior performance in a range of intensive, practically oriented experiments. With these packages, Hi-LASSO feature selection can be performed efficiently and easily. Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. The packages can be installed with Python's pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark. Public Library of Science 2022-12-01 /pmc/articles/PMC9714948/ /pubmed/36455001 http://dx.doi.org/10.1371/journal.pone.0278570 Text en © 2022 Jo et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Hi-LASSO: High-performance Python and Apache Spark packages for feature selection with high-dimensional data
topic Research Article