Statistical methods for imbalanced data in ecological and biological studies

Bibliographic Details
Main authors: Komori, Osamu; Eguchi, Shinto
Language: English
Published: Springer, 2019
Subjects: Mathematical Physics and Mathematics
Online access: https://dx.doi.org/10.1007/978-4-431-55570-4
http://cds.cern.ch/record/2685045
collection CERN
description This book takes a fresh approach: it provides a comprehensive review of recent work on the challenging problems that imbalanced data cause in prediction and classification, and it introduces several of the latest statistical methods for dealing with these problems. The book discusses data imbalance from two points of view. The first is quantitative imbalance, meaning that the sample size of one population greatly exceeds that of another. Presence-only data are an extreme case: the presence of a species is confirmed, whereas information on its absence is uncertain, a situation especially common in ecology when predicting habitat distribution. The second is qualitative imbalance, meaning that the data distribution of one population can be well specified whereas that of the other is highly heterogeneous. A typical case is the outliers commonly observed in gene expression data; another is the heterogeneous characteristics often observed in the case group of case-control studies. Extensions of the logistic regression model, maxent, and AdaBoost to imbalanced data are discussed, providing a new framework for improving prediction, classification, and variable selection performance. Weight functions introduced in these methods play an important role in alleviating data imbalance. The book also offers a new perspective on these problems and presents applications of the recently developed statistical methods to real data sets.
id cern-2685045
institution European Organization for Nuclear Research
record_format invenio
topic Mathematical Physics and Mathematics
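
The description notes that weight functions play an important role in alleviating data imbalance. As a generic illustration of that idea only, and not a method taken from the book, the following Python sketch fits a one-predictor logistic regression by gradient descent with and without inverse-class-frequency weights. The weighting scheme, the function names (`class_weights`, `fit_logistic`, `predict`) and the toy data are hypothetical assumptions made for this example.

```python
import math

# Hypothetical sketch: class-weighted logistic regression with one predictor.
# Inverse-class-frequency weights are an illustrative assumption; they are
# not the weight functions developed in the book.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def class_weights(y: list[int]) -> dict[int, float]:
    """Inverse-frequency weights so that both classes carry equal total weight."""
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    return {1: n / (2.0 * n_pos), 0: n / (2.0 * n_neg)}


def fit_logistic(x, y, weights, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the
    weighted negative log-likelihood sum_i weights[y_i] * logloss_i."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            # Gradient of one observation's weighted log-loss w.r.t. the linear predictor.
            r = weights[yi] * (sigmoid(b0 + b1 * xi) - yi)
            g0 += r
            g1 += r * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict(params, xi):
    """Predicted probability of the rare class at predictor value xi."""
    b0, b1 = params
    return sigmoid(b0 + b1 * xi)


# Made-up toy data: 18 majority-class points and 2 rare-class points (overlapping).
x_neg = [-2.0, -1.8, -1.5, -1.2, -1.0, -0.9, -0.8, -0.6, -0.5,
         -0.3, 0.0, 0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.2]
x_pos = [0.9, 1.5]
x = x_neg + x_pos
y = [0] * len(x_neg) + [1] * len(x_pos)

plain = fit_logistic(x, y, {0: 1.0, 1: 1.0})
weighted = fit_logistic(x, y, class_weights(y))
print(f"P(y=1 | x=1.5): unweighted {predict(plain, 1.5):.3f}, "
      f"weighted {predict(weighted, 1.5):.3f}")
```

With only 2 rare-class points among 20 (a quantitative imbalance), the unweighted fit keeps predicted probabilities for the rare class low; giving each class equal total weight raises them. One consequence of this design choice is that weighted predictions are no longer calibrated to the observed prevalence, so they are better suited to ranking or classification than to being read directly as probabilities of presence.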