
The Danger of Testing by Selecting Controlled Subsets, with Applications to Spoken-Word Recognition

When examining the effects of a continuous variable x on an outcome y, a researcher might choose to dichotomize on x, dividing the population into two sets—low x and high x—and testing whether these two subpopulations differ with respect to y. Dichotomization has long been known to incur a cost in s...
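The power cost of dichotomizing, which the abstract mentions, can be illustrated with a small simulation. The sketch below is not the authors' analysis: the linear model, the slope of 0.3, the sample size, and the function names `correlation_t`, `median_split_t`, and `simulate_power` are all assumptions chosen for illustration, and it uses a plain median split rather than covariate-balanced subsets. It estimates how often a test on the continuous x detects a true relationship compared with a Low-versus-High comparison on the same data.

```python
import math
import random
import statistics

# Approximate two-sided 5% critical value for about 98 degrees of freedom
# (an assumption tied to the default n=100 used below).
T_CRIT = 1.98


def correlation_t(x, y):
    """t statistic for testing that the Pearson correlation of x and y is zero."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def median_split_t(x, y):
    """Pooled-variance two-sample t statistic for y in High x versus Low x."""
    cut = statistics.median(x)
    low = [b for a, b in zip(x, y) if a <= cut]
    high = [b for a, b in zip(x, y) if a > cut]
    n1, n2 = len(low), len(high)
    pooled = ((n1 - 1) * statistics.variance(low)
              + (n2 - 1) * statistics.variance(high)) / (n1 + n2 - 2)
    diff = statistics.fmean(high) - statistics.fmean(low)
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n2))


def simulate_power(n=100, slope=0.3, trials=500, seed=0):
    """Estimate the detection rate of each test under y = slope * x + noise."""
    rng = random.Random(seed)
    hits_continuous = 0
    hits_split = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [slope * a + rng.gauss(0, 1) for a in x]
        hits_continuous += abs(correlation_t(x, y)) > T_CRIT
        hits_split += abs(median_split_t(x, y)) > T_CRIT
    return hits_continuous / trials, hits_split / trials


if __name__ == "__main__":
    continuous, split = simulate_power()
    print(f"power with continuous x: {continuous:.2f}")
    print(f"power after median split: {split:.2f}")
```

Under these assumed settings the continuous test should detect the relationship noticeably more often than the median split. Comparing only a subset of Low and High items would shrink the sample further, which is consistent with the second failure mode the abstract describes.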


Bibliographic Details
Main authors:	Liben-Nowell, David; Strand, Julia; Sharp, Alexa; Wexler, Tom; Woods, Kevin
Format:	Online Article Text
Language:	English
Published:	Ubiquity Press 2019
Subjects:	Methods Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6634384/ https://www.ncbi.nlm.nih.gov/pubmed/31517221 http://dx.doi.org/10.5334/joc.51

author	Liben-Nowell, David; Strand, Julia; Sharp, Alexa; Wexler, Tom; Woods, Kevin
collection	PubMed
description	When examining the effects of a continuous variable x on an outcome y, a researcher might choose to dichotomize on x, dividing the population into two sets—low x and high x—and testing whether these two subpopulations differ with respect to y. Dichotomization has long been known to incur a cost in statistical power, but there remain circumstances in which it is appealing: an experimenter might use it to control for confounding covariates through subset selection, by carefully choosing a subpopulation of Low and a corresponding subpopulation of High that are balanced with respect to a list of control variables, and then comparing the subpopulations’ y values. This “divide, select, and test” approach is used in many papers throughout the psycholinguistics literature, and elsewhere. Here we show that, despite the apparent innocuousness, these methodological choices can lead to erroneous results, in two ways. First, if the balanced subsets of Low and High are selected in certain ways, it is possible to conclude a relationship between x and y not present in the full population. Specifically, we show that previously published conclusions drawn from this methodology—about the effect of a particular lexical property on spoken-word recognition—do not in fact appear to hold. Second, if the balanced subsets of Low and High are selected randomly, this methodology frequently fails to show a relationship between x and y that is present in the full population. Our work uncovers a new facet of an ongoing research effort: to identify and reveal the implicit freedoms of experimental design that can lead to false conclusions.
format	Online Article Text
id	pubmed-6634384
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Ubiquity Press
record_format	MEDLINE/PubMed
spelling	pubmed-6634384 2019-09-12 J Cogn Methods Note
Ubiquity Press 2019-01-24 /pmc/articles/PMC6634384/ /pubmed/31517221 http://dx.doi.org/10.5334/joc.51 Text en Copyright: © 2019 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
title	The Danger of Testing by Selecting Controlled Subsets, with Applications to Spoken-Word Recognition
topic	Methods Note
