Factors influencing taxonomic unevenness in scientific research: a mixed-methods case study of non-human primate genomic sequence data generation

Bibliographic details
Main authors: Hernandez, Margarita; Shenk, Mary K.; Perry, George H.
Format: Online, Article, Text
Language: English
Published: The Royal Society, 2020
Subjects: Genetics and Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7540799/
https://www.ncbi.nlm.nih.gov/pubmed/33047065
http://dx.doi.org/10.1098/rsos.201206
Collection: PubMed
Abstract: Scholars have noted major disparities in the extent of scientific research conducted among taxonomic groups. Such trends may cascade if future scientists gravitate towards study species with more data and resources already available. As new technologies emerge, do research studies employing these technologies continue these disparities? Here, using non-human primates as a case study, we identified disparities in massively parallel genomic sequencing data and conducted interviews with scientists who produced these data to learn their motivations when selecting study species. We tested whether variables including publication history and conservation status were significantly correlated with publicly available sequence data in the NCBI Sequence Read Archive (SRA). Of the 179.6 terabases (Tb) of sequence data in SRA for 519 non-human primate species, 135 Tb (approx. 75%) were from only five species: rhesus macaques, olive baboons, green monkeys, chimpanzees and crab-eating macaques. The strongest predictors of the amount of genomic data were the total number of non-medical publications (linear regression; r² = 0.37; p = 6.15 × 10⁻¹²) and number of medical publications (r² = 0.27; p = 9.27 × 10⁻⁹). In a generalized linear model, the number of non-medical publications (p = 0.00064) and closer phylogenetic distance to humans (p = 0.024) were the most predictive of the amount of genomic sequence data. We interviewed 33 authors of genomic data-producing publications and analysed their responses using grounded theory. Consistent with our quantitative results, authors mentioned their choice of species was motivated by sample accessibility, prior published work and relevance to human medicine. Our mixed-methods approach helped identify and contextualize some of the driving factors behind species-uneven patterns of scientific research, which can now be considered by funding agencies, scientific societies and research teams aiming to align their broader goals with future data generation efforts.
Record ID: pubmed-7540799
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: R Soc Open Sci (Genetics and Genomics), The Royal Society, published 2020-09-30.
Rights: © 2020 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.