Quantitative monitoring of nucleotide sequence data from genetic resources in context of their citation in the scientific literature

BACKGROUND: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, i...

Bibliographic Details
Main Authors: Lange, Matthias, Alako, Blaise T F, Cochrane, Guy, Ghaffar, Mehmood, Mascher, Martin, Habekost, Pia-Katharina, Hillebrand, Upneet, Scholz, Uwe, Schorch, Florian, Freitag, Jens, Scholz, Amber Hartman
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Data Note
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8716361/
https://www.ncbi.nlm.nih.gov/pubmed/34966925
http://dx.doi.org/10.1093/gigascience/giab084
author Lange, Matthias
Alako, Blaise T F
Cochrane, Guy
Ghaffar, Mehmood
Mascher, Martin
Habekost, Pia-Katharina
Hillebrand, Upneet
Scholz, Uwe
Schorch, Florian
Freitag, Jens
Scholz, Amber Hartman
author_sort Lange, Matthias
collection PubMed
description BACKGROUND: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level. FINDINGS: We extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics. CONCLUSIONS: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.
format Online
Article
Text
id pubmed-8716361
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8716361 2022-01-05
Gigascience
Data Note
Oxford University Press 2021-12-29
/pmc/articles/PMC8716361/ /pubmed/34966925 http://dx.doi.org/10.1093/gigascience/giab084
Text en
© The Author(s) 2021. Published by Oxford University Press GigaScience.
https://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Quantitative monitoring of nucleotide sequence data from genetic resources in context of their citation in the scientific literature
topic Data Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8716361/
https://www.ncbi.nlm.nih.gov/pubmed/34966925
http://dx.doi.org/10.1093/gigascience/giab084