On the design of linked datasets mapping networks of collaboration in the genomic sequencing of Saccharomyces cerevisiae, Homo sapiens, and Sus scrofa


Bibliographic Details
Main authors:	Wong, Mark; Leng, Rhodri
Format:	Online Article Text
Language:	English
Published:	F1000 Research Limited, 2023
Subjects:	Data Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7871417/ https://www.ncbi.nlm.nih.gov/pubmed/33604022 http://dx.doi.org/10.12688/f1000research.18656.3

collection	PubMed
description	This data note describes a unique two-step methodology to construct six linked datasets covering the sequencing of Saccharomyces cerevisiae, Homo sapiens, and Sus scrofa genomes. The datasets were used as evidence in a project that investigated the history of genomic science. To design the datasets, we first retrieved all sequence submission data from the European Nucleotide Archive (ENA), including accession numbers associated with each of our three species. Second, we used these accession numbers to construct queries to retrieve peer-reviewed scientific publications that first described these sequence submissions in the scientific literature. For each species, this resulted in two associated datasets: 1) A .csv file documenting the PMID of each article describing new sequences, all paper authors, all institutional affiliations of each author, countries of institution, year of first submission to the ENA (when available), and the year of article publication, and 2) A .csv file documenting all institutions submitting to the ENA, number of nucleotides sequenced and years of submission to the database. We utilised these datasets to understand how institutional collaboration shaped sequencing efforts, and to systematically identify important institutions and changes in the structure of research communities throughout the history of genomics and across our three target species. This data note, therefore, should aid researchers who would like to use these data for future analyses by making the methodology that underpins it transparent. Further, by detailing our methodology, researchers may be able to utilise our approach to construct similar datasets in the future.
id	pubmed-7871417
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
source	F1000Res, Data Note. F1000 Research Limited, 2023-02-28 (index record dated 2021-02-17)
rights	Copyright: © 2023 Wong M and Leng R. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
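The description above states that the paper-level .csv files (PMID, authors, affiliations) were used to study institutional collaboration. As a minimal sketch only, assuming a long-format file with one row per author affiliation and hypothetical column names `pmid` and `institution` (the published files may use different headers or layout), an institution collaboration network can be derived by linking every pair of institutions that appear on the same PMID. The sample rows are placeholders, not data from the datasets.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def institution_edges(rows):
    """Count how many papers each unordered pair of institutions shares.

    Hypothetical schema: each row is one author affiliation with
    "pmid" and "institution" columns (an assumption, not the real headers).
    """
    by_paper = {}
    for row in rows:
        # Group the distinct institutions listed on each paper.
        by_paper.setdefault(row["pmid"], set()).add(row["institution"].strip())
    edges = Counter()
    for institutions in by_paper.values():
        # Sorting makes (A, B) and (B, A) the same undirected edge key.
        for pair in combinations(sorted(institutions), 2):
            edges[pair] += 1
    return edges


# Placeholder rows for illustration only; not taken from the datasets.
SAMPLE = """pmid,author,institution
1,Author A,Inst X
1,Author B,Inst Y
2,Author C,Inst X
2,Author D,Inst Y
2,Author E,Inst Z
"""

edges = institution_edges(csv.DictReader(io.StringIO(SAMPLE)))
for (a, b), n in sorted(edges.items()):
    print(a, "--", b, n)
# Inst X -- Inst Y 2
# Inst X -- Inst Z 1
# Inst Y -- Inst Z 1
```

Counting shared papers per pair yields a weighted, undirected edge list that common network-analysis tools can load; whether the original project weighted edges this way is not stated in this record.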
