Standardized Metadata for Human Pathogen/Vector Genomic Sequences

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulen...

Bibliographic Details
Main authors: Dugan, Vivien G., Emrich, Scott J., Giraldo-Calderón, Gloria I., Harb, Omar S., Newman, Ruchi M., Pickett, Brett E., Schriml, Lynn M., Stockwell, Timothy B., Stoeckert, Christian J., Sullivan, Dan E., Singh, Indresh, Ward, Doyle V., Yao, Alison, Zheng, Jie, Barrett, Tanya, Birren, Bruce, Brinkac, Lauren, Bruno, Vincent M., Caler, Elizabet, Chapman, Sinéad, Collins, Frank H., Cuomo, Christina A., Di Francesco, Valentina, Durkin, Scott, Eppinger, Mark, Feldgarden, Michael, Fraser, Claire, Fricke, W. Florian, Giovanni, Maria, Henn, Matthew R., Hine, Erin, Hotopp, Julie Dunning, Karsch-Mizrachi, Ilene, Kissinger, Jessica C., Lee, Eun Mi, Mathur, Punam, Mongodin, Emmanuel F., Murphy, Cheryl I., Myers, Garry, Neafsey, Daniel E., Nelson, Karen E., Nierman, William C., Puzak, Julia, Rasko, David, Roos, David S., Sadzewicz, Lisa, Silva, Joana C., Sobral, Bruno, Squires, R. Burke, Stevens, Rick L., Tallon, Luke, Tettelin, Herve, Wentworth, David, White, Owen, Will, Rebecca, Wortman, Jennifer, Zhang, Yun, Scheuermann, Richard H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061050/
https://www.ncbi.nlm.nih.gov/pubmed/24936976
http://dx.doi.org/10.1371/journal.pone.0099979
author Dugan, Vivien G.
Emrich, Scott J.
Giraldo-Calderón, Gloria I.
Harb, Omar S.
Newman, Ruchi M.
Pickett, Brett E.
Schriml, Lynn M.
Stockwell, Timothy B.
Stoeckert, Christian J.
Sullivan, Dan E.
Singh, Indresh
Ward, Doyle V.
Yao, Alison
Zheng, Jie
Barrett, Tanya
Birren, Bruce
Brinkac, Lauren
Bruno, Vincent M.
Caler, Elizabet
Chapman, Sinéad
Collins, Frank H.
Cuomo, Christina A.
Di Francesco, Valentina
Durkin, Scott
Eppinger, Mark
Feldgarden, Michael
Fraser, Claire
Fricke, W. Florian
Giovanni, Maria
Henn, Matthew R.
Hine, Erin
Hotopp, Julie Dunning
Karsch-Mizrachi, Ilene
Kissinger, Jessica C.
Lee, Eun Mi
Mathur, Punam
Mongodin, Emmanuel F.
Murphy, Cheryl I.
Myers, Garry
Neafsey, Daniel E.
Nelson, Karen E.
Nierman, William C.
Puzak, Julia
Rasko, David
Roos, David S.
Sadzewicz, Lisa
Silva, Joana C.
Sobral, Bruno
Squires, R. Burke
Stevens, Rick L.
Tallon, Luke
Tettelin, Herve
Wentworth, David
White, Owen
Will, Rebecca
Wortman, Jennifer
Zhang, Yun
Scheuermann, Richard H.
author_sort Dugan, Vivien G.
collection PubMed
description High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
format Online
Article
Text
id pubmed-4061050
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4061050 2014-06-20 Standardized Metadata for Human Pathogen/Vector Genomic Sequences PLoS One Research Article Public Library of Science 2014-06-17 /pmc/articles/PMC4061050/ /pubmed/24936976 http://dx.doi.org/10.1371/journal.pone.0099979 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Standardized Metadata for Human Pathogen/Vector Genomic Sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061050/
https://www.ncbi.nlm.nih.gov/pubmed/24936976
http://dx.doi.org/10.1371/journal.pone.0099979