
The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness

Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for qua...


Bibliographic Details
Main Authors: Liolios, Konstantinos; Schriml, Lynn; Hirschman, Lynette; Pagani, Ioanna; Nosrat, Bahador; Sterk, Peter; White, Owen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; Kyrpides, Nikos C.; Field, Dawn
Format: Online Article Text
Language: English
Published: Michigan State University 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558968/
https://www.ncbi.nlm.nih.gov/pubmed/23409217
http://dx.doi.org/10.4056/sigs.2675953
author Liolios, Konstantinos
Schriml, Lynn
Hirschman, Lynette
Pagani, Ioanna
Nosrat, Bahador
Sterk, Peter
White, Owen
Rocca-Serra, Philippe
Sansone, Susanna-Assunta
Taylor, Chris
Kyrpides, Nikos C.
Field, Dawn
collection PubMed
description Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Scoring records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering in support of downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure, the ‘Metadata Coverage Index’ (MCI), defined as the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
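The definition in the description above (the percentage of available fields actually filled) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary-based record layout, and the set of values treated as "empty" are all assumptions made for illustration, and any field names passed in are hypothetical rather than actual GOLD or MIGS fields.

```python
from typing import Any, Mapping, Sequence

# Assumption: placeholder strings that should count as "not filled".
# The source does not specify which values count as empty.
EMPTY_VALUES = {"", "unknown", "n/a"}


def is_filled(value: Any) -> bool:
    """Return True if a field value counts as filled under our assumption."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip().lower() not in EMPTY_VALUES
    return True


def record_mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """MCI of one record: percentage of the available fields that are filled."""
    if not fields:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def database_mci(records: Sequence[Mapping[str, Any]],
                 fields: Sequence[str]) -> float:
    """MCI across a collection: filled cells over all (record, field) cells."""
    if not records or not fields:
        raise ValueError("records and fields must both be non-empty")
    filled = sum(
        1 for rec in records for name in fields if is_filled(rec.get(name))
    )
    return 100.0 * filled / (len(records) * len(fields))


def field_fill_rate(records: Sequence[Mapping[str, Any]], field: str) -> float:
    """Percentage of records in which one field of interest is filled."""
    if not records:
        raise ValueError("records must be non-empty")
    filled = sum(1 for rec in records if is_filled(rec.get(field)))
    return 100.0 * filled / len(records)
```

The three functions mirror the three levels named in the description: an individual record, a whole database, and a component part such as a single field of interest. Which values count as "filled" is a curation policy decision, so the `EMPTY_VALUES` set would need adjusting for any real repository.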
format Online
Article
Text
id pubmed-3558968
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Michigan State University
record_format MEDLINE/PubMed
spelling pubmed-3558968 2013-02-13 Stand Genomic Sci Short Genome Reports Michigan State University 2012-07-20 /pmc/articles/PMC3558968/ /pubmed/23409217 http://dx.doi.org/10.4056/sigs.2675953 Text en Copyright © retained by original authors. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness
topic Short Genome Reports