The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness
Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for qua...
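Main authors: | Liolios, Konstantinos; Schriml, Lynn; Hirschman, Lynette; Pagani, Ioanna; Nosrat, Bahador; Sterk, Peter; White, Owen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; Kyrpides, Nikos C.; Field, Dawn |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Michigan State University, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558968/ https://www.ncbi.nlm.nih.gov/pubmed/23409217 http://dx.doi.org/10.4056/sigs.2675953 |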
_version_ | 1782257492249542656 |
---|---|
author | Liolios, Konstantinos; Schriml, Lynn; Hirschman, Lynette; Pagani, Ioanna; Nosrat, Bahador; Sterk, Peter; White, Owen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; Kyrpides, Nikos C.; Field, Dawn |
author_sort | Liolios, Konstantinos |
collection | PubMed |
description | Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Scoring records on the richness of their description provides a simple, objective proxy for quality, enabling filtering that supports downstream analysis. Pivotally, such scores should also spur improvements. Here, we introduce such a measure, the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). This simple metric has many potential uses, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine how often fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics for funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and further applications of MCI scores: to show improvements in annotation quality over time; to inform standards bodies and repository providers about the usability and popularity of their products; and to assess and credit the work of curators. Such an index is a step towards placing metadata capture practices, and in the future standards compliance, within a quantitative and objective framework. |
format | Online Article Text |
id | pubmed-3558968 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Michigan State University |
record_format | MEDLINE/PubMed |
spelling | pubmed-3558968; 2013-02-13; Stand Genomic Sci (Short Genome Reports); Michigan State University, 2012-07-20; /pmc/articles/PMC3558968/; /pubmed/23409217; http://dx.doi.org/10.4056/sigs.2675953; Text; en; Copyright © retained by original authors. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness |
topic | Short Genome Reports |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558968/ https://www.ncbi.nlm.nih.gov/pubmed/23409217 http://dx.doi.org/10.4056/sigs.2675953 |
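The MCI definition in the description field (the percentage of available fields actually filled, computable for individual records, for fields of interest, or across a database) can be illustrated with a short Python sketch. This is not the authors' implementation. The flat record model, the rule that `None` and blank strings count as unfilled, the function names (`record_mci`, `database_mci`, `field_fill_rates`, `rank_records`) and the toy field names are all hypothetical assumptions for illustration, not the GOLD or MIGS schema.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

# Hypothetical record model: a flat mapping from field name to value.
# Real GOLD/MIGS records are more structured; this is only an illustration.
Record = Mapping[str, object]


def is_filled(value: object) -> bool:
    """Assumed rule: None, empty and whitespace-only strings count as unfilled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def record_mci(record: Record, fields: Sequence[str]) -> float:
    """MCI of one record: percentage of `fields` that are filled."""
    if not fields:
        raise ValueError("need at least one field")
    filled = sum(1 for f in fields if is_filled(record.get(f)))
    return 100.0 * filled / len(fields)


def database_mci(records: Sequence[Record], fields: Sequence[str]) -> float:
    """MCI of a collection: percentage of all (record, field) slots that are filled."""
    if not records or not fields:
        raise ValueError("need at least one record and one field")
    filled = sum(1 for r in records for f in fields if is_filled(r.get(f)))
    return 100.0 * filled / (len(records) * len(fields))


def field_fill_rates(records: Sequence[Record], fields: Sequence[str]) -> Dict[str, float]:
    """Per-field coverage: percentage of records in which each field is filled."""
    if not records:
        raise ValueError("need at least one record")
    return {
        f: 100.0 * sum(1 for r in records if is_filled(r.get(f))) / len(records)
        for f in fields
    }


def rank_records(
    records: Sequence[Record],
    fields: Sequence[str],
    min_mci: float = 0.0,
    id_field: str = "id",  # hypothetical identifier key
) -> List[Tuple[object, float]]:
    """Keep records whose MCI is at least `min_mci`, sorted richest first."""
    scored = [(r.get(id_field), record_mci(r, fields)) for r in records]
    kept = [pair for pair in scored if pair[1] >= min_mci]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data with hypothetical field names (not the actual MIGS/GOLD schema).
EXAMPLE_FIELDS = ["organism", "isolation_source", "collection_date", "lat_lon"]
example_records = [
    {"id": "r1", "organism": "taxon A", "isolation_source": "soil",
     "collection_date": "2010-05-01", "lat_lon": None},
    {"id": "r2", "organism": "taxon B", "isolation_source": "",
     "collection_date": None, "lat_lon": None},
    {"id": "r3", "organism": "taxon C", "isolation_source": "hot spring",
     "collection_date": "2009", "lat_lon": "35.0 N 139.0 E"},
]

if __name__ == "__main__":
    for rec in example_records:
        print(rec["id"], record_mci(rec, EXAMPLE_FIELDS))  # r1 75.0, r2 25.0, r3 100.0
    print(round(database_mci(example_records, EXAMPLE_FIELDS), 2))  # 66.67
    print(field_fill_rates(example_records, EXAMPLE_FIELDS))
    print(rank_records(example_records, EXAMPLE_FIELDS, min_mci=50.0))
    # [('r3', 100.0), ('r1', 75.0)]
```

Because every toy record is scored against the same field list, the pooled database MCI (8 of 12 slots, about 66.67%) equals the mean of the per-record scores ((75 + 25 + 100) / 3). If records of different types were scored against different field lists, these two aggregates could diverge, and which one to report would be a design choice. Passing a smaller list, such as `["organism", "lat_lon"]`, gives an MCI restricted to fields of interest.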