
Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR


Bibliographic details
Main authors: Beier, Sebastian; Fiebig, Anne; Pommier, Cyril; Liyanage, Isuru; Lange, Matthias; Kersey, Paul J.; Weise, Stephan; Finkers, Richard; Koylass, Baron; Cezard, Timothee; Courtot, Mélanie; Contreras-Moreira, Bruno; Naamati, Guy; Dyer, Sarah; Scholz, Uwe
Format: Online; Article; Text
Language: English
Published: F1000 Research Limited, 2022
Subjects: Opinion Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9218589/
https://www.ncbi.nlm.nih.gov/pubmed/35811804
http://dx.doi.org/10.12688/f1000research.109080.2
Collection: PubMed
Description: In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous, machine-actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the VCF extensions proposed here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards and vocabularies, and the consistent use of cross-references via resolvable (machine-readable) identifiers, are particularly necessary, and we propose an encoding for them. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and gVCF formats), but none currently has the reach of VCF. For the sake of simplicity, we discuss only VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifiers or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields that have not been agreed upon within the community are frequently added, which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
Record ID: pubmed-9218589
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: F1000Res, 2022-05-19
Copyright: © 2022 Beier S et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.