Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR
In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to...
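Main authors: | Beier, Sebastian; Fiebig, Anne; Pommier, Cyril; Liyanage, Isuru; Lange, Matthias; Kersey, Paul J.; Weise, Stephan; Finkers, Richard; Koylass, Baron; Cezard, Timothee; Courtot, Mélanie; Contreras-Moreira, Bruno; Naamati, Guy; Dyer, Sarah; Scholz, Uwe |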
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9218589/ https://www.ncbi.nlm.nih.gov/pubmed/35811804 http://dx.doi.org/10.12688/f1000research.109080.2 |
_version_ | 1784731925313748992 |
---|---|
author | Beier, Sebastian Fiebig, Anne Pommier, Cyril Liyanage, Isuru Lange, Matthias Kersey, Paul J. Weise, Stephan Finkers, Richard Koylass, Baron Cezard, Timothee Courtot, Mélanie Contreras-Moreira, Bruno Naamati, Guy Dyer, Sarah Scholz, Uwe |
author_sort | Beier, Sebastian |
collection | PubMed |
description | In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine-actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the VCF extensions proposed here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards and vocabulary, and the consistent use of cross-references via resolvable (machine-readable) identifiers, are particularly necessary, and we propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently has the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields that have not been agreed upon within the community are frequently added, which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains. |
format | Online Article Text |
id | pubmed-9218589 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-9218589 2022-07-08 F1000Res Opinion Article F1000 Research Limited 2022-05-19 /pmc/articles/PMC9218589/ /pubmed/35811804 http://dx.doi.org/10.12688/f1000research.109080.2 Text en Copyright: © 2022 Beier S et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR |
topic | Opinion Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9218589/ https://www.ncbi.nlm.nih.gov/pubmed/35811804 http://dx.doi.org/10.12688/f1000research.109080.2 |