
Identity and compatibility of reference genome resources

Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from iden...


Bibliographic Details
Main authors: Stolarczyk, Michał, Xue, Bingjie, Sheffield, Nathan C
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121092/
https://www.ncbi.nlm.nih.gov/pubmed/34017945
http://dx.doi.org/10.1093/nargab/lqab036
_version_ 1783692255434899456
author Stolarczyk, Michał
Xue, Bingjie
Sheffield, Nathan C
author_facet Stolarczyk, Michał
Xue, Bingjie
Sheffield, Nathan C
author_sort Stolarczyk, Michał
collection PubMed
description Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: first, we derive unique identifiers for each resource; second, we record parent–child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. Available at https://refgenie.databio.org.
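
The idea of recursive identifiers in the description can be illustrated with a small sketch. This is not the authors' implementation: the hash function (truncated SHA-256), the comma-joined serialization, and the names `digest`, `collection_id`, `compare`, and `index_asset` are all assumptions made for illustration. The sketch digests sequence names, lengths, and content separately, then digests those digests into one top-level identifier, so two collections can be compared for full identity or for a shared coordinate system.

```python
import hashlib


def digest(text: str) -> str:
    # Assumed hash choice: SHA-256 hex digest, truncated for readability.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:24]


def collection_id(seqs: dict) -> dict:
    """Digest names, lengths, and sequences separately, then digest the digests."""
    names = list(seqs)  # insertion order is treated as the sequence order
    parts = {
        "names": digest(",".join(names)),
        "lengths": digest(",".join(str(len(seqs[n])) for n in names)),
        "sequences": digest(",".join(digest(seqs[n].upper()) for n in names)),
    }
    # Top-level identifier, computed recursively from the component digests.
    parts["top"] = digest(
        ",".join(parts[k] for k in ("names", "lengths", "sequences"))
    )
    return parts


def compare(a: dict, b: dict) -> dict:
    """Report which components of two collection identifiers match."""
    return {k: a[k] == b[k] for k in ("top", "names", "lengths", "sequences")}


# Same sequences, different naming conventions (toy data).
prefixed = collection_id({"chr1": "ACGTACGT", "chr2": "GGCC"})
unprefixed = collection_id({"1": "ACGTACGT", "2": "GGCC"})
result = compare(prefixed, unprefixed)
print(result)

# Parent-child record (hypothetical layout): a derived asset stores the
# identifier of the collection it was built from.
index_asset = {"kind": "aligner_index", "parent": prefixed["top"]}
```

In this toy comparison the top-level identifiers differ because the sequence names differ, while the lengths and sequence digests match, which is the kind of signal that would indicate a compatible coordinate system despite different naming.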
format Online
Article
Text
id pubmed-8121092
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8121092 2021-05-19 Identity and compatibility of reference genome resources Stolarczyk, Michał Xue, Bingjie Sheffield, Nathan C NAR Genom Bioinform APP Notes Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: first, we derive unique identifiers for each resource; second, we record parent–child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. Available at https://refgenie.databio.org. Oxford University Press 2021-05-14 /pmc/articles/PMC8121092/ /pubmed/34017945 http://dx.doi.org/10.1093/nargab/lqab036 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle APP Notes
Stolarczyk, Michał
Xue, Bingjie
Sheffield, Nathan C
Identity and compatibility of reference genome resources
title Identity and compatibility of reference genome resources
title_full Identity and compatibility of reference genome resources
title_fullStr Identity and compatibility of reference genome resources
title_full_unstemmed Identity and compatibility of reference genome resources
title_short Identity and compatibility of reference genome resources
title_sort identity and compatibility of reference genome resources
topic APP Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121092/
https://www.ncbi.nlm.nih.gov/pubmed/34017945
http://dx.doi.org/10.1093/nargab/lqab036
work_keys_str_mv AT stolarczykmichał identityandcompatibilityofreferencegenomeresources
AT xuebingjie identityandcompatibilityofreferencegenomeresources
AT sheffieldnathanc identityandcompatibilityofreferencegenomeresources