Identity and compatibility of reference genome resources
Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from iden...
Main Authors: | Stolarczyk, Michał; Xue, Bingjie; Sheffield, Nathan C |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | APP Notes |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121092/ https://www.ncbi.nlm.nih.gov/pubmed/34017945 http://dx.doi.org/10.1093/nargab/lqab036 |
_version_ | 1783692255434899456 |
---|---|
author | Stolarczyk, Michał Xue, Bingjie Sheffield, Nathan C |
collection | PubMed |
description | Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: first, we derive unique identifiers for each resource; second, we record parent–child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. Available at https://refgenie.databio.org. |
format | Online Article Text |
id | pubmed-8121092 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8121092 2021-05-19 Identity and compatibility of reference genome resources Stolarczyk, Michał Xue, Bingjie Sheffield, Nathan C NAR Genom Bioinform APP Notes Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: first, we derive unique identifiers for each resource; second, we record parent–child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. Available at https://refgenie.databio.org. Oxford University Press 2021-05-14 /pmc/articles/PMC8121092/ /pubmed/34017945 http://dx.doi.org/10.1093/nargab/lqab036 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Identity and compatibility of reference genome resources |
topic | APP Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8121092/ https://www.ncbi.nlm.nih.gov/pubmed/34017945 http://dx.doi.org/10.1093/nargab/lqab036 |
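
The three advances named in the description (unique identifiers, parent–child relationships, and recursive identifiers) can be illustrated with a minimal Python sketch. This is not the algorithm used by the authors or by refgenie: the truncated SHA-256 digest, the record layout, and the names `digest`, `collection_digest`, `compare`, and `derived_asset` are all assumptions made for illustration only.

```python
import hashlib
from typing import Dict, List


def digest(text: str) -> str:
    """Hypothetical content identifier: first 24 hex characters of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:24]


def collection_digest(seqs: Dict[str, str]) -> str:
    """Recursive identifier: a digest computed over per-sequence digests.

    Each sequence contributes its name, its length, and the digest of its
    content, so the result changes if any of the three changes.
    The digest is order-sensitive (assumed design choice).
    """
    records = [f"{name}:{len(seq)}:{digest(seq)}" for name, seq in seqs.items()]
    return digest(",".join(records))


def compare(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, bool]:
    """Report identity plus two weaker notions of compatibility."""
    return {
        # Same names, lengths, content, and order.
        "identical": collection_digest(a) == collection_digest(b),
        # Same sequence names mapped to the same lengths.
        "same_coordinate_system": (
            {n: len(s) for n, s in a.items()} == {n: len(s) for n, s in b.items()}
        ),
        # Same sequence content, regardless of naming.
        "same_sequences": (
            sorted(digest(s) for s in a.values())
            == sorted(digest(s) for s in b.values())
        ),
    }


def derived_asset(kind: str, parents: List[str]) -> Dict[str, object]:
    """Parent-child record: a derived resource stores its parents' identifiers."""
    ordered = sorted(parents)
    return {"kind": kind, "parents": ordered, "id": digest(kind + "|" + ",".join(ordered))}
```

Under these assumptions, two collections that differ only in sequence names (for example `chr1` versus `1`) would report the same sequences but a different coordinate system, and two indexes built from the same collection would share a parent identifier, which is one way to answer the question raised in the description about which indexes derive from identical sequence files.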