UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation

When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene t...

Bibliographic Details
Main authors: Jackman, Shaun D., Bohlmann, Joerg, Birol, İnanç
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4447347/
https://www.ncbi.nlm.nih.gov/pubmed/26020645
http://dx.doi.org/10.1371/journal.pone.0128026
author Jackman, Shaun D.
Bohlmann, Joerg
Birol, İnanç
collection PubMed
description When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and the code to reproduce it are available at https://github.com/sjackman/uniqtag-paper.
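The description above says only that each identifier is a representative k-mer selected from the gene's sequence. The following is a minimal Python sketch of one way such a selection could work, not the authors' Ruby or R implementation. It assumes, for illustration, that the representative k-mer is the one found in the fewest sequences, with ties broken lexicographically, and that genes sharing a tag receive a numeric suffix. Neither the selection rule nor the suffix scheme is stated in this record, and the function names `kmers` and `assign_tags` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers of seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_tags(genes, k):
    """Map each gene name to a content-derived tag.

    genes: dict of gene name -> sequence string.
    Assumed rule (illustrative only): pick the k-mer present in the
    fewest sequences, breaking ties by lexicographic order.
    """
    # Count the number of sequences in which each k-mer occurs.
    counts = Counter()
    for seq in genes.values():
        counts.update(kmers(seq, k))

    tags = {}
    for name, seq in genes.items():
        candidates = kmers(seq, k)
        if not candidates:
            # Assumption: a sequence shorter than k is used as its own tag.
            tags[name] = seq
        else:
            tags[name] = min(candidates, key=lambda km: (counts[km], km))

    # Assumption: genes that share a tag get a numeric suffix so that
    # every identifier is unique.
    shared = Counter(tags.values())
    serial = Counter()
    unique = {}
    for name in sorted(tags):
        tag = tags[name]
        if shared[tag] > 1:
            serial[tag] += 1
            unique[name] = f"{tag}-{serial[tag]}"
        else:
            unique[name] = tag
    return unique


if __name__ == "__main__":
    example = {"geneA": "ACGTAC", "geneB": "ACGTTT", "geneC": "GGGACG"}
    print(assign_tags(example, 3))
```

Because a tag in this sketch depends only on sequence content, renaming or reordering the genes leaves it unchanged, which is the property that serial numbers lack.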
format Online
Article
Text
id pubmed-4447347
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4447347 2015-06-09 UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation Jackman, Shaun D. Bohlmann, Joerg Birol, İnanç PLoS One Research Article When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and the code to reproduce it are available at https://github.com/sjackman/uniqtag-paper. Public Library of Science 2015-05-28 /pmc/articles/PMC4447347/ /pubmed/26020645 http://dx.doi.org/10.1371/journal.pone.0128026 Text en © 2015 Jackman et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4447347/
https://www.ncbi.nlm.nih.gov/pubmed/26020645
http://dx.doi.org/10.1371/journal.pone.0128026