Navigating the dynamic landscape of long noncoding RNA and protein-coding gene annotations in GENCODE

Bibliographic Details
Main authors: Jalali, Saakshi; Gandhi, Shrey; Scaria, Vinod
Format: Online, Article, Text
Language: English
Published: BioMed Central 2016
Subjects: Primary Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5084464/
https://www.ncbi.nlm.nih.gov/pubmed/27793185
http://dx.doi.org/10.1186/s40246-016-0090-2
collection PubMed
description BACKGROUND: Our understanding of the transcriptional potential of the genome and its functional consequences has changed significantly over the last decade. This change has been driven largely by technological improvements that have made it possible to annotate and, in many cases, functionally characterize a number of novel gene loci in the human genome. Keeping pace with advances in this dynamic environment while systematically annotating a compendium of genes and transcripts is a formidable task. Of the many databases that have attempted to systematically annotate the genome, GENCODE has emerged as one of the largest and most popular compendia of human genome annotations. RESULTS: Our analysis of successive GENCODE versions revealed constant updating of transcripts for both protein-coding genes and long noncoding RNAs (lncRNAs), leading to conflicting annotations. GENCODE version 24 annotates 4.18 % of the human genome as transcribed, an increase of 1.58 % over its first version. Of the 251,614 transcripts annotated across GENCODE versions, only 21.7 % were annotated consistently. We also found that the GENCODE consortium categorized transcripts into 70 biotypes, of which only 17 remained stable throughout. CONCLUSIONS: In this report, we review the impact of this dynamism on gene annotations, specifically lncRNA annotations, in GENCODE over the years. Our analysis suggests substantial dynamism in gene annotations, reflecting the evolution of, and growing consensus on, gene nomenclature. While progressive changes in annotation and the timely release of updates make the resource reliable for the community, the changes introduced with each release pose unique challenges for its users. Taking cues from other bio-curation efforts, we propose potential avenues and methods to bridge this gap.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40246-016-0090-2) contains supplementary material, which is available to authorized users.
id pubmed-5084464
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Hum Genomics (Primary Research), BioMed Central, 2016-10-28
rights © The Author(s). 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Primary Research