Navigating the dynamic landscape of long noncoding RNA and protein-coding gene annotations in GENCODE
BACKGROUND: Our understanding of the transcriptional potential of the genome and its functional consequences has undergone a significant change in the last decade. This change has been driven largely by improvements in technology that made it possible to annotate and, in many cases, functionally characterize a n...
Main authors: | Jalali, Saakshi; Gandhi, Shrey; Scaria, Vinod |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5084464/ https://www.ncbi.nlm.nih.gov/pubmed/27793185 http://dx.doi.org/10.1186/s40246-016-0090-2 |
_version_ | 1782463386919895040 |
---|---|
author | Jalali, Saakshi Gandhi, Shrey Scaria, Vinod |
author_facet | Jalali, Saakshi Gandhi, Shrey Scaria, Vinod |
author_sort | Jalali, Saakshi |
collection | PubMed |
description | BACKGROUND: Our understanding of the transcriptional potential of the genome and its functional consequences has undergone a significant change in the last decade. This change has been driven largely by improvements in technology that made it possible to annotate and, in many cases, functionally characterize a number of novel gene loci in the human genome. Keeping pace with advancements in this dynamic environment while systematically annotating a compendium of genes and transcripts is a formidable task. Of the many databases that have attempted to systematically annotate the genome, GENCODE has emerged as one of the largest and most popular compendia of human genome annotations. RESULTS: Analysis of successive versions of GENCODE revealed continual revision of transcripts for both protein-coding genes and long noncoding RNAs (lncRNAs), leading to conflicting annotations. GENCODE version 24 annotates 4.18 % of the human genome as transcribed, an increase of 1.58 % over its first version. Of the 251,614 transcripts annotated across GENCODE versions, only 21.7 % were annotated consistently. We also observed that the GENCODE consortium categorized transcripts into 70 biotypes, of which only 17 remained stable throughout. CONCLUSIONS: In this report, we review the impact of this dynamism on gene annotations, specifically lncRNA annotations, in GENCODE over the years. Our analysis suggests significant dynamism in gene annotations, reflecting the evolution of, and emerging consensus on, gene nomenclature. While progressive changes in annotations and timely release of updates make the resource reliable for the community, the changes in each release pose unique challenges to its users. Taking cues from other experiments in bio-curation, we propose potential avenues and methods to bridge the gap. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40246-016-0090-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5084464 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5084464 2016-10-31 Navigating the dynamic landscape of long noncoding RNA and protein-coding gene annotations in GENCODE Jalali, Saakshi Gandhi, Shrey Scaria, Vinod Hum Genomics Primary Research BioMed Central 2016-10-28 /pmc/articles/PMC5084464/ /pubmed/27793185 http://dx.doi.org/10.1186/s40246-016-0090-2 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Primary Research Jalali, Saakshi Gandhi, Shrey Scaria, Vinod Navigating the dynamic landscape of long noncoding RNA and protein-coding gene annotations in GENCODE |
title | Navigating the dynamic landscape of long noncoding RNA and protein-coding gene annotations in GENCODE |
topic | Primary Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5084464/ https://www.ncbi.nlm.nih.gov/pubmed/27793185 http://dx.doi.org/10.1186/s40246-016-0090-2 |
work_keys_str_mv | AT jalalisaakshi navigatingthedynamiclandscapeoflongnoncodingrnaandproteincodinggeneannotationsingencode AT gandhishrey navigatingthedynamiclandscapeoflongnoncodingrnaandproteincodinggeneannotationsingencode AT scariavinod navigatingthedynamiclandscapeoflongnoncodingrnaandproteincodinggeneannotationsingencode |