Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data

We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All ge...

Bibliographic Details
Main Authors: Matthews, Beverley B., dos Santos, Gilberto, Crosby, Madeline A., Emmert, David B., St. Pierre, Susan E., Gramates, L. Sian, Zhou, Pinglei, Schroeder, Andrew J., Falls, Kathleen, Strelets, Victor, Russo, Susan M., Gelbart, William M.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528329/
https://www.ncbi.nlm.nih.gov/pubmed/26109357
http://dx.doi.org/10.1534/g3.115.018929
author Matthews, Beverley B.
dos Santos, Gilberto
Crosby, Madeline A.
Emmert, David B.
St. Pierre, Susan E.
Gramates, L. Sian
Zhou, Pinglei
Schroeder, Andrew J.
Falls, Kathleen
Strelets, Victor
Russo, Susan M.
Gelbart, William M.
collection PubMed
description We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
format Online
Article
Text
id pubmed-4528329
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-4528329 2015-08-10 Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data G3 (Bethesda) Investigations Genetics Society of America 2015-06-24 /pmc/articles/PMC4528329/ /pubmed/26109357 http://dx.doi.org/10.1534/g3.115.018929 Text en Copyright © 2015 Matthews et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528329/
https://www.ncbi.nlm.nih.gov/pubmed/26109357
http://dx.doi.org/10.1534/g3.115.018929