
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions


Bibliographic Details
Main authors: Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A
Format: Text
Language: English
Published: Public Library of Science 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1391917/
https://www.ncbi.nlm.nih.gov/pubmed/16543943
http://dx.doi.org/10.1371/journal.pcbi.0020018
_version_ 1782126915710091264
author Glusman, Gustavo
Qin, Shizhen
El-Gewely, M. Raafat
Siegel, Andrew F
Roach, Jared C
Hood, Leroy
Smit, Arian F. A
author_sort Glusman, Gustavo
collection PubMed
description The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: GREENS and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes that have long introns and lack sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.”
format Text
id pubmed-1391917
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1391917 2006-04-06 A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A PLoS Comput Biol Research Article Public Library of Science 2006-03 2006-03-17 /pmc/articles/PMC1391917/ /pubmed/16543943 http://dx.doi.org/10.1371/journal.pcbi.0020018 Text en © 2006 Glusman et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1391917/
https://www.ncbi.nlm.nih.gov/pubmed/16543943
http://dx.doi.org/10.1371/journal.pcbi.0020018