A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure a...
Main authors: | Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1391917/ https://www.ncbi.nlm.nih.gov/pubmed/16543943 http://dx.doi.org/10.1371/journal.pcbi.0020018 |
_version_ | 1782126915710091264 |
---|---|
author | Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A |
author_sort | Glusman, Gustavo |
collection | PubMed |
description | The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” |
format | Text |
id | pubmed-1391917 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-13919172006-04-06 A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions Glusman, Gustavo Qin, Shizhen El-Gewely, M. Raafat Siegel, Andrew F Roach, Jared C Hood, Leroy Smit, Arian F. A PLoS Comput Biol Research Article The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” Public Library of Science 2006-03 2006-03-17 /pmc/articles/PMC1391917/ /pubmed/16543943 http://dx.doi.org/10.1371/journal.pcbi.0020018 Text en © 2006 Glusman et al. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Glusman, Gustavo Qin, Shizhen El-Gewely, M. Raafat Siegel, Andrew F Roach, Jared C Hood, Leroy Smit, Arian F. A A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions |
title | A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1391917/ https://www.ncbi.nlm.nih.gov/pubmed/16543943 http://dx.doi.org/10.1371/journal.pcbi.0020018 |
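
The description above says that ROAST and PASTA rely on strand-specific selection against polyadenylation signals. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea and not the authors' implementation. It assumes the canonical `AATAAA` hexamer as the signal and compares how often it appears on each strand of a sequence window. The function names and the bias score are hypothetical, introduced here for illustration.

```python
# Illustrative sketch only: NOT the published ROAST/PASTA algorithms.
# Assumption: the polyadenylation signal is the canonical hexamer AATAAA.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; unknown bases become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def polya_strand_counts(seq: str, signal: str = "AATAAA") -> tuple[int, int]:
    """Count the signal on the forward strand and on the reverse strand.

    A signal on the reverse strand appears in the forward sequence as the
    reverse complement of the signal.
    """
    seq = seq.upper()
    forward = count_overlapping(seq, signal)
    reverse = count_overlapping(seq, reverse_complement(signal))
    return forward, reverse


def strand_bias(forward: int, reverse: int) -> float:
    """Hypothetical score in [-1, 1]; positive means fewer forward-strand signals."""
    total = forward + reverse
    if total == 0:
        return 0.0
    return (reverse - forward) / total


if __name__ == "__main__":
    window = "AATAAACCTTTATTGGTTTATT"
    fwd, rev = polya_strand_counts(window)
    print(fwd, rev, round(strand_bias(fwd, rev), 3))
```

Under this sketch's assumption, a window with markedly fewer signals on one strand would be a candidate for transcription along that strand. A real method would need long windows and a statistical model of the expected counts, neither of which is attempted here.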