
Improving Gene-finding in Chlamydomonas reinhardtii:GreenGenie2

BACKGROUND: The availability of whole-genome sequences allows for the identification of the entire set of protein coding genes as well as their regulatory regions. This can be accomplished using multiple complementary methods that include ESTs, homology searches and ab initio gene predictions. Previ...


Bibliographic Details
Main authors: Kwan, Alan L; Li, Linya; Kulp, David C; Dutcher, Susan K; Stormo, Gary D
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694837/
https://www.ncbi.nlm.nih.gov/pubmed/19422688
http://dx.doi.org/10.1186/1471-2164-10-210
author Kwan, Alan L
Li, Linya
Kulp, David C
Dutcher, Susan K
Stormo, Gary D
author_sort Kwan, Alan L
collection PubMed
description BACKGROUND: The availability of whole-genome sequences allows for the identification of the entire set of protein-coding genes as well as their regulatory regions. This can be accomplished using multiple complementary methods, including ESTs, homology searches and ab initio gene predictions. Previously, the Genie gene-finding algorithm was trained on a small set of Chlamydomonas genes and shown to improve the accuracy of gene prediction in this species compared to other available programs. To improve ab initio gene finding in Chlamydomonas, we construct a new training set of over 2,300 cDNAs by assembling over 167,000 Chlamydomonas EST entries in GenBank with the EST assembly tool PASA. RESULTS: Our cDNA-trained gene-finder, GreenGenie2, attains 83% sensitivity and 83% specificity for exons on short-sequence predictions. We predict about 12,000 genes in the version 3 Chlamydomonas genome assembly, most of which (78%) are either identical to or significantly overlap genes in the published catalog of Chlamydomonas genes [1]. Of the published catalog, 22% is absent from the GreenGenie2 predictions; conversely, 23% of GreenGenie2 predictions are absent from the published gene catalog. Randomly chosen gene models were tested by RT-PCR, and most of the results support the GreenGenie2 predictions. CONCLUSION: These data suggest that training with EST assemblies is highly effective and that GreenGenie2 is a valuable, complementary tool for predicting genes in Chlamydomonas reinhardtii.
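The exon-level sensitivity and specificity quoted in the abstract can be illustrated with a minimal sketch. This record does not say how the paper computes them, so the definitions here are an assumption based on the usual gene-finding convention: sensitivity is the fraction of annotated exons predicted with exact boundaries, and specificity is the fraction of predicted exons that exactly match an annotated exon. The function name `exon_accuracy` and the coordinates are hypothetical toy values, not data from the paper.

```python
def exon_accuracy(predicted, annotated):
    """Exon-level sensitivity and specificity with exact-boundary matching.

    Assumed convention (not confirmed by this record):
      sensitivity = correct exons / annotated exons
      specificity = correct exons / predicted exons
    Exons are (start, end) tuples on the same sequence and strand.
    """
    predicted, annotated = set(predicted), set(annotated)
    correct = len(predicted & annotated)  # exons with both boundaries right
    sensitivity = correct / len(annotated) if annotated else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical toy coordinates: the third predicted exon has a wrong 3' end.
annotated = [(100, 200), (300, 450), (600, 720), (900, 1000)]
predicted = [(100, 200), (300, 450), (600, 700), (900, 1000)]

sn, sp = exon_accuracy(predicted, annotated)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
# prints: sensitivity=0.75 specificity=0.75
```

Under this convention a single misplaced boundary lowers both figures, because the mispredicted exon counts as a missed annotated exon and as a wrong prediction.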
format Text
id pubmed-2694837
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2694837 2009-06-11 Improving Gene-finding in Chlamydomonas reinhardtii:GreenGenie2 Kwan, Alan L Li, Linya Kulp, David C Dutcher, Susan K Stormo, Gary D BMC Genomics Research Article BioMed Central 2009-05-07 /pmc/articles/PMC2694837/ /pubmed/19422688 http://dx.doi.org/10.1186/1471-2164-10-210 Text en Copyright © 2009 Kwan et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving Gene-finding in Chlamydomonas reinhardtii:GreenGenie2
topic Research Article