
Comparison of computational methods for identifying translation initiation sites in EST data



Bibliographic Details
Main authors: Nadershahi, Afshin; Fahrenkrug, Scott C; Ellis, Lynda BM
Format: Text
Language: English
Published: BioMed Central, 2004
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC375524/
https://www.ncbi.nlm.nih.gov/pubmed/15053846
http://dx.doi.org/10.1186/1471-2105-5-14
author Nadershahi, Afshin
Fahrenkrug, Scott C
Ellis, Lynda BM
collection PubMed
description BACKGROUND: Expressed Sequence Tag (EST) sequences are generally single-strand, single-pass sequences only 200–600 nucleotides long that contain errors resulting in frame shifts and represent different parts of their parent cDNA. If the cDNAs contain translation initiation sites, they may be suitable for functional genomics studies. We have compared five methods to predict translation initiation sites in EST data: first-ATG, ESTScan, Diogenes, NetStart, and ATGpr. RESULTS: A dataset of 100 EST sequences, 50 with and 50 without translation initiation sites, was created. Based on analysis of this dataset, ATGpr is found to be the most accurate for predicting the presence versus absence of translation initiation sites. With a maximum accuracy of 76%, ATGpr more accurately predicts the position or absence of translation initiation sites than NetStart (57%) or Diogenes (50%). ATGpr similarly excels when start sites are known to be present (90%), whereas NetStart achieves only 60% overall accuracy. As a baseline for comparison, choosing the first ATG correctly identifies the translation initiation site in 74% of the sequences. ESTScan and Diogenes, consistent with their intended use, are able to identify open reading frames but are unable to determine the precise position of translation initiation sites. CONCLUSIONS: ATGpr demonstrates high sensitivity, specificity, and overall accuracy in identifying start sites while also rejecting incomplete sequences. A database of EST sequences suitable for validating programs for translation initiation site prediction is now available. These tools and materials may open an avenue for future improvements in start site prediction and EST analysis.
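The abstract uses "choose the first ATG" as a baseline and reports sensitivity, specificity, and accuracy. The following is a minimal Python sketch of such a baseline and of one way to score predictions, not the authors' code. It assumes each EST is already in sense orientation and that a prediction is either a 0-based start position or None (no start site). The names first_atg, evaluate, and Scores are hypothetical, and the study's exact metric definitions may differ from the ones assumed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def first_atg(seq: str) -> Optional[int]:
    """Hypothetical first-ATG baseline: 0-based index of the first ATG, or None.

    Assumes the EST is already in sense orientation; reading frame,
    Kozak context, and in-frame stop codons are deliberately ignored.
    """
    pos = seq.upper().find("ATG")
    return pos if pos >= 0 else None


@dataclass
class Scores:
    sensitivity: float        # present sites that were reported as present
    specificity: float        # absent sites that were reported as absent
    presence_accuracy: float  # presence/absence call correct, position ignored
    position_accuracy: float  # exact position correct, or absence correctly called


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of dividing by zero when a class is empty.
    return num / den if den else 0.0


def evaluate(predicted: Sequence[Optional[int]],
             actual: Sequence[Optional[int]]) -> Scores:
    """Score predicted start positions (None = no start site) against truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fn = tn = fp = exact = 0
    for p, a in zip(predicted, actual):
        if a is not None:
            if p is not None:
                tp += 1  # start site present and some position predicted
            else:
                fn += 1  # start site present but predicted absent
        else:
            if p is None:
                tn += 1  # no start site and none predicted
            else:
                fp += 1  # no start site but a position predicted
        if p == a:
            exact += 1  # same position, or both None
    n = len(actual)
    return Scores(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        presence_accuracy=_ratio(tp + tn, n),
        position_accuracy=_ratio(exact, n),
    )


if __name__ == "__main__":
    # Toy, hand-made sequences (not data from the study).
    ests = ["CCATGGCC", "GGGCCC", "TTATGAAATGC"]
    truth: List[Optional[int]] = [2, None, 7]
    preds = [first_atg(s) for s in ests]
    print(preds)  # [2, None, 2]
    print(evaluate(preds, truth))
```

In this toy run the baseline detects presence and absence correctly for all three sequences but picks an upstream ATG in the third one. This illustrates why presence/absence accuracy and exact-position accuracy are reported separately.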
format Text
id pubmed-375524
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-375524 2004-03-27 Comparison of computational methods for identifying translation initiation sites in EST data Nadershahi, Afshin Fahrenkrug, Scott C Ellis, Lynda BM BMC Bioinformatics Research Article
BioMed Central 2004-02-16 /pmc/articles/PMC375524/ /pubmed/15053846 http://dx.doi.org/10.1186/1471-2105-5-14 Text en Copyright © 2004 Nadershahi et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Comparison of computational methods for identifying translation initiation sites in EST data
topic Research Article