Identification of protein coding regions in RNA transcripts

Massive parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T...

Bibliographic Details
Main authors: Tang, Shiyuyun, Lomsadze, Alexandre, Borodovsky, Mark
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4499116/
https://www.ncbi.nlm.nih.gov/pubmed/25870408
http://dx.doi.org/10.1093/nar/gkv227
author Tang, Shiyuyun
Lomsadze, Alexandre
Borodovsky, Mark
collection PubMed
description Massive parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training which makes unnecessary manually curated preparation of training sets. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcripts assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably to other existing methods.
format Online
Article
Text
id pubmed-4499116
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4499116 2015-09-28 Identification of protein coding regions in RNA transcripts Tang, Shiyuyun Lomsadze, Alexandre Borodovsky, Mark Nucleic Acids Res Methods Online Oxford University Press 2015-07-13 2015-04-13 /pmc/articles/PMC4499116/ /pubmed/25870408 http://dx.doi.org/10.1093/nar/gkv227 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of protein coding regions in RNA transcripts
topic Methods Online