Identification of protein coding regions in RNA transcripts
Massive parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T...
Main authors: | Tang, Shiyuyun; Lomsadze, Alexandre; Borodovsky, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2015 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4499116/ https://www.ncbi.nlm.nih.gov/pubmed/25870408 http://dx.doi.org/10.1093/nar/gkv227 |
_version_ | 1782380727531208704 |
---|---|
author | Tang, Shiyuyun Lomsadze, Alexandre Borodovsky, Mark |
author_facet | Tang, Shiyuyun Lomsadze, Alexandre Borodovsky, Mark |
author_sort | Tang, Shiyuyun |
collection | PubMed |
description | Massive parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training which makes unnecessary manually curated preparation of training sets. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcripts assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably to other existing methods. |
format | Online Article Text |
id | pubmed-4499116 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-44991162015-09-28 Identification of protein coding regions in RNA transcripts Tang, Shiyuyun Lomsadze, Alexandre Borodovsky, Mark Nucleic Acids Res Methods Online Massive parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training which makes unnecessary manually curated preparation of training sets. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcripts assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably to other existing methods. Oxford University Press 2015-07-13 2015-04-13 /pmc/articles/PMC4499116/ /pubmed/25870408 http://dx.doi.org/10.1093/nar/gkv227 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Tang, Shiyuyun Lomsadze, Alexandre Borodovsky, Mark Identification of protein coding regions in RNA transcripts |
title | Identification of protein coding regions in RNA transcripts |
title_full | Identification of protein coding regions in RNA transcripts |
title_fullStr | Identification of protein coding regions in RNA transcripts |
title_full_unstemmed | Identification of protein coding regions in RNA transcripts |
title_short | Identification of protein coding regions in RNA transcripts |
title_sort | identification of protein coding regions in rna transcripts |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4499116/ https://www.ncbi.nlm.nih.gov/pubmed/25870408 http://dx.doi.org/10.1093/nar/gkv227 |
work_keys_str_mv | AT tangshiyuyun identificationofproteincodingregionsinrnatranscripts AT lomsadzealexandre identificationofproteincodingregionsinrnatranscripts AT borodovskymark identificationofproteincodingregionsinrnatranscripts |