Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts

It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide tr...

Bibliographic Details
Main Authors: Sun, Liang; Luo, Haitao; Bu, Dechao; Zhao, Guoguang; Yu, Kuntao; Zhang, Changhai; Liu, Yuanning; Chen, Runsheng; Zhao, Yi
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783192/
https://www.ncbi.nlm.nih.gov/pubmed/23892401
http://dx.doi.org/10.1093/nar/gkt646
_version_ 1782285640558182400
author Sun, Liang
Luo, Haitao
Bu, Dechao
Zhao, Guoguang
Yu, Kuntao
Zhang, Changhai
Liu, Yuanning
Chen, Runsheng
Zhao, Yi
author_facet Sun, Liang
Luo, Haitao
Bu, Dechao
Zhao, Guoguang
Yu, Kuntao
Zhang, Changhai
Liu, Yuanning
Chen, Runsheng
Zhao, Yi
author_sort Sun, Liang
collection PubMed
description Classifying transcripts as protein-coding or non-coding is a challenge, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a signature tool, the Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense–antisense pairs. Applied across species, CNCI gave highly accurate classification of transcripts assembled from whole-transcriptome sequencing data, which demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
format Online
Article
Text
id pubmed-3783192
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-37831922013-09-30 Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts Sun, Liang Luo, Haitao Bu, Dechao Zhao, Guoguang Yu, Kuntao Zhang, Changhai Liu, Yuanning Chen, Runsheng Zhao, Yi Nucleic Acids Res Methods Online It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense–antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, that demonstrated gene evolutionary divergence between vertebrates, and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci. Oxford University Press 2013-09 2013-07-27 /pmc/articles/PMC3783192/ /pubmed/23892401 http://dx.doi.org/10.1093/nar/gkt646 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Sun, Liang
Luo, Haitao
Bu, Dechao
Zhao, Guoguang
Yu, Kuntao
Zhang, Changhai
Liu, Yuanning
Chen, Runsheng
Zhao, Yi
Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
title Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
title_full Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
title_fullStr Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
title_full_unstemmed Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
title_short Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
title_sort utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783192/
https://www.ncbi.nlm.nih.gov/pubmed/23892401
http://dx.doi.org/10.1093/nar/gkt646