Universal Features for the Classification of Coding and Non-coding DNA Sequences
In this report, we revisited simple features that allow the classification of coding sequences (CDS) from non-coding DNA. The spectrum of codon usage of our sequence sample is large and suggests that these features are universal. The features that we investigated combine (i) the stop codon distribut...
Main Authors: | Carels, Nicolas; Vidal, Ramon; Frías, Diego |
---|---|
Format: | Text |
Language: | English |
Published: | Libertas Academica 2009 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808180/ https://www.ncbi.nlm.nih.gov/pubmed/20140069 |
_version_ | 1782176459349032960 |
---|---|
author | Carels, Nicolas; Vidal, Ramon; Frías, Diego |
author_facet | Carels, Nicolas; Vidal, Ramon; Frías, Diego |
author_sort | Carels, Nicolas |
collection | PubMed |
description | In this report, we revisited simple features that allow the classification of coding sequences (CDS) from non-coding DNA. The spectrum of codon usage of our sequence sample is large and suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine, Guanine, Adenine probabilities in 1st, 2nd, 3rd position of triplets, respectively, (iv) the product of G and C probabilities in 1st and 2nd position of triplets. These features are a natural consequence of the physico-chemical properties of proteins and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate >95% above 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding. |
format | Text |
id | pubmed-2808180 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-2808180 2010-02-04 Universal Features for the Classification of Coding and Non-coding DNA Sequences. Carels, Nicolas; Vidal, Ramon; Frías, Diego. Bioinform Biol Insights. Original Research. In this report, we revisited simple features that allow the classification of coding sequences (CDS) from non-coding DNA. The spectrum of codon usage of our sequence sample is large and suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine, Guanine, Adenine probabilities in 1st, 2nd, 3rd position of triplets, respectively, (iv) the product of G and C probabilities in 1st and 2nd position of triplets. These features are a natural consequence of the physico-chemical properties of proteins and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate >95% above 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding. Libertas Academica 2009-06-03 /pmc/articles/PMC2808180/ /pubmed/20140069 Text en Copyright © 2009 The authors. http://creativecommons.org/licenses/by/2.0 This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/2.0/). |
spellingShingle | Original Research Carels, Nicolas Vidal, Ramon Frías, Diego Universal Features for the Classification of Coding and Non-coding DNA Sequences |
title | Universal Features for the Classification of Coding and Non-coding DNA Sequences |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808180/ https://www.ncbi.nlm.nih.gov/pubmed/20140069 |
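
The four features listed in the description can be illustrated with a minimal Python sketch. It computes, for one reading frame, the stop codon count and the three probability products exactly as the abstract words them. The function name `frame_features`, the dictionary keys, and the use of the standard stop codons (TAA, TAG, TGA) are assumptions made for illustration; the record does not say how the authors normalize these quantities or combine them into a classifier, so this is not their implementation.

```python
from collections import Counter

# Assumption: standard genetic code stop codons.
STOPS = {"TAA", "TAG", "TGA"}
PURINES = "AG"


def frame_features(seq, frame=0):
    """Compute the four abstract-level features for one reading frame.

    Hypothetical helper: names and output layout are illustrative only.
    """
    seq = seq.upper()
    # Split into complete triplets starting at the chosen frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence too short for this frame")

    # counts[k][base] = number of triplets with `base` at position k (0-based).
    counts = [Counter(c[k] for c in codons) for k in range(3)]

    def p(k, bases):
        # Probability that position k holds any of the given bases.
        return sum(counts[k][b] for b in bases) / n

    return {
        # (i) stop codon occurrences in this frame
        "stop_codons": sum(c in STOPS for c in codons),
        # (ii) product of purine probabilities over the three positions
        "purine_product": p(0, PURINES) * p(1, PURINES) * p(2, PURINES),
        # (iii) P(C at 1st) * P(G at 2nd) * P(A at 3rd)
        "c1_g2_a3_product": p(0, "C") * p(1, "G") * p(2, "A"),
        # (iv) P(G at 1st) * P(C at 2nd)
        "g1_c2_product": p(0, "G") * p(1, "C"),
    }
```

Calling `frame_features(seq, frame)` for frames 0, 1, and 2 on a sequence and on its reverse complement gives six feature sets to compare, which matches the statement that the coding strand and frame are deduced implicitly; how those sets are scored against each other is left open here.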