Universal Features for the Classification of Coding and Non-coding DNA Sequences

Bibliographic Details
Main authors: Carels, Nicolas; Vidal, Ramon; Frías, Diego
Format: Text
Language: English
Published: Libertas Academica, 2009
Subjects: Original Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808180/
https://www.ncbi.nlm.nih.gov/pubmed/20140069
collection PubMed
description In this report, we revisited simple features that distinguish coding sequences (CDS) from non-coding DNA. The codon-usage spectrum of our sequence sample is broad, which suggests that these features are universal. The features we investigated combine (i) the stop codon distribution, (ii) the product of the purine probabilities at the three positions of nucleotide triplets, (iii) the product of the probabilities of cytosine, guanine and adenine at the 1st, 2nd and 3rd triplet positions, respectively, and (iv) the product of the G and C probabilities at the 1st and 2nd triplet positions. These features are a natural consequence of the physico-chemical properties of proteins, and their combination classifies CDS and non-coding DNA (introns) with a success rate above 95% for sequences longer than 350 bp. When a sequence is classified as coding, its coding strand and coding frame are deduced implicitly.
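The four features named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: summarising the stop codon distribution as the in-frame stop codon frequency, reading feature (iv) as the product of the G+C probability at position 1 and at position 2 (it could also be read as P1(G)·P2(C)), and all function names are assumptions made for illustration. The sketch only computes the features for the six possible frames (three offsets on each strand) of an uppercase ACGT sequence; it does not reproduce the classifier or its decision thresholds.

```python
from collections import Counter

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def triplets(seq: str, frame: int) -> list[str]:
    """Split seq into complete, non-overlapping triplets starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def position_probs(codons: list[str]) -> list[dict[str, float]]:
    """P_k(X): frequency of nucleotide X at triplet position k (indices 0, 1, 2)."""
    probs = []
    for k in range(3):
        counts = Counter(c[k] for c in codons)
        total = sum(counts.values())
        probs.append({n: counts[n] / total if total else 0.0 for n in "ACGT"})
    return probs


def frame_features(seq: str, frame: int) -> dict[str, float]:
    """Hypothetical helper: compute the four feature values for one reading frame."""
    codons = triplets(seq, frame)
    if not codons:
        raise ValueError("sequence too short for this frame")
    p = position_probs(codons)
    purine = [p[k]["A"] + p[k]["G"] for k in range(3)]
    gc = [p[k]["G"] + p[k]["C"] for k in range(3)]
    return {
        # (i) assumption: stop codon distribution summarised as in-frame stop frequency
        "stop_freq": sum(c in STOP_CODONS for c in codons) / len(codons),
        # (ii) product of purine (A or G) probabilities at positions 1, 2 and 3
        "purine_prod": purine[0] * purine[1] * purine[2],
        # (iii) P1(C) * P2(G) * P3(A)
        "cga_prod": p[0]["C"] * p[1]["G"] * p[2]["A"],
        # (iv) assumed reading: P1(G or C) * P2(G or C)
        "gc12_prod": gc[0] * gc[1],
    }


def six_frame_features(seq: str) -> dict[tuple[str, int], dict[str, float]]:
    """Compute the features for the three frames of each strand, keyed by (strand, frame)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return {
        (strand, frame): frame_features(s, frame)
        for strand, s in (("+", seq), ("-", rc))
        for frame in range(3)
    }


if __name__ == "__main__":
    demo = "ATGGCTGCAGGTAAAGCTGGTGAAGCTTAA"  # hypothetical toy sequence
    for key, feats in six_frame_features(demo).items():
        print(key, {name: round(value, 3) for name, value in feats.items()})
```

Because a coding frame tends to contain few in-frame stop codons while the other frames accumulate them, comparing the six feature sets suggests how strand and frame can fall out of the classification; the actual decision rule is the one described in the article.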
id pubmed-2808180
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinform Biol Insights (Original Research), Libertas Academica, published 2009-06-03; record date 2010-02-04
rights Copyright © 2009 The authors. This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/2.0/).
topic Original Research