Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

Bibliographic Details
Main authors: Rè, Matteo, Pesole, Graziano, Horner, David S
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2758873/
https://www.ncbi.nlm.nih.gov/pubmed/19737408
http://dx.doi.org/10.1186/1471-2105-10-282
author Rè, Matteo
Pesole, Graziano
Horner, David S
collection PubMed
description BACKGROUND: The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it is desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences which are erroneously annotated as coding. RESULTS: Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine which designates the alignment as coding or non-coding with an associated probability score. CONCLUSION: We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.
format Text
id pubmed-2758873
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2758873 2009-10-08 Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics Rè, Matteo Pesole, Graziano Horner, David S BMC Bioinformatics Methodology Article BioMed Central 2009-09-08 /pmc/articles/PMC2758873/ /pubmed/19737408 http://dx.doi.org/10.1186/1471-2105-10-282 Text en Copyright © 2009 Rè et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Rè, Matteo
Pesole, Graziano
Horner, David S
Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics
title Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics
topic Methodology Article