
Recovering motifs from biased genomes: application of signal correction

Bibliographic details
Main authors: Hasan, Samiul; Schreiber, Mark
Format: Text
Language: English
Published: Oxford University Press, 2006
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1636444/
https://www.ncbi.nlm.nih.gov/pubmed/16990246
http://dx.doi.org/10.1093/nar/gkl676
collection PubMed
description A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information theory (IT). We apply two concepts from IT, distortion and patterned interference (a type of noise), to model genomic and codon bias, respectively. This modeling approach allows us to correct a raw signal to recover signals that are weakened by compositional bias. The corrected signal is more likely to be discriminated from a biased background by a macromolecule. We apply this correction technique to recover ribosome-binding site (RBS) signals from available sequenced and annotated prokaryotic genomes having diverse compositional biases. We observed that linear correction was sufficient for recovering signals even at the extremes of these biases. Further comparative genomics studies were made possible upon correction of these signals. We find that the average Euclidean distance between RBS signal frequency matrices of different genomes can be significantly reduced by using the correction technique. Within this reduced average distance, we can find examples of class-specific RBS signals. Our results have implications for motif-based prediction, particularly with regard to the estimation of reliable inter-genomic model parameters.
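The description contrasts information that a motif genuinely carries with apparent information that only reflects a biased background. The Python sketch below is a generic illustration of that contrast and of the frequency-matrix distance the description mentions; it is not the authors' distortion and patterned-interference correction, which this record does not specify. It assumes a motif is stored as a list of per-position base-frequency dictionaries, and every function name in it (uniform_information, relative_entropy, motif_information, matrix_distance, average_distance) is hypothetical. Information is computed either as log2(4) - H bits per column (equiprobable background) or as relative entropy (Kullback-Leibler divergence) against a supplied background.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional

# Hypothetical representation: one dict of base -> frequency per motif position.
Column = Dict[str, float]
Motif = List[Column]
BASES = "ACGT"


def uniform_information(column: Column) -> float:
    """Bits per column assuming an equiprobable background: log2(4) - H(column)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(BASES)) - entropy


def relative_entropy(column: Column, background: Column) -> float:
    """Kullback-Leibler divergence D(column || background) in bits."""
    return sum(p * math.log2(p / background[b]) for b, p in column.items() if p > 0)


def motif_information(motif: Motif, background: Optional[Column] = None) -> float:
    """Total information; falls back to the equiprobable formula without a background."""
    if background is None:
        return sum(uniform_information(col) for col in motif)
    return sum(relative_entropy(col, background) for col in motif)


def matrix_distance(m1: Motif, m2: Motif) -> float:
    """Euclidean distance between two equal-length frequency matrices."""
    return math.sqrt(
        sum((c1.get(b, 0.0) - c2.get(b, 0.0)) ** 2 for c1, c2 in zip(m1, m2) for b in BASES)
    )


def average_distance(motifs: List[Motif]) -> float:
    """Mean pairwise Euclidean distance over all pairs of matrices (needs at least 2)."""
    pairs = list(combinations(motifs, 2))
    return sum(matrix_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}
    mirror = [dict(gc_rich) for _ in range(3)]  # columns that only echo the background
    print(round(motif_information(mirror), 3))           # 0.356 (equiprobable assumption)
    print(round(motif_information(mirror, gc_rich), 3))  # 0.0 (against the true background)
```

With a GC-rich background of A = T = 0.15 and C = G = 0.35, a three-column motif whose columns merely mirror the background scores roughly 0.36 bits under the equiprobable assumption but 0 bits against the actual background, which is the kind of overestimation the description refers to.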
id pubmed-1636444
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Nucleic Acids Res (Computational Biology), issue 2006-10, published online 2006-09-20; © 2006 The Author(s); record pubmed-1636444 dated 2006-11-29
topic Computational Biology