Recovering motifs from biased genomes: application of signal correction
A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information...
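As a rough illustration of the overestimation described above, the sketch below compares the information content of a single motif column computed against an assumed uniform background with the same quantity computed relative to a GC-rich background (relative entropy). This is a standard textbook calculation, not the authors' correction method; the function names, the column frequencies, and the background frequencies are all hypothetical values chosen for illustration.

```python
import math


def uniform_information(column):
    """Information (bits) of one motif column, assuming a uniform background.

    Computed as log2(alphabet size) minus the Shannon entropy of the column.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


def relative_information(column, background):
    """Information (bits) of one motif column relative to a given background.

    This is the Kullback-Leibler divergence between column and background.
    """
    return sum(
        p * math.log2(p / background[base])
        for base, p in column.items()
        if p > 0
    )


# Hypothetical GC-rich motif column and hypothetical GC-rich genome background.
column = {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05}
gc_rich_background = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}
uniform_background = {base: 0.25 for base in column}

uniform_bits = uniform_information(column)
background_aware_bits = relative_information(column, gc_rich_background)

print(f"assuming uniform background: {uniform_bits:.3f} bits")
print(f"relative to GC-rich background: {background_aware_bits:.3f} bits")
```

With these illustrative numbers, the uniform-background figure is the larger of the two, because part of the apparent signal is simply the genome's own GC richness.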
Main authors: | Hasan, Samiul; Schreiber, Mark |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2006 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1636444/ https://www.ncbi.nlm.nih.gov/pubmed/16990246 http://dx.doi.org/10.1093/nar/gkl676 |
_version_ | 1782130750657658880 |
---|---|
author | Hasan, Samiul Schreiber, Mark |
author_facet | Hasan, Samiul Schreiber, Mark |
author_sort | Hasan, Samiul |
collection | PubMed |
description | A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information theory (IT). We apply two concepts from IT, distortion and patterned interference (a type of noise), to model genomic and codon bias respectively. This modeling approach allows us to correct a raw signal to recover signals that are weakened by compositional bias. The corrected signal is more likely to be discriminated from a biased background by a macromolecule. We apply this correction technique to recover ribosome-binding site (RBS) signals from available sequenced and annotated prokaryotic genomes having diverse compositional biases. We observed that linear correction was sufficient for recovering signals even at the extremes of these biases. Further comparative genomics studies were made possible upon correction of these signals. We find that the average Euclidean distance between RBS signal frequency matrices of different genomes can be significantly reduced by using the correction technique. Within this reduced average distance, we can find examples of class-specific RBS signals. Our results have implications for motif-based prediction, particularly with regard to the estimation of reliable inter-genomic model parameters. |
format | Text |
id | pubmed-1636444 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1636444 2006-11-29 Recovering motifs from biased genomes: application of signal correction Hasan, Samiul Schreiber, Mark Nucleic Acids Res Computational Biology A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information theory (IT). We apply two concepts from IT, distortion and patterned interference (a type of noise), to model genomic and codon bias respectively. This modeling approach allows us to correct a raw signal to recover signals that are weakened by compositional bias. The corrected signal is more likely to be discriminated from a biased background by a macromolecule. We apply this correction technique to recover ribosome-binding site (RBS) signals from available sequenced and annotated prokaryotic genomes having diverse compositional biases. We observed that linear correction was sufficient for recovering signals even at the extremes of these biases. Further comparative genomics studies were made possible upon correction of these signals. We find that the average Euclidean distance between RBS signal frequency matrices of different genomes can be significantly reduced by using the correction technique. Within this reduced average distance, we can find examples of class-specific RBS signals. Our results have implications for motif-based prediction, particularly with regard to the estimation of reliable inter-genomic model parameters. Oxford University Press 2006-10 2006-09-20 /pmc/articles/PMC1636444/ /pubmed/16990246 http://dx.doi.org/10.1093/nar/gkl676 Text en © 2006 The Author(s) |
spellingShingle | Computational Biology Hasan, Samiul Schreiber, Mark Recovering motifs from biased genomes: application of signal correction |
title | Recovering motifs from biased genomes: application of signal correction |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1636444/ https://www.ncbi.nlm.nih.gov/pubmed/16990246 http://dx.doi.org/10.1093/nar/gkl676 |