Identification and utilization of arbitrary correlations in models of recombination signal sequences

Bibliographic Details
Main authors: Cowell, Lindsay G, Davila, Marco, Kepler, Thomas B, Kelsoe, Garnett
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151174/
https://www.ncbi.nlm.nih.gov/pubmed/12537561
author Cowell, Lindsay G
Davila, Marco
Kepler, Thomas B
Kelsoe, Garnett
collection PubMed
description BACKGROUND: A significant challenge in bioinformatics is to develop methods for detecting and modeling patterns in variable DNA sequence sites, such as protein-binding sites in regulatory DNA. Current approaches sometimes perform poorly when positions in the site do not independently affect protein binding. We developed a statistical technique for modeling the correlation structure in variable DNA sequence sites. The method places no restrictions on the number of correlated positions or on their spatial relationship within the site. No prior empirical evidence for the correlation structure is necessary. RESULTS: We applied our method to the recombination signal sequences (RSS) that direct assembly of B-cell and T-cell antigen-receptor genes via V(D)J recombination. The technique is based on model selection by cross-validation and produces models that allow computation of an information score for any signal-length sequence. We also modeled RSS using order zero and order one Markov chains. The scores from all models are highly correlated with measured recombination efficiencies, but the models arising from our technique are better than the Markov models at discriminating RSS from non-RSS. CONCLUSIONS: Our model-development procedure produces models that estimate well the recombinogenic potential of RSS and are better at RSS recognition than the order zero and order one Markov models. Our models are, therefore, valuable for studying the regulation of both physiologic and aberrant V(D)J recombination. The approach could be equally powerful for the study of promoter and enhancer elements, splice sites, and other DNA regulatory sites that are highly variable at the level of individual nucleotide positions.
format Text
id pubmed-151174
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-151174 2003-03-13 Identification and utilization of arbitrary correlations in models of recombination signal sequences Cowell, Lindsay G Davila, Marco Kepler, Thomas B Kelsoe, Garnett Genome Biol Research
BioMed Central 2002 2002-11-21 /pmc/articles/PMC151174/ /pubmed/12537561 Text en Copyright © 2002 Cowell et al., licensee BioMed Central Ltd
title Identification and utilization of arbitrary correlations in models of recombination signal sequences
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151174/
https://www.ncbi.nlm.nih.gov/pubmed/12537561
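The abstract in this record compares the authors' models with order zero and order one Markov chains and says model selection is done by cross-validation. The sketch below is not the authors' procedure. It is a minimal, hypothetical illustration of the two baseline model types and of a generic cross-validation comparison, assuming aligned, equal-length sites over uppercase ACGT, add-one pseudocounts, and a log-odds score against a uniform background as a stand-in for a per-sequence score (the abstract does not define its information score). All function names and the toy sequences are hypothetical; the sequences are not RSS data.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def fit_markov(seqs, order, pseudocount=1.0):
    """Fit a position-specific Markov chain to aligned, equal-length sequences.

    For each position i, returns (k, probs) where k = min(order, i) is the
    context length actually used and probs maps (context, base) to
    P(base at i | preceding k bases). Order 0 uses an empty context, giving
    an independent-position (weight-matrix-style) model.
    """
    model = []
    for i in range(len(seqs[0])):
        k = min(order, i)  # early positions have fewer predecessors
        counts = Counter((s[i - k:i], s[i]) for s in seqs)
        probs = {}
        for ctx in ("".join(c) for c in product(ALPHABET, repeat=k)):
            # Pseudocounts keep unseen (context, base) pairs from scoring -inf.
            total = sum(counts[(ctx, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
            for b in ALPHABET:
                probs[(ctx, b)] = (counts[(ctx, b)] + pseudocount) / total
        model.append((k, probs))
    return model


def log2_prob(model, seq):
    """Log2-probability of a signal-length sequence under a fitted model."""
    return sum(math.log2(probs[(seq[i - k:i], seq[i])])
               for i, (k, probs) in enumerate(model))


def score(model, seq):
    """Log-odds score in bits versus a uniform background (log2 0.25 = -2 per base)."""
    return log2_prob(model, seq) + 2.0 * len(seq)


def cv_loglik(seqs, order, folds=5):
    """Mean held-out log2-likelihood per sequence under k-fold cross-validation."""
    total = 0.0
    for f in range(folds):
        train = [s for j, s in enumerate(seqs) if j % folds != f]
        test = [s for j, s in enumerate(seqs) if j % folds == f]
        model = fit_markov(train, order)
        total += sum(log2_prob(model, s) for s in test)
    return total / len(seqs)


if __name__ == "__main__":
    # Hypothetical toy 7-mers for illustration only; these are NOT real RSS data.
    sites = ["CACAGTG", "CACAGTG", "CACTGTG", "CACAGTA", "CACAATG",
             "CATAGTG", "CACAGCG", "CACAGTG", "CACGGTG", "CACAGTG"]
    for order in (0, 1):
        model = fit_markov(sites, order)
        print(f"order {order}: score(CACAGTG) = {score(model, 'CACAGTG'):.2f} bits, "
              f"CV log-lik = {cv_loglik(sites, order):.2f}")
```

Choosing the order with the higher held-out log-likelihood is one simple cross-validation criterion; the abstract does not say which criterion the authors used, and their models allow correlations between arbitrary positions rather than only adjacent ones.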