
GANN: Genetic algorithm neural networks for the detection of conserved combinations of features in DNA

BACKGROUND: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and stru...


Bibliographic Details
Main authors: Beiko, Robert G; Charlebois, Robert L
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC553964/
https://www.ncbi.nlm.nih.gov/pubmed/15725347
http://dx.doi.org/10.1186/1471-2105-6-36
author Beiko, Robert G
Charlebois, Robert L
collection PubMed
description BACKGROUND: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence. RESULTS: GANN (available at ) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions. CONCLUSION: GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.
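The abstract's workflow of converting sequences to numeric indices, splitting the data repeatedly into training and test sets, and running negative controls can be illustrated with a minimal sketch. This is not GANN's code. The function names are hypothetical, windowed GC content stands in for the sequence and structural indices the abstract mentions, and label shuffling is assumed as one common form of negative control.

```python
import random


def gc_windows(seq, width=4):
    """Hypothetical index: G/C fraction in consecutive non-overlapping windows."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + width]) / width
        for i in range(0, len(seq) - width + 1, width)
    ]


def replicated_splits(n_items, n_replicates, test_fraction=0.25, seed=0):
    """Yield (train_indices, test_indices) for several independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    for _ in range(n_replicates):
        order = list(range(n_items))
        rng.shuffle(order)
        yield order[n_test:], order[:n_test]


def shuffled_labels(labels, seed=0):
    """Assumed negative control: same label counts, random assignment."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Toy sequences and class labels, invented purely for illustration.
    sequences = ["GGCCAATT", "ATATGCGC", "TTTTAAAA", "GCGCGCGC",
                 "AATTGGCC", "CCGGTTAA", "ATGCATGC", "TATATATA"]
    labels = [1, 1, 0, 1, 1, 1, 0, 0]

    features = [gc_windows(s) for s in sequences]
    control = shuffled_labels(labels, seed=1)

    for train, test in replicated_splits(len(sequences), n_replicates=3):
        # A classifier would be fitted on `train` and scored on `test`,
        # once with `labels` and once with `control` for comparison.
        print(len(train), len(test), [features[i] for i in test])
```

A model that scores about as well on the shuffled labels as on the real ones has probably not learned a genuine signal, which is the point of running the control alongside each replicate.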
format Text
id pubmed-553964
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-553964 2005-03-11 GANN: Genetic algorithm neural networks for the detection of conserved combinations of features in DNA Beiko, Robert G Charlebois, Robert L BMC Bioinformatics Software
BioMed Central 2005-02-22 /pmc/articles/PMC553964/ /pubmed/15725347 http://dx.doi.org/10.1186/1471-2105-6-36 Text en Copyright © 2005 Beiko and Charlebois; licensee BioMed Central Ltd.
title GANN: Genetic algorithm neural networks for the detection of conserved combinations of features in DNA
topic Software