
Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set


Bibliographic Details
Main authors:	Wren, Jonathan D; Johnson, David; Gruenwald, Le
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1637034/ https://www.ncbi.nlm.nih.gov/pubmed/16026599 http://dx.doi.org/10.1186/1471-2105-6-S2-S2

author	Wren, Jonathan D; Johnson, David; Gruenwald, Le
collection	PubMed
description	There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and succeeded in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands.
format	Text
id	pubmed-1637034
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1637034 2006-11-16 Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set Wren, Jonathan D; Johnson, David; Gruenwald, Le BMC Bioinformatics Proceedings BioMed Central 2005-07-15 /pmc/articles/PMC1637034/ /pubmed/16026599 http://dx.doi.org/10.1186/1471-2105-6-S2-S2 Text en Copyright © 2006 Wren et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1637034/ https://www.ncbi.nlm.nih.gov/pubmed/16026599 http://dx.doi.org/10.1186/1471-2105-6-S2-S2
