
A dictionary based informational genome analysis

BACKGROUND: In the post-genomic era several methods of computational genomics are emerging to understand how the whole information is structured within genomes. Literature of last five years accounts for several alignment-free methods, arisen as alternative metrics for dissimilarity of biological se...


Bibliographic Details
Main authors: Castellini, Alberto, Franco, Giuditta, Manca, Vincenzo
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577435/
https://www.ncbi.nlm.nih.gov/pubmed/22985068
http://dx.doi.org/10.1186/1471-2164-13-485
_version_ 1782259910223855616
author Castellini, Alberto
Franco, Giuditta
Manca, Vincenzo
collection PubMed
description BACKGROUND: In the post-genomic era, several computational genomics methods are emerging to explain how information is structured within whole genomes. The literature of the last five years reports several alignment-free methods, proposed as alternative measures of dissimilarity between biological sequences. Among these, recent approaches are based on the empirical frequencies of DNA k-mers in whole genomes. RESULTS: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with their frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these indexes in a database, and index analysis and visualization. The methodology was validated by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to identify over-represented functional sequences such as promoters), was discussed and suggested a method to define synthetic genetic networks. CONCLUSIONS: We introduced a dictionary-based methodology and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, namely applied in other contexts of computational genomics, as a basis for discriminating genomic pathologies.
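As an illustration of the dictionary idea in the abstract, the following minimal Python sketch builds the set of k-mers occurring in a sequence, their empirical frequencies, and two simple derived quantities. It is not the authors' software: the function names (`kmer_counts`, `empirical_entropy`, `repeated_words`), the toy sequence, and the choice of Shannon entropy as an example "informational index" are all assumptions made here for illustration, since the record does not specify which indexes the paper uses.

```python
from collections import Counter
from math import log2


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical k-mer frequency distribution.

    Assumption: used here only as one example of an informational index.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def repeated_words(counts: Counter) -> set:
    """Words occurring more than once, i.e. repeats of length k."""
    return {word for word, c in counts.items() if c > 1}


# Hypothetical toy sequence, not real genomic data.
toy = "ACGTACGTAC"
counts = kmer_counts(toy, 2)
dictionary = set(counts)  # the 2-mer dictionary of the toy sequence
print(sorted(dictionary), empirical_entropy(counts), sorted(repeated_words(counts)))
```

For whole genomes, a plain `Counter` becomes memory-hungry as k grows; dedicated k-mer counting tools or suffix-based structures would be a more realistic choice.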
format Online
Article
Text
id pubmed-3577435
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3577435 2013-02-26 A dictionary based informational genome analysis Castellini, Alberto Franco, Giuditta Manca, Vincenzo BMC Genomics Methodology Article
BioMed Central 2012-09-17 /pmc/articles/PMC3577435/ /pubmed/22985068 http://dx.doi.org/10.1186/1471-2164-13-485 Text en Copyright ©2012 Castellini et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A dictionary based informational genome analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577435/
https://www.ncbi.nlm.nih.gov/pubmed/22985068
http://dx.doi.org/10.1186/1471-2164-13-485