
Multidimensional Scaling Applied to Histogram-Based DNA Analysis

This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statist...


Bibliographic Details
Main Authors: Costa, António C., Tenreiro Machado, J. A., Quelhas, Maria Dulce
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3418642/
https://www.ncbi.nlm.nih.gov/pubmed/22919286
http://dx.doi.org/10.1155/2012/289694
_version_ 1782240656815554560
author Costa, António C.
Tenreiro Machado, J. A.
Quelhas, Maria Dulce
collection PubMed
description This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four tested correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data and not artifacts of the method.
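The pipeline named in the description (word-frequency histograms, a pairwise similarity measure, then MDS) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the toy sequences, the word length k = 3, the choice of cosine dissimilarity as the single measure shown, and the use of classical (Torgerson) MDS computed with NumPy are all assumptions made for the example.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"


def word_histogram(seq, k=3):
    """Relative frequencies of all 4**k DNA words of length k (overlapping windows)."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = np.zeros(len(words))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows with symbols outside ACGT are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def cosine_dissimilarity(H):
    """Pairwise 1 - cosine similarity between the rows of H (rows must be non-zero)."""
    unit = H / np.linalg.norm(H, axis=1, keepdims=True)
    D = 1.0 - unit @ unit.T
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, None)  # remove tiny negative values from rounding


def classical_mds(D, dims=3):
    """Classical (Torgerson) MDS: embed a dissimilarity matrix in `dims` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]   # keep the `dims` largest
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy sequences standing in for chromosomes (not real data).
sequences = {
    "toy_A": "ACGTACGTACGTACGTACGA",
    "toy_B": "ACGTACGTTCGTACGTACGA",
    "toy_C": "GGGGCCCCGGGGCCCCGGGC",
    "toy_D": "TTTTAAAATTTTAAAATTTA",
}

H = np.array([word_histogram(s, k=3) for s in sequences.values()])
D = cosine_dissimilarity(H)
coords = classical_mds(D, dims=3)

for name, xyz in zip(sequences, coords):
    print(name, np.round(xyz, 3))
```

Each row of `coords` is a point in three dimensions. Sequences with similar word histograms (here `toy_A` and `toy_B`) land close together, which is the kind of clustering the description refers to. Swapping `cosine_dissimilarity` for another measure changes only the matrix `D` passed to the MDS step.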
format Online
Article
Text
id pubmed-3418642
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3418642 2012-08-23 Multidimensional Scaling Applied to Histogram-Based DNA Analysis Costa, António C. Tenreiro Machado, J. A. Quelhas, Maria Dulce Comp Funct Genomics Research Article Hindawi Publishing Corporation 2012 2012-07-24 /pmc/articles/PMC3418642/ /pubmed/22919286 http://dx.doi.org/10.1155/2012/289694 Text en Copyright © 2012 António C. Costa et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multidimensional Scaling Applied to Histogram-Based DNA Analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3418642/
https://www.ncbi.nlm.nih.gov/pubmed/22919286
http://dx.doi.org/10.1155/2012/289694