Multidimensional Scaling Applied to Histogram-Based DNA Analysis
This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statist...
Main authors: | Costa, António C.; Tenreiro Machado, J. A.; Quelhas, Maria Dulce |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3418642/ https://www.ncbi.nlm.nih.gov/pubmed/22919286 http://dx.doi.org/10.1155/2012/289694 |
_version_ | 1782240656815554560 |
---|---|
author | Costa, António C. Tenreiro Machado, J. A. Quelhas, Maria Dulce |
author_facet | Costa, António C. Tenreiro Machado, J. A. Quelhas, Maria Dulce |
author_sort | Costa, António C. |
collection | PubMed |
description | This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data rather than artifacts of the method. |
format | Online Article Text |
id | pubmed-3418642 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-3418642 2012-08-23 Multidimensional Scaling Applied to Histogram-Based DNA Analysis Costa, António C. Tenreiro Machado, J. A. Quelhas, Maria Dulce Comp Funct Genomics Research Article This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data rather than artifacts of the method. Hindawi Publishing Corporation 2012 2012-07-24 /pmc/articles/PMC3418642/ /pubmed/22919286 http://dx.doi.org/10.1155/2012/289694 Text en Copyright © 2012 António C. Costa et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Costa, António C. Tenreiro Machado, J. A. Quelhas, Maria Dulce Multidimensional Scaling Applied to Histogram-Based DNA Analysis |
title | Multidimensional Scaling Applied to Histogram-Based DNA Analysis |
title_full | Multidimensional Scaling Applied to Histogram-Based DNA Analysis |
title_fullStr | Multidimensional Scaling Applied to Histogram-Based DNA Analysis |
title_full_unstemmed | Multidimensional Scaling Applied to Histogram-Based DNA Analysis |
title_short | Multidimensional Scaling Applied to Histogram-Based DNA Analysis |
title_sort | multidimensional scaling applied to histogram-based dna analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3418642/ https://www.ncbi.nlm.nih.gov/pubmed/22919286 http://dx.doi.org/10.1155/2012/289694 |
work_keys_str_mv | AT costaantonioc multidimensionalscalingappliedtohistogrambaseddnaanalysis AT tenreiromachadoja multidimensionalscalingappliedtohistogrambaseddnaanalysis AT quelhasmariadulce multidimensionalscalingappliedtohistogrambaseddnaanalysis |
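
The description field above summarizes a pipeline that turns each chromosome into a histogram of fixed-length DNA "words" and then compares histograms pairwise before handing the result to MDS. The sketch below illustrates only the first two stages and is not the authors' implementation. Several choices are assumptions made for illustration: the word length `k`, the use of overlapping words, the conversion `1 - similarity` into a dissimilarity, and the three toy sequences, which are hypothetical and not taken from the paper's 421 chromosomes. The MDS step itself is not shown.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def word_histogram(seq, k=3):
    """Relative frequency of each of the 4**k DNA words of length k.

    Assumption: words are read with a sliding window (overlapping).
    Windows containing symbols outside ACGT are ignored.
    """
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def cosine(u, v):
    """Cosine similarity between two histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    """Pearson product-moment correlation between two histograms."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def dissimilarity_matrix(hists, measure=cosine):
    """Pairwise dissimilarities, assumed here to be 1 - similarity."""
    return [[1.0 - measure(a, b) for b in hists] for a in hists]


# Hypothetical toy sequences standing in for chromosomes.
toy = {
    "chr_a": "ACGTACGTACGTAC",
    "chr_b": "ACGTACGTTCGTAC",
    "chr_c": "GGGGGCCCCCGGGG",
}
hists = [word_histogram(s, k=2) for s in toy.values()]
D = dissimilarity_matrix(hists, measure=cosine)
```

A symmetric matrix such as `D` is the usual input to an MDS routine, which would place each chromosome as a point in a low-dimensional space so that distances between points approximate the dissimilarities. Swapping `measure=pearson` into `dissimilarity_matrix` gives a second scenario to compare against the cosine one.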