
Genome Data Exploration Using Correspondence Analysis

Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such...


Bibliographic Details
Main author:	Tekaia, Fredj
Format:	Online Article Text
Language:	English
Published:	Libertas Academica 2016
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4898644/ https://www.ncbi.nlm.nih.gov/pubmed/27279736 http://dx.doi.org/10.4137/BBI.S39614

author	Tekaia, Fredj
collection	PubMed
description	Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples involving amino acid compositions of large sets of species (viruses, phages, yeast, and fungi), as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly through its ability to relate individual patterns to their corresponding characteristic variables.
format	Online Article Text
id	pubmed-4898644
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-4898644 2016-06-08 Genome Data Exploration Using Correspondence Analysis Tekaia, Fredj Bioinform Biol Insights Review Libertas Academica 2016-06-07 /pmc/articles/PMC4898644/ /pubmed/27279736 http://dx.doi.org/10.4137/BBI.S39614 Text en © 2016 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title	Genome Data Exploration Using Correspondence Analysis
topic	Review
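The description summarizes how CA works: it takes a two-way table (for example, species by amino acid counts) and extracts a few orthogonal factors that carry most of the association between rows and columns. As a rough illustration only, the sketch below computes CA with NumPy using the textbook singular value decomposition of standardized residuals. It is not the author's implementation, which this record does not describe; the function name `correspondence_analysis`, its `n_factors` parameter, and the toy counts are hypothetical.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Minimal correspondence analysis (CA) of a two-way table via SVD.

    Hypothetical helper for illustration. Returns row and column principal
    coordinates on the first `n_factors` factors, plus the share of total
    inertia carried by each of those factors.
    """
    N = np.asarray(table, dtype=float)
    if N.ndim != 2 or (N < 0).any():
        raise ValueError("CA expects a 2-D table of non-negative counts")
    if (N.sum(axis=1) == 0).any() or (N.sum(axis=0) == 0).any():
        raise ValueError("every row and column needs a positive total")

    P = N / N.sum()                      # correspondence matrix (sums to 1)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # table expected under independence

    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - expected) / np.sqrt(expected)
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

    inertia = sigma ** 2                 # principal inertias, largest first
    k = n_factors
    # Principal coordinates: F = D_r^{-1/2} U Sigma, G = D_c^{-1/2} V Sigma
    rows = U[:, :k] * sigma[:k] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :k] * sigma[:k] / np.sqrt(c)[:, None]
    explained = inertia[:k] / inertia.sum()
    return rows, cols, explained


if __name__ == "__main__":
    # Made-up counts (4 hypothetical species x 3 amino acids), not data from the review
    toy = [[10, 2, 3], [3, 12, 4], [2, 3, 15], [6, 6, 1]]
    rows, cols, explained = correspondence_analysis(toy)
    print("share of inertia per factor:", explained.round(3))
```

Because rows and columns are both expressed in principal coordinates on the same factors, they can be drawn on one map. This joint display is what lets a reader relate individual patterns (rows) to their characteristic variables (columns), the property the abstract highlights when comparing CA with principal component analysis.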

