Application of Subspace Clustering in DNA Sequence Analysis

Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an applicatio...

Bibliographic Details
Main Authors:	Wallace, Tim; Sekmen, Ali; Wang, Xiaofei
Format:	Online Article Text
Language:	English
Published:	Mary Ann Liebert, Inc. 2015
Subjects:	Research Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589114/ https://www.ncbi.nlm.nih.gov/pubmed/26162018 http://dx.doi.org/10.1089/cmb.2015.0084

_version_	1782392747827658752
author	Wallace, Tim Sekmen, Ali Wang, Xiaofei
author_facet	Wallace, Tim Sekmen, Ali Wang, Xiaofei
author_sort	Wallace, Tim
collection	PubMed
description	Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis.
format	Online Article Text
id	pubmed-4589114
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Mary Ann Liebert, Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-4589114 2015-10-06 Application of Subspace Clustering in DNA Sequence Analysis Wallace, Tim Sekmen, Ali Wang, Xiaofei J Comput Biol Research Articles Mary Ann Liebert, Inc. 2015-10-01 /pmc/articles/PMC4589114/ /pubmed/26162018 http://dx.doi.org/10.1089/cmb.2015.0084 Text en © The Author(s) 2015; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle	Research Articles Wallace, Tim Sekmen, Ali Wang, Xiaofei Application of Subspace Clustering in DNA Sequence Analysis
title	Application of Subspace Clustering in DNA Sequence Analysis
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589114/ https://www.ncbi.nlm.nih.gov/pubmed/26162018 http://dx.doi.org/10.1089/cmb.2015.0084
work_keys_str_mv	AT wallacetim applicationofsubspaceclusteringindnasequenceanalysis AT sekmenali applicationofsubspaceclusteringindnasequenceanalysis AT wangxiaofei applicationofsubspaceclusteringindnasequenceanalysis
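
The abstract mentions a simple random mutation binary tree model relating subspace rank to time and mutation rate. The following is a minimal sketch of that idea, not the authors' implementation: the one-hot encoding, the per-site mutation rule, the function names, and the use of exact matrix rank (in place of the singular-value estimates described in the abstract) are all assumptions made for illustration.

```python
import random
from fractions import Fraction

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Copy seq, replacing each site by a different base with probability `rate`."""
    out = []
    for b in seq:
        if rng.random() < rate:
            out.append(rng.choice([x for x in BASES if x != b]))
        else:
            out.append(b)
    return "".join(out)


def simulate_tree(root, depth, rate, rng):
    """Binary tree of speciation events: every sequence yields two
    independently mutated children per generation. Returns the leaves."""
    level = [root]
    for _ in range(depth):
        level = [mutate(s, rate, rng) for s in level for _ in range(2)]
    return level


def one_hot(seq):
    """Assumed numeric encoding: 4 indicator entries per nucleotide."""
    return [Fraction(int(b == x)) for b in seq for x in BASES]


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [r[:] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def leaf_rank(depth, rate, length=30, seed=0):
    """Rank of the matrix whose rows are the encoded leaf sequences."""
    rng = random.Random(seed)
    root = "".join(rng.choice(BASES) for _ in range(length))
    leaves = simulate_tree(root, depth, rate, rng)
    return matrix_rank([one_hot(s) for s in leaves])
```

Calling `leaf_rank(depth, rate)` for several depths (a stand-in for elapsed time) and mutation rates gives a rough picture of how the rank of the leaf-sequence matrix depends on both. With a rate of zero all leaves are identical and the rank is 1; the rank can never exceed the number of leaves, `2 ** depth`. Exact rank is a crude proxy here: an analysis based on singular values would instead count the values above a chosen threshold.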
