Application of Subspace Clustering in DNA Sequence Analysis

Bibliographic Details
Main authors: Wallace, Tim; Sekmen, Ali; Wang, Xiaofei
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc. 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589114/
https://www.ncbi.nlm.nih.gov/pubmed/26162018
http://dx.doi.org/10.1089/cmb.2015.0084
Collection: PubMed
Description: Identification and clustering of orthologous genes play an important role in developing evolutionary models, for example in validating convergent and divergent phylogenies, and in predicting functional proteins in newly sequenced species whose nucleotide-to-protein mappings are unverified. Here, we introduce an application of subspace clustering to orthologous gene sequences and discuss the initial results. The working hypothesis is that genetic changes between protein-coding nucleotide sequences of selected species and groups may lie within a union of subspaces corresponding to clusters of orthologous groups. Subspace dimensions were estimated for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates of statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple binary-tree model with random mutations is used to simulate speciation events, showing how subspace rank depends on time and mutation rate. This simple mutation model is largely consistent with the singular values observed in the subspace clustering results. Our study indicates that subspace clustering may be applied to orthology analysis.
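The abstract gives no code, sequence encoding, or model parameters, so the following is only a hypothetical sketch of the kind of rank-versus-time-and-mutation-rate experiment it describes, not the authors' method. It assumes a random root sequence, a binary tree in which every lineage splits into two children each generation, a per-site substitution probability `mu`, one-hot encoding of bases, and the dimension spanned by leaf-minus-first-leaf difference vectors as the "subspace rank". The rank is computed exactly by Gaussian elimination (it equals the number of nonzero singular values) so that only the Python standard library is needed; with real, noisy data one would instead count singular values above a chosen threshold. All function names (`simulate_tree`, `difference_rank`, and so on) are illustrative.

```python
import random
from fractions import Fraction

BASES = "ACGT"


def mutate(seq, mu, rng):
    """Copy seq; each site is replaced by a different base with probability mu."""
    out = []
    for base in seq:
        if rng.random() < mu:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_tree(length, depth, mu, seed=0):
    """Hypothetical speciation model: return the 2**depth leaves of a binary tree."""
    rng = random.Random(seed)
    level = ["".join(rng.choice(BASES) for _ in range(length))]  # random root
    for _ in range(depth):
        # Every lineage splits into two children that mutate independently.
        level = [mutate(s, mu, rng) for s in level for _ in range(2)]
    return level


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector of length 4 * len(seq) (assumed encoding)."""
    return [1 if b == base else 0 for b in seq for base in BASES]


def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over Fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break  # every row already has a pivot
    return r


def difference_rank(leaves):
    """Dimension spanned by (leaf - first leaf) one-hot vectors (assumed rank measure)."""
    ref = one_hot(leaves[0])
    diffs = [[a - b for a, b in zip(one_hot(s), ref)] for s in leaves[1:]]
    return rank(diffs)


if __name__ == "__main__":
    # Rank versus time (tree depth) and mutation rate; parameter values are arbitrary.
    for mu in (0.0, 0.01, 0.05):
        for depth in (1, 2, 3, 4):
            leaves = simulate_tree(length=200, depth=depth, mu=mu, seed=1)
            print(f"mu={mu:<5} depth={depth} rank={difference_rank(leaves)}")
```

With `mu = 0.0` every leaf is identical to the root, so the rank is 0; with `mu > 0` the rank is bounded above by `2**depth - 1`, the number of difference vectors, and tends to rise with both depth (time) and `mu`.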
ID: pubmed-4589114
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Comput Biol (Journal of Computational Biology), Research Articles; published 2015-10-01 by Mary Ann Liebert, Inc.
License: © The Author(s) 2015; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Topic: Research Articles