Application of Subspace Clustering in DNA Sequence Analysis
Main authors: | Wallace, Tim; Sekmen, Ali; Wang, Xiaofei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Mary Ann Liebert, Inc., 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589114/ https://www.ncbi.nlm.nih.gov/pubmed/26162018 http://dx.doi.org/10.1089/cmb.2015.0084 |
author | Wallace, Tim; Sekmen, Ali; Wang, Xiaofei |
---|---|
collection | PubMed |
description | Identification and clustering of orthologous genes play an important role in developing evolutionary models (for example, validating convergent and divergent phylogeny) and in predicting functional proteins in newly sequenced species with unverified nucleotide-to-protein mappings. Here, we introduce an application of subspace clustering to orthologous gene sequences and discuss the initial results. The working hypothesis is that genetic changes between protein-coding nucleotide sequences among selected species and groups may lie within a union of subspaces, one for each cluster of orthologous groups. Estimates of the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates of statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events, showing how the subspace rank depends on time and mutation rate. The simple mutation model is found to be largely consistent with the singular values observed in the subspace clustering results. Our study indicates that the subspace clustering method may be applied in orthology analysis. |
format | Online Article Text |
id | pubmed-4589114 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Mary Ann Liebert, Inc. |
record_format | MEDLINE/PubMed |
source | J Comput Biol, Research Articles; Mary Ann Liebert, Inc., 2015-10-01 |
rights | © The Author(s) 2015; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
title | Application of Subspace Clustering in DNA Sequence Analysis |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589114/ https://www.ncbi.nlm.nih.gov/pubmed/26162018 http://dx.doi.org/10.1089/cmb.2015.0084 |
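The abstract names a random mutation binary tree model and a link between subspace rank, time, and mutation rate, but does not specify the model. The Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes one-hot encoding of bases, independent per-site substitutions at a fixed rate, and a tree in which every lineage splits into two each generation. The function names (`one_hot`, `mutate`, `simulate_tree`, `matrix_rank`, `change_rank`) are invented for this example. As a stand-in for the paper's singular-value dimension estimates, it computes the exact rank of the matrix whose rows are leaf-minus-root change vectors.

```python
import random
from fractions import Fraction

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector of length 4 * len(seq)."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def mutate(seq, rate, rng):
    """Assumed model: each site is substituted with probability `rate` by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_tree(root, generations, rate, rng):
    """Hypothetical binary speciation tree: every leaf splits into two independently mutated children."""
    leaves = [root]
    for _ in range(generations):
        leaves = [mutate(s, rate, rng) for s in leaves for _ in range(2)]
    return leaves


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals (no floating-point tolerance)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def change_rank(leaves, root):
    """Rank of the matrix of one-hot changes of each leaf relative to the root sequence."""
    base = one_hot(root)
    rows = [[a - b for a, b in zip(one_hot(s), base)] for s in leaves]
    return matrix_rank(rows)


if __name__ == "__main__":
    rng = random.Random(0)
    root = "".join(rng.choice(BASES) for _ in range(50))
    for rate in (0.0, 0.01, 0.05):
        leaves = simulate_tree(root, generations=4, rate=rate, rng=rng)
        print(rate, change_rank(leaves, root))
```

With a rate of 0 every leaf equals the root, so the change matrix is zero and its rank is 0. Raising the rate or the number of generations will typically raise the rank, bounded above by the number of leaves. Exact rational elimination avoids choosing a numerical tolerance but is slow; for real sequence data, a floating-point SVD with a singular-value threshold is the more usual choice.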