Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recogn...

Bibliographic Details
Main Authors: Maulik, Ujjwal; Sarkar, Anasua
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574063/
https://www.ncbi.nlm.nih.gov/pubmed/23457439
http://dx.doi.org/10.1371/journal.pone.0046468
author Maulik, Ujjwal
Sarkar, Anasua
collection PubMed
description Remote homology detection among proteins using only unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles, and the Markov clustering algorithms, are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods calls for an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms the other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores of the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code is available upon request. Contact: sarkar@labri.fr.
format Online
Article
Text
id pubmed-3574063
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3574063 2013-03-01 Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels Maulik, Ujjwal Sarkar, Anasua PLoS One Research Article
Public Library of Science 2013-02-15 /pmc/articles/PMC3574063/ /pubmed/23457439 http://dx.doi.org/10.1371/journal.pone.0046468 Text en © 2013 Maulik, Sarkar http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574063/
https://www.ncbi.nlm.nih.gov/pubmed/23457439
http://dx.doi.org/10.1371/journal.pone.0046468