SEQOPTICS: a protein sequence clustering system

Bibliographic Details
Main Authors: Chen, Yonghui; Reilly, Kevin D; Sprague, Alan P; Guan, Zhijie
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1780130/
https://www.ncbi.nlm.nih.gov/pubmed/17217502
http://dx.doi.org/10.1186/1471-2105-7-S4-S10
author Chen, Yonghui
Reilly, Kevin D
Sprague, Alan P
Guan, Zhijie
collection PubMed
description BACKGROUND: Protein sequence clustering has been widely used as part of the analysis of protein structure and function. In most cases, single-linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach because of its emphasis on visualization of results and its support for interactive work, e.g., in choosing parameters. However, as far as we know, OPTICS has not been used for protein sequence clustering. RESULTS: In this paper, a system for clustering proteins, SEQOPTICS (SEQuence clustering with OPTICS), is demonstrated. The system uses Smith-Waterman as the protein distance measure and OPTICS at its core to perform protein sequence clustering. SEQOPTICS is tested with four data sets from different data sources. Visualization of the sequence clustering structure is demonstrated as well. CONCLUSION: The system was evaluated by comparison with other existing methods. Analysis of the results demonstrates that SEQOPTICS performs better on several evaluation criteria, including Jaccard coefficient, Precision, and Recall. It is a promising protein sequence clustering method, with possible future improvements in parallel computing and in the use of other protein distance measures.
format Text
id pubmed-1780130
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1780130 2007-01-24 SEQOPTICS: a protein sequence clustering system Chen, Yonghui Reilly, Kevin D Sprague, Alan P Guan, Zhijie BMC Bioinformatics Research BioMed Central 2006-12-12 /pmc/articles/PMC1780130/ /pubmed/17217502 http://dx.doi.org/10.1186/1471-2105-7-S4-S10 Text en Copyright © 2006 Chen et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SEQOPTICS: a protein sequence clustering system
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1780130/
https://www.ncbi.nlm.nih.gov/pubmed/17217502
http://dx.doi.org/10.1186/1471-2105-7-S4-S10
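
The description above names Jaccard coefficient, Precision, and Recall as evaluation criteria but does not define them in this record. The sketch below shows one common way such scores are computed for a clustering, by counting pairs of items that are placed together in the predicted clustering, in a reference grouping, or in both. The pair-counting formulation, the function names `pair_counts` and `pair_scores`, and the toy labels are all assumptions made for illustration; they are not taken from the SEQOPTICS paper and may differ from the definitions the authors used.

```python
from itertools import combinations


def pair_counts(predicted, reference):
    """Count item pairs by co-membership.

    Returns (both, pred_only, ref_only): pairs grouped together in both
    labelings, only in the predicted one, or only in the reference one.
    """
    both = pred_only = ref_only = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            both += 1
        elif same_pred:
            pred_only += 1
        elif same_ref:
            ref_only += 1
    return both, pred_only, ref_only


def pair_scores(predicted, reference):
    """Pair-counting Jaccard coefficient, precision, and recall (assumed definitions)."""
    tp, fp, fn = pair_counts(predicted, reference)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"jaccard": jaccard, "precision": precision, "recall": recall}


# Hypothetical toy data: five sequences, predicted cluster ids vs. reference families.
predicted = [0, 0, 1, 1, 1]
reference = ["a", "a", "a", "b", "b"]
scores = pair_scores(predicted, reference)
print(scores)
```

In this toy case two pairs are grouped together in both labelings, two only in the predicted clustering, and two only in the reference, so precision and recall are both 0.5 and the Jaccard coefficient is 1/3.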