A benchmark study of sequence alignment methods for protein clustering

BACKGROUND: Protein sequence alignment analyses have become a crucial step for many bioinformatics studies during the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are two major approaches in sequence alignment. Former benchmark studies revealed drawbacks of...

Bibliographic Details
Main Authors:	Wang, Yingying; Wu, Hongyan; Cai, Yunpeng
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311937/ https://www.ncbi.nlm.nih.gov/pubmed/30598070 http://dx.doi.org/10.1186/s12859-018-2524-4

collection	PubMed
description	BACKGROUND: Protein sequence alignment analyses have become a crucial step for many bioinformatics studies during the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are two major approaches in sequence alignment. Former benchmark studies revealed drawbacks of MSA methods on nucleotide sequence alignments. To test whether similar drawbacks also influence protein sequence alignment analyses, we propose a new benchmark framework for protein clustering based on cluster validity. This new framework directly reflects the biological ground truth of the application scenarios that adopt sequence alignments, and evaluates alignment quality according to how well the biological goal is achieved, rather than by comparison at the sequence level only, which averts the biases introduced by alignment scores or manual alignment templates. In contrast to former studies, we calculate the cluster validity score based on sequence distances instead of clustering results. This strategy can avoid the influence of different clustering methods and thus makes the results more dependable. RESULTS: Results showed that PSA methods performed better than MSA methods on most of the BAliBASE benchmark datasets. Analyses on the 80 re-sampled benchmark datasets, constructed by randomly choosing 90% of each dataset 10 times, showed similar results. CONCLUSIONS: These results confirm that the drawbacks of MSA methods revealed at the nucleotide level also exist in protein sequence alignment analyses and affect the accuracy of results. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2524-4) contains supplementary material, which is available to authorized users.
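The description above says the cluster validity score is computed from sequence distances rather than from clustering results, and that each dataset was re-sampled at 90% ten times. The record does not name the validity index used, so the sketch below substitutes the silhouette width as a stand-in. The function names (`silhouette_from_distances`, `resample_scores`, `toy_matrix`) and the toy distance values are hypothetical and are not taken from the study.

```python
import random


def silhouette_from_distances(dist, labels):
    """Mean silhouette width from a precomputed distance matrix and
    reference group labels. No clustering algorithm is run.
    Assumption: silhouette stands in for the unnamed validity index."""
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("need at least two reference groups")
    total = 0.0
    for i in range(n):
        own = groups[labels[i]]
        if len(own) == 1:
            continue  # singleton group contributes 0 by convention
        # a: mean distance to the other members of the same group
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to any other group
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in groups.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def resample_scores(dist, labels, fraction=0.9, repeats=10, seed=0):
    """Score random subsets: keep `fraction` of the sequences, `repeats` times."""
    rng = random.Random(seed)
    n = len(labels)
    k = max(2, round(fraction * n))
    scores = []
    for _ in range(repeats):
        keep = sorted(rng.sample(range(n), k))
        sub_dist = [[dist[i][j] for j in keep] for i in keep]
        sub_labels = [labels[i] for i in keep]
        scores.append(silhouette_from_distances(sub_dist, sub_labels))
    return scores


def toy_matrix(labels, within, between):
    """Hypothetical distances: `within` inside a group, `between` across groups."""
    n = len(labels)
    return [
        [0.0 if i == j else (within if labels[i] == labels[j] else between)
         for j in range(n)]
        for i in range(n)
    ]


# Invented example: two alignment methods yield different distance matrices
# for the same six sequences with known family labels.
families = ["fam1"] * 3 + ["fam2"] * 3
dist_a = toy_matrix(families, within=0.1, between=0.9)  # families well separated
dist_b = toy_matrix(families, within=0.5, between=0.6)  # families poorly separated

score_a = silhouette_from_distances(dist_a, families)
score_b = silhouette_from_distances(dist_b, families)
print(score_a > score_b)
print(len(resample_scores(dist_a, families)))
```

Because the score depends only on the distance matrix and the reference labels, two alignment methods can be compared without choosing a clustering algorithm, which is the property the description emphasizes.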
id	pubmed-6311937
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6311937 2019-01-07 A benchmark study of sequence alignment methods for protein clustering Wang, Yingying Wu, Hongyan Cai, Yunpeng BMC Bioinformatics Research BioMed Central 2018-12-31 /pmc/articles/PMC6311937/ /pubmed/30598070 http://dx.doi.org/10.1186/s12859-018-2524-4 Text en © The Author(s). 2018 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A benchmark study of sequence alignment methods for protein clustering
