
Protein sequence redundancy reduction: comparison of various method

Non-redundant protein datasets are of utmost importance in bioinformatics. Constructing such datasets means removing protein sequences that exceed certain similarity thresholds. Several programs, such as ‘Decrease redundancy’, ‘cd-hit’, ‘Pisces’, ‘BlastClust’ and ‘SkipRedundant’, are available. The...


Bibliographic Details
Main authors:	Sikic, Kresimir; Carugo, Oliviero
Format:	Text
Language:	English
Published:	Biomedical Informatics 2010
Subjects:	Hypothesis
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3055704/ https://www.ncbi.nlm.nih.gov/pubmed/21364823

_version_	1782200133900828672
author	Sikic, Kresimir Carugo, Oliviero
collection	PubMed
description	Non-redundant protein datasets are of utmost importance in bioinformatics. Constructing such datasets means removing protein sequences that exceed certain similarity thresholds. Several programs, such as ‘Decrease redundancy’, ‘cd-hit’, ‘Pisces’, ‘BlastClust’ and ‘SkipRedundant’, are available. The issue that we focus on here is to what extent the non-redundant datasets produced by different programs are similar to each other. A systematic comparison of the features and of the outputs of these programs, using subsets of the UniProt database, was performed and is described here. The results show a high level of overlap between non-redundant datasets obtained with the same program fed with the same initial dataset but different percentage-of-identity thresholds, and moderate levels of similarity between results obtained with different programs fed with the same initial dataset and the same percentage-of-identity threshold. Because some differences may arise, the use of more than one program is advisable.
format	Text
id	pubmed-3055704
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Biomedical Informatics
record_format	MEDLINE/PubMed
spelling	pubmed-30557042011-05-03 Protein sequence redundancy reduction: comparison of various method Sikic, Kresimir Carugo, Oliviero Bioinformation Hypothesis Non-redundant protein datasets are of utmost importance in bioinformatics. Constructing such datasets means removing protein sequences that overreach certain similarity thresholds. Several programs such as ‘Decrease redundancy’, ‘cd-hit’, ‘Pisces’, ‘BlastClust’ and ‘SkipRedundant’ are available. The issue that we focus on here is to what extent the non-redundant datasets produced by different programs are similar to each other. A systematic comparison of the features and of the outputs of these programs, by using subsets of the UniProt database, was performed and is described here. The results show high level of overlap between non-redundant datasets obtained with the same program fed with the same initial dataset but different percentage of identity threshold, and moderate levels of similarity between results obtained with different programs fed with the same initial dataset and the same percentage of identity threshold. We must be aware that some differences may arise and the use of more than one computer application is advisable. Biomedical Informatics 2010-11-27 /pmc/articles/PMC3055704/ /pubmed/21364823 Text en © 2010 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
spellingShingle	Hypothesis Sikic, Kresimir Carugo, Oliviero Protein sequence redundancy reduction: comparison of various method
title	Protein sequence redundancy reduction: comparison of various method
topic	Hypothesis
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3055704/ https://www.ncbi.nlm.nih.gov/pubmed/21364823
work_keys_str_mv	AT sikickresimir proteinsequenceredundancyreductioncomparisonofvariousmethod AT carugooliviero proteinsequenceredundancyreductioncomparisonofvariousmethod
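
The two ideas in the description above (dropping sequences that exceed an identity threshold, and measuring how much two non-redundant sets overlap) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the identity measure is a crude proxy based on the longest common subsequence rather than a real alignment, and the greedy longest-first filter is not claimed to be the procedure of any program named in the record.

```python
from typing import Dict, List, Set


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    """Crude identity proxy (assumption): LCS length over the shorter length, in percent."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return 100.0 * lcs_length(a, b) / shorter


def reduce_redundancy(seqs: Dict[str, str], threshold: float) -> List[str]:
    """Greedy filter: visit sequences longest first (ties broken by name) and keep
    one only if it is below the threshold against every sequence already kept."""
    order = sorted(seqs, key=lambda name: (-len(seqs[name]), name))
    kept: List[str] = []
    for name in order:
        if all(percent_identity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


def jaccard_overlap(a: Set[str], b: Set[str]) -> float:
    """Shared members divided by all members of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy input (made-up sequences, not taken from UniProt).
toy = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQK", "p3": "GGGGGSSSSS"}
kept_80 = reduce_redundancy(toy, threshold=80.0)
kept_95 = reduce_redundancy(toy, threshold=95.0)
print(kept_80, kept_95, jaccard_overlap(set(kept_80), set(kept_95)))
```

In this toy case p1 and p2 differ only in their last residue, so the stricter 80% threshold drops p2 while the 95% threshold keeps it. The Jaccard value then summarises how similar the two resulting sets are, which is the kind of comparison the description reports between thresholds and between programs.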
