
SIMAP—a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters



Bibliographic Details
Main authors:	Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner
Format:	Text
Language:	English
Published:	Oxford University Press, 2010
Subjects:	Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808863/ https://www.ncbi.nlm.nih.gov/pubmed/19906725 http://dx.doi.org/10.1093/nar/gkp949

author	Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner
collection	PubMed
description	Large-scale sequence comparison remains the most powerful tool in sequence analysis for predicting protein function and reconstructing evolutionary history. The number of known protein sequences grows exponentially, and the similarity matrix grows quadratically with it. As a result, computing the Similarity Matrix of Proteins (SIMAP) has become a computationally intensive task. The SIMAP database provides a comprehensive, up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. New features of SIMAP include an expanded sequence space that now covers databases such as ENSEMBL, and the integration of metagenomes through their consistent processing and annotation. In addition, Blast2GO protein function predictions are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. SIMAP is freely accessible to individuals through the web portal (http://mips.gsf.de/simap/) and programmatically through DAS (http://webclu.bio.wzw.tum.de/das/) and a Web Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
format	Text
id	pubmed-2808863
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Oxford University Press
record_format	MEDLINE/PubMed
journal	Nucleic Acids Res
dates	2010-01, 2009-11-11
rights	© The Author(s) 2009. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	SIMAP—a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters


Similar items