The Pfam protein families database: towards a more sustainable future

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB...

Bibliographic Details
Main authors: Finn, Robert D., Coggill, Penelope, Eberhardt, Ruth Y., Eddy, Sean R., Mistry, Jaina, Mitchell, Alex L., Potter, Simon C., Punta, Marco, Qureshi, Matloob, Sangrador-Vegas, Amaia, Salazar, Gustavo A., Tate, John, Bateman, Alex
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702930/
https://www.ncbi.nlm.nih.gov/pubmed/26673716
http://dx.doi.org/10.1093/nar/gkv1344
author Finn, Robert D.
Coggill, Penelope
Eberhardt, Ruth Y.
Eddy, Sean R.
Mistry, Jaina
Mitchell, Alex L.
Potter, Simon C.
Punta, Marco
Qureshi, Matloob
Sangrador-Vegas, Amaia
Salazar, Gustavo A.
Tate, John
Bateman, Alex
author_sort Finn, Robert D.
collection PubMed
description In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.
format Online
Article
Text
id pubmed-4702930
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4702930 2016-01-07 Nucleic Acids Res Database Issue Oxford University Press 2016-01-04 2015-12-15 /pmc/articles/PMC4702930/ /pubmed/26673716 http://dx.doi.org/10.1093/nar/gkv1344 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title The Pfam protein families database: towards a more sustainable future
topic Database Issue