
3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families

Protein domain families are usually classified on the basis of similarity of amino acid sequences. Selection of a single representative sequence for each family provides targets for structure determination or modeling and also enables fast sequence searches to associate new members to a family. Such...


Bibliographic Details
Main authors: Joseph, Agnel P., Shingate, Prashant, Upadhyay, Atul K., Sowdhamini, R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3974335/
https://www.ncbi.nlm.nih.gov/pubmed/24700812
http://dx.doi.org/10.1093/database/bau026
author Joseph, Agnel P.
Shingate, Prashant
Upadhyay, Atul K.
Sowdhamini, R.
collection PubMed
description Protein domain families are usually classified on the basis of similarity of amino acid sequences. Selection of a single representative sequence for each family provides targets for structure determination or modeling and also enables fast sequence searches to associate new members to a family. Such a selection could be challenging since some of these domain families exhibit huge variation depending on the number of members in the family, the average family sequence length or the extent of sequence divergence within a family. We had earlier created the 3PFDB database as a repository of best representative sequences, selected from each PFAM domain family on the basis of high coverage. In this study, we have improved the database using more efficient strategies for the initial generation of sequence profiles and implemented two independent methods, FASSM and HMMER, for identifying family members. HMMER employs a global sequence similarity search, while FASSM relies on motif identification and matching. This improved and updated database, 3PFDB+, generated in this study, provides representative sequences and profiles for PFAM families, with 13 519 family representatives having more than 90% family coverage. The representative sequence is also highlighted in a two-dimensional plot, which reflects the relative divergence between family members. Representatives belonging to small families with short sequences are mainly associated with low coverage. The set of sequences not recognized by the family representative profiles highlights several potential false or weak family associations in PFAM. Partial domains and fragments dominate such cases, along with sequences that are highly diverged or different from other family members. Some of these outliers were also predicted to have different secondary structure contents, which reflect different putative structural or functional roles for these domain sequences. Database URL: http://caps.ncbs.res.in/3pfdbplus/
format Online
Article
Text
id pubmed-3974335
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3974335 2014-04-04 Database (Oxford) Database Update Oxford University Press 2014-04-03 /pmc/articles/PMC3974335/ /pubmed/24700812 http://dx.doi.org/10.1093/database/bau026 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title 3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families
topic Database Update
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3974335/
https://www.ncbi.nlm.nih.gov/pubmed/24700812
http://dx.doi.org/10.1093/database/bau026