3PFDB - A database of Best Representative PSSM Profiles (BRPs) of Protein Families generated using a novel data mining approach

BACKGROUND: Protein families could be related to each other at broad levels that group them as superfamilies. These relationships are harder to detect at the sequence level due to high evolutionary divergence. Sequence searches are strongly directed and influenced by the best representatives of fami...

Bibliographic Details
Main authors: Shameer, Khader; Nagarajan, Paramasivam; Gaurav, Kumar; Sowdhamini, Ramanathan
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2801675/
https://www.ncbi.nlm.nih.gov/pubmed/19961575
http://dx.doi.org/10.1186/1756-0381-2-8
author Shameer, Khader
Nagarajan, Paramasivam
Gaurav, Kumar
Sowdhamini, Ramanathan
collection PubMed
description BACKGROUND: Protein families can be related to each other at broad levels that group them into superfamilies. These relationships are harder to detect at the sequence level because of high evolutionary divergence. Sequence searches are strongly directed and influenced by the best representatives of families, which are viewed as starting points. PSSMs are useful approximations and mathematical representations of protein alignments, with a wide array of applications in bioinformatics, such as remote homology detection, protein family analysis, detection of new members and evolutionary modelling. Computationally intensive searches have been performed using FASSM, a sensitive neural-network-based sequence search method, to identify the Best Representative PSSMs for families reported in Pfam database version 22. RESULTS: We designed a novel data mining approach that assesses individual sequences from a protein family to identify a single Best Representative PSSM profile (BRP) per protein family. Using this approach, a database of protein family-specific best representative PSSM profiles, called 3PFDB, has been developed. PSSM profiles in 3PFDB are curated using the performance of individual sequences as a reference in a rigorous scoring and coverage analysis using FASSM. We have assessed the suitability of 1,085,588 sequences derived from seed or full alignments reported in the Pfam database (version 22). Coverage analysis using the FASSM method serves as the filtering step to identify the best representative sequence, starting from full-length or domain sequences, to generate the final profile for a given family. 3PFDB is a collection of best representative PSSM profiles of 8,524 protein families from the Pfam database.
CONCLUSION: The availability of an approach to identify BRPs, together with a curated database of best representative PSI-BLAST-derived PSSMs for 91.4% of current Pfam families, will be a useful resource for the community to perform detailed and specific analyses using family-specific, best-representative PSSM profiles. 3PFDB can be accessed at http://caps.ncbs.res.in/3pfdb
format Text
id pubmed-2801675
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2801675 2010-01-05 BioData Min Research BioMed Central 2009-12-04 /pmc/articles/PMC2801675/ /pubmed/19961575 http://dx.doi.org/10.1186/1756-0381-2-8 Text en Copyright ©2009 Shameer et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title 3PFDB - A database of Best Representative PSSM Profiles (BRPs) of Protein Families generated using a novel data mining approach
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2801675/
https://www.ncbi.nlm.nih.gov/pubmed/19961575
http://dx.doi.org/10.1186/1756-0381-2-8