Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists

BACKGROUND: The growing body of protein family and domain based annotations constitutes important information for understanding protein functions and gaining insight into relations among the genes that encode them. To enable analysis of gene proteomic annotations, we implemented novel modules within GFINDer, a Web sy...

Bibliographic Details
Main Authors: Masseroli, Marco; Bellistri, Elisa; Franceschini, Andrea; Pinciroli, Francesco
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1885843/
https://www.ncbi.nlm.nih.gov/pubmed/17430558
http://dx.doi.org/10.1186/1471-2105-8-S1-S14
author Masseroli, Marco
Bellistri, Elisa
Franceschini, Andrea
Pinciroli, Francesco
collection PubMed
description BACKGROUND: The growing body of protein family and domain based annotations constitutes important information for understanding protein functions and gaining insight into relations among the genes that encode them. To enable analysis of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and supports their statistical analysis and mining. RESULTS: Exploiting protein information in the Pfam and InterPro databanks, we developed and added to GFINDer original modules specifically devoted to the exploration and analysis of functional signatures of gene protein products. They annotate numerous user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, classify them according to these protein annotation categories, and statistically analyze the resulting classifications. In particular, when the uploaded nucleotide sequence identifiers are subdivided into classes, the Statistics Protein Families&Domains module estimates the relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures that are significantly over-represented within user-defined classes of genes. In addition, the Logistic Regression module identifies the protein functional signatures that best explain the considered gene classification. CONCLUSION: The novel GFINDer modules provide genomic protein family and domain analyses that support better functional interpretation of gene classes, for instance those defined through statistical and clustering analyses of gene expression results from microarray experiments. They can hence help in understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveiling new biomedical knowledge about the encoding genes.
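The over-representation idea described in the RESULTS paragraph can be illustrated with a small, self-contained sketch. It is not GFINDer's implementation: the abstract does not say which test the Statistics Protein Families&Domains module applies, so a one-sided hypergeometric test is assumed here, and the function names, gene identifiers and signature labels (PF_A, PF_B) are all hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, class_size, annotated_total, population):
    """P(X >= k) when class_size genes are drawn without replacement from
    population genes, annotated_total of which carry the signature."""
    max_k = min(class_size, annotated_total)
    denom = comb(population, class_size)
    numer = sum(
        comb(annotated_total, i) * comb(population - annotated_total, class_size - i)
        for i in range(k, max_k + 1)
    )
    return numer / denom


def enriched_signatures(gene_classes, annotations, target_class, alpha=0.05):
    """Return (signature, hits_in_class, p_value) tuples, sorted by p-value,
    for signatures over-represented in target_class.

    Assumption: no multiple-testing correction is applied in this sketch.
    """
    population = len(gene_classes)
    in_class = {g for g, c in gene_classes.items() if c == target_class}
    totals, hits = Counter(), Counter()
    for gene in gene_classes:
        for sig in annotations.get(gene, set()):
            totals[sig] += 1
            if gene in in_class:
                hits[sig] += 1
    results = []
    for sig, total in totals.items():
        p = hypergeom_upper_tail(hits[sig], len(in_class), total, population)
        if p <= alpha:
            results.append((sig, hits[sig], p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 10 genes in two user-defined classes.
GENE_CLASSES = {f"g{i}": ("up" if i <= 4 else "down") for i in range(1, 11)}
# PF_A annotates all 4 "up" genes and no others; PF_B annotates 5 genes, 2 of them "up".
ANNOTATIONS = {
    "g1": {"PF_A", "PF_B"}, "g2": {"PF_A", "PF_B"},
    "g3": {"PF_A"}, "g4": {"PF_A"},
    "g5": {"PF_B"}, "g6": {"PF_B"}, "g7": {"PF_B"},
}

if __name__ == "__main__":
    for sig, n_hits, p in enriched_signatures(GENE_CLASSES, ANNOTATIONS, "up"):
        print(sig, n_hits, p)
```

With this toy input only PF_A passes the 0.05 threshold in the "up" class. A real analysis over many signatures would normally also adjust the p-values for multiple testing, which this sketch leaves out.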
format Text
id pubmed-1885843
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1885843 2007-06-05; Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists; Masseroli, Marco; Bellistri, Elisa; Franceschini, Andrea; Pinciroli, Francesco; BMC Bioinformatics; Research; BioMed Central 2007-03-08; /pmc/articles/PMC1885843/ /pubmed/17430558 http://dx.doi.org/10.1186/1471-2105-8-S1-S14; Text; en; Copyright © 2007 Masseroli et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1885843/
https://www.ncbi.nlm.nih.gov/pubmed/17430558
http://dx.doi.org/10.1186/1471-2105-8-S1-S14