
GFam: a platform for automatic annotation of gene families

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and offers a seamless approach to propagate functional annotation ac...


Bibliographic Details
Main authors: Sasidharan, Rajkumar, Nepusz, Tamás, Swarbreck, David, Huala, Eva, Paccanaro, Alberto
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3479161/
https://www.ncbi.nlm.nih.gov/pubmed/22790981
http://dx.doi.org/10.1093/nar/gks631
author Sasidharan, Rajkumar
Nepusz, Tamás
Swarbreck, David
Huala, Eva
Paccanaro, Alberto
author_sort Sasidharan, Rajkumar
collection PubMed
description We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and propagate functional annotation seamlessly across periodic genome updates. GFam is a hybrid approach: it first uses a greedy algorithm to chain component domains from the InterPro annotation provided by its 12 member resources, then applies a sequence-based connected component analysis to un-annotated sequence regions, deriving a consensus domain architecture for each sequence and subsequently generating families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, which offer resource-specific coverage for different regions of a sequence. GFam's ability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
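The description above names three steps: greedy chaining of domains, deriving an architecture per sequence, and grouping sequences with a common architecture into families. The following is a minimal Python sketch of that general idea, not GFam's actual implementation. The abstract does not state the selection rule, so the "longest domain first, reject overlaps" criterion is an assumption, as are all names (`Domain`, `greedy_architecture`, `build_families`) and the toy data. The connected component analysis of un-annotated regions is not covered here.

```python
from collections import defaultdict
from typing import NamedTuple


class Domain(NamedTuple):
    name: str   # hypothetical domain identifier
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def greedy_architecture(domains):
    """Greedily keep non-overlapping domains, longest first (assumed rule).

    Returns the kept domains ordered by start position.
    """
    chosen = []
    by_length = sorted(domains, key=lambda d: (-(d.end - d.start + 1), d.start))
    for d in by_length:
        # Accept the domain only if it overlaps none of those already kept.
        if all(d.end < c.start or d.start > c.end for c in chosen):
            chosen.append(d)
    return sorted(chosen, key=lambda d: d.start)


def residue_coverage(chosen, seq_len):
    """Fraction of residues covered by the kept (non-overlapping) domains."""
    return sum(d.end - d.start + 1 for d in chosen) / seq_len


def build_families(annotations):
    """Group sequence IDs by their ordered tuple of domain names."""
    families = defaultdict(list)
    for seq_id in sorted(annotations):
        arch = tuple(d.name for d in greedy_architecture(annotations[seq_id]))
        families[arch].append(seq_id)
    return dict(families)


# Toy input: invented sequences and domain hits, for illustration only.
annotations = {
    "seqA": [Domain("D1", 1, 100), Domain("D2", 90, 120), Domain("D3", 150, 200)],
    "seqB": [Domain("D1", 5, 95), Domain("D3", 120, 180)],
    "seqC": [Domain("D2", 10, 60)],
}
families = build_families(annotations)
for arch, members in families.items():
    print(arch, members)
```

In this toy input, `D2` on `seqA` overlaps the longer `D1` and is dropped, so `seqA` and `seqB` share the architecture `("D1", "D3")` and fall into the same family. A real pipeline would likely weigh the source database and hit significance rather than length alone.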
format Online
Article
Text
id pubmed-3479161
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3479161 2012-10-24 GFam: a platform for automatic annotation of gene families Sasidharan, Rajkumar Nepusz, Tamás Swarbreck, David Huala, Eva Paccanaro, Alberto Nucleic Acids Res Methods Online Oxford University Press 2012-10 2012-07-11 /pmc/articles/PMC3479161/ /pubmed/22790981 http://dx.doi.org/10.1093/nar/gks631 Text en © The Author(s) 2012. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title GFam: a platform for automatic annotation of gene families
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3479161/
https://www.ncbi.nlm.nih.gov/pubmed/22790981
http://dx.doi.org/10.1093/nar/gks631