
TreeGrafter: phylogenetic tree-based annotation of proteins with Gene Ontology terms and other annotations

SUMMARY: TreeGrafter is a new software tool for annotating protein sequences using pre-annotated phylogenetic trees. Currently, the tool provides annotations to Gene Ontology (GO) terms, and PANTHER family and subfamily. The approach is generalizable to any annotations that have been made to interna...

Bibliographic Details
Main authors: Tang, Haiming; Finn, Robert D; Thomas, Paul D
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Applications Notes
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6361231/
https://www.ncbi.nlm.nih.gov/pubmed/30032202
http://dx.doi.org/10.1093/bioinformatics/bty625
collection PubMed
description SUMMARY: TreeGrafter is a new software tool for annotating protein sequences using pre-annotated phylogenetic trees. Currently, the tool provides annotations to Gene Ontology (GO) terms, and PANTHER family and subfamily. The approach is generalizable to any annotations that have been made to internal nodes of a reference phylogenetic tree. TreeGrafter takes each input query protein sequence, finds the best matching homologous family in a library of pre-calculated, pre-annotated gene trees, and then grafts it to the best location in the tree. It then annotates the sequence by propagating annotations from ancestral nodes in the reference tree. We show that TreeGrafter outperforms subfamily HMM scoring for correctly assigning subfamily membership, and that it produces highly specific annotations of GO terms based on annotated reference phylogenetic trees. This method will be further integrated into InterProScan, enabling an even broader user community. AVAILABILITY AND IMPLEMENTATION: TreeGrafter is freely available on the web at https://github.com/pantherdb/TreeGrafter, including as a Docker image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
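The final step in the summary above, annotating a grafted sequence by propagating annotations from ancestral nodes, can be illustrated with a minimal sketch. This is not TreeGrafter's code: the `Node` class, the `inherited_annotations` function, and the `TERM_*` labels are hypothetical, and the family-matching and grafting steps are assumed to have already produced a graft point in the reference tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a toy reference tree (hypothetical structure)."""
    name: str
    parent: Optional["Node"] = None
    annotations: set = field(default_factory=set)


def inherited_annotations(graft_point: Node) -> set:
    """Collect annotations from the graft point and every ancestor up to the root."""
    collected = set()
    node = graft_point
    while node is not None:
        collected |= node.annotations
        node = node.parent
    return collected


# Toy tree: a root with two subfamilies; the query is grafted under subfamily_A.
# The TERM_* labels are placeholders, not real GO identifiers.
root = Node("root", annotations={"TERM_ROOT"})
subfamily_a = Node("subfamily_A", parent=root, annotations={"TERM_A"})
subfamily_b = Node("subfamily_B", parent=root, annotations={"TERM_B"})
graft = Node("graft_point", parent=subfamily_a)

result = inherited_annotations(graft)
print(sorted(result))
```

In this sketch the query inherits only the annotations on its own lineage (`TERM_ROOT` and `TERM_A`), not those attached to the sibling subfamily, which is why the choice of graft location determines how specific the resulting annotations are.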
id pubmed-6361231
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6361231 2019-02-08
Bioinformatics, Applications Notes
Oxford University Press 2019-02-01 2018-07-19
/pmc/articles/PMC6361231/ /pubmed/30032202 http://dx.doi.org/10.1093/bioinformatics/bty625
Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Applications Notes