
ITEP: An integrated toolkit for exploration of microbial pan-genomes

BACKGROUND: Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for...


Bibliographic Details
Main Authors: Benedict, Matthew N, Henriksen, James R, Metcalf, William W, Whitaker, Rachel J, Price, Nathan D
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3890548/
https://www.ncbi.nlm.nih.gov/pubmed/24387194
http://dx.doi.org/10.1186/1471-2164-15-8
author Benedict, Matthew N
Henriksen, James R
Metcalf, William W
Whitaker, Rachel J
Price, Nathan D
collection PubMed
description BACKGROUND: Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. RESULTS: We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes.
ITEP’s capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. CONCLUSIONS: ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
format Online
Article
Text
id pubmed-3890548
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3890548 2014-01-15 ITEP: An integrated toolkit for exploration of microbial pan-genomes Benedict, Matthew N Henriksen, James R Metcalf, William W Whitaker, Rachel J Price, Nathan D BMC Genomics Software BACKGROUND: Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. RESULTS: We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes.
ITEP’s capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. CONCLUSIONS: ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts. BioMed Central 2014-01-03 /pmc/articles/PMC3890548/ /pubmed/24387194 http://dx.doi.org/10.1186/1471-2164-15-8 Text en Copyright © 2014 Benedict et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ITEP: An integrated toolkit for exploration of microbial pan-genomes
topic Software