Systematic identification and analysis of frequent gene fusion events in metabolic pathways

BACKGROUND: Gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availabili...

Bibliographic Details
Main Authors: Henry, Christopher S., Lerma-Ortiz, Claudia, Gerdes, Svetlana Y., Mullen, Jeffrey D., Colasanti, Ric, Zhukov, Aleksey, Frelin, Océane, Thiaville, Jennifer J., Zallot, Rémi, Niehaus, Thomas D., Hasnain, Ghulam, Conrad, Neal, Hanson, Andrew D., de Crécy-Lagard, Valérie
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4921024/
https://www.ncbi.nlm.nih.gov/pubmed/27342196
http://dx.doi.org/10.1186/s12864-016-2782-3
_version_ 1782439466360635392
author Henry, Christopher S.
Lerma-Ortiz, Claudia
Gerdes, Svetlana Y.
Mullen, Jeffrey D.
Colasanti, Ric
Zhukov, Aleksey
Frelin, Océane
Thiaville, Jennifer J.
Zallot, Rémi
Niehaus, Thomas D.
Hasnain, Ghulam
Conrad, Neal
Hanson, Andrew D.
de Crécy-Lagard, Valérie
author_sort Henry, Christopher S.
collection PubMed
description BACKGROUND: Gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. RESULTS: Here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These sets were used to develop a fusion prediction algorithm that captured the training set fusions with only 7 % false negatives and 50 % false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database at http://modelseed.org/projects/fusions/. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. CONCLUSIONS: Customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes. The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2782-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4921024
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4921024 2016-06-26 Systematic identification and analysis of frequent gene fusion events in metabolic pathways Henry, Christopher S. Lerma-Ortiz, Claudia Gerdes, Svetlana Y. Mullen, Jeffrey D. Colasanti, Ric Zhukov, Aleksey Frelin, Océane Thiaville, Jennifer J. Zallot, Rémi Niehaus, Thomas D. Hasnain, Ghulam Conrad, Neal Hanson, Andrew D. de Crécy-Lagard, Valérie BMC Genomics Research Article BACKGROUND: Gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. RESULTS: Here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These sets were used to develop a fusion prediction algorithm that captured the training set fusions with only 7 % false negatives and 50 % false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database at http://modelseed.org/projects/fusions/. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. CONCLUSIONS: Customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes.
The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2782-3) contains supplementary material, which is available to authorized users. BioMed Central 2016-06-24 /pmc/articles/PMC4921024/ /pubmed/27342196 http://dx.doi.org/10.1186/s12864-016-2782-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Systematic identification and analysis of frequent gene fusion events in metabolic pathways
topic Research Article