ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs

BACKGROUND: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especi...

Bibliographic Details
Main Authors: Dvorkina, Tatiana; Bankevich, Anton; Sorokin, Alexei; Yang, Fan; Adu-Oppong, Boahemaa; Williams, Ryan; Turner, Keith; Pevzner, Pavel A.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240309/
https://www.ncbi.nlm.nih.gov/pubmed/34183047
http://dx.doi.org/10.1186/s40168-021-01092-z
_version_ 1783715188775583744
author Dvorkina, Tatiana
Bankevich, Anton
Sorokin, Alexei
Yang, Fan
Adu-Oppong, Boahemaa
Williams, Ryan
Turner, Keith
Pevzner, Pavel A.
author_sort Dvorkina, Tatiana
collection PubMed
description BACKGROUND: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially in large plasmids in strains of the ubiquitous soil bacteria, Bacillus thuringiensis (Bt). Since there are often multiple similar IPGs encoded by such plasmids, their assemblies are typically fragmented and many IPGs are scattered through multiple contigs. As a result, existing gene prediction tools (that analyze individual contigs) typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics. METHODS: Although it is difficult to assemble IPGs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG. RESULTS: We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of greater diversity of IPGs from (meta)genomes. CONCLUSIONS: We demonstrated that analysis of the assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes “hidden” in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify even more novel IPGs for phenotypic testing than would previously be inaccessible by traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-021-01092-z.
format Online
Article
Text
id pubmed-8240309
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8240309 2021-06-30 ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs Dvorkina, Tatiana Bankevich, Anton Sorokin, Alexei Yang, Fan Adu-Oppong, Boahemaa Williams, Ryan Turner, Keith Pevzner, Pavel A. Microbiome Research BioMed Central 2021-06-28 /pmc/articles/PMC8240309/ /pubmed/34183047 http://dx.doi.org/10.1186/s40168-021-01092-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240309/
https://www.ncbi.nlm.nih.gov/pubmed/34183047
http://dx.doi.org/10.1186/s40168-021-01092-z