
Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families

INTRODUCTION: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and f...


Bibliographic Details
Main Authors: Wang, Yan; Shi, Qiang; Yang, Pengshuo; Zhang, Chengxin; Mortuza, S. M.; Xue, Zhidong; Ning, Kang; Zhang, Yang
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6825341/
https://www.ncbi.nlm.nih.gov/pubmed/31676016
http://dx.doi.org/10.1186/s13059-019-1823-z
author Wang, Yan
Shi, Qiang
Yang, Pengshuo
Zhang, Chengxin
Mortuza, S. M.
Xue, Zhidong
Ning, Kang
Zhang, Yang
collection PubMed
description INTRODUCTION: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. RESULTS: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with their frequency of occurrence in the modeled Pfam families, suggesting a significant role for the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Notably, PF15461, most of whose members come from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5. CONCLUSIONS: These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.
format Online
Article
Text
id pubmed-6825341
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6825341 2019-11-07 Genome Biol Research
BioMed Central 2019-11-01 /pmc/articles/PMC6825341/ /pubmed/31676016 http://dx.doi.org/10.1186/s13059-019-1823-z Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families
topic Research