Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families
Main authors: | Wang, Yan; Shi, Qiang; Yang, Pengshuo; Zhang, Chengxin; Mortuza, S. M.; Xue, Zhidong; Ning, Kang; Zhang, Yang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6825341/ https://www.ncbi.nlm.nih.gov/pubmed/31676016 http://dx.doi.org/10.1186/s13059-019-1823-z |
author | Wang, Yan; Shi, Qiang; Yang, Pengshuo; Zhang, Chengxin; Mortuza, S. M.; Xue, Zhidong; Ning, Kang; Zhang, Yang |
collection | PubMed |
description | INTRODUCTION: The ocean microbiome is one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. RESULTS: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with their frequency of occurrence in the modeled Pfam families, suggesting that the Tara Oceans genomes play a significant role in contact-map prediction and the subsequent ab initio folding simulations. Notably, PF15461, most of whose members come from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotation. The pipeline is extended to a set of 417 Pfam families, built by combining Tara with other metagenomic datasets, which yields 235 families with an estimated TM-score over 0.5. CONCLUSIONS: These results demonstrate a new avenue for improving protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences. |
format | Online Article Text |
id | pubmed-6825341 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6825341, 2019-11-07. Genome Biol, Research.
BioMed Central, 2019-11-01. /pmc/articles/PMC6825341/ /pubmed/31676016 http://dx.doi.org/10.1186/s13059-019-1823-z. Text, en. © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6825341/ https://www.ncbi.nlm.nih.gov/pubmed/31676016 http://dx.doi.org/10.1186/s13059-019-1823-z |