Discovering viral genomes in human metagenomic data by predicting unknown protein families

Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows no or little homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack h...


Bibliographic Details
Main authors: Barrientos-Somarribas, Mauricio, Messina, David N., Pou, Christian, Lysholm, Fredrik, Bjerkner, Annelie, Allander, Tobias, Andersson, Björn, Sonnhammer, Erik L. L.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5758519/
https://www.ncbi.nlm.nih.gov/pubmed/29311716
http://dx.doi.org/10.1038/s41598-017-18341-7
collection PubMed
description Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows no or little homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family, hence it was named bacteriophage HFM.
id pubmed-5758519
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5758519 2018-01-10 Sci Rep Article
Nature Publishing Group UK 2018-01-08 /pmc/articles/PMC5758519/ /pubmed/29311716 http://dx.doi.org/10.1038/s41598-017-18341-7 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.