
Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery

We present Microseek, a pipeline for virus identification and discovery based on RVDB-prot, a comprehensive, curated and regularly updated database of viral proteins. Microseek analyzes metagenomic Next Generation Sequencing (mNGS) raw data by performing quality steps, de novo assembly, and by scori...


Bibliographic Details
Main authors: Pérot, Philippe, Bigot, Thomas, Temmam, Sarah, Regnault, Béatrice, Eloit, Marc
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9500916/
https://www.ncbi.nlm.nih.gov/pubmed/36146797
http://dx.doi.org/10.3390/v14091990
_version_ 1784795340888604672
author Pérot, Philippe
Bigot, Thomas
Temmam, Sarah
Regnault, Béatrice
Eloit, Marc
author_facet Pérot, Philippe
Bigot, Thomas
Temmam, Sarah
Regnault, Béatrice
Eloit, Marc
author_sort Pérot, Philippe
collection PubMed
description We present Microseek, a pipeline for virus identification and discovery based on RVDB-prot, a comprehensive, curated and regularly updated database of viral proteins. Microseek analyzes metagenomic Next Generation Sequencing (mNGS) raw data by performing quality steps, de novo assembly, and by scoring the Lowest Common Ancestor (LCA) from translated reads and contigs. Microseek runs on a local computer. The outcome of the pipeline is displayed through a user-friendly and dynamic graphical interface. Based on two representative mNGS datasets derived from human tissue and plasma specimens, we illustrate how Microseek works, and we report its performances. In silico spikes of known viral sequences, but also spikes of fake Neopneumovirus viral sequences generated with variable evolutionary distances from known members of the Pneumoviridae family, were used. Results were compared to Chan Zuckerberg ID (CZ ID), a reference cloud-based mNGS pipeline. We show that Microseek reliably identifies known viral sequences and performs well for the detection of distant pseudoviral sequences, especially in complex samples such as in human plasma, while minimizing non-relevant hits.
format Online
Article
Text
id pubmed-9500916
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-95009162022-09-24 Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery Pérot, Philippe Bigot, Thomas Temmam, Sarah Regnault, Béatrice Eloit, Marc Viruses Article We present Microseek, a pipeline for virus identification and discovery based on RVDB-prot, a comprehensive, curated and regularly updated database of viral proteins. Microseek analyzes metagenomic Next Generation Sequencing (mNGS) raw data by performing quality steps, de novo assembly, and by scoring the Lowest Common Ancestor (LCA) from translated reads and contigs. Microseek runs on a local computer. The outcome of the pipeline is displayed through a user-friendly and dynamic graphical interface. Based on two representative mNGS datasets derived from human tissue and plasma specimens, we illustrate how Microseek works, and we report its performances. In silico spikes of known viral sequences, but also spikes of fake Neopneumovirus viral sequences generated with variable evolutionary distances from known members of the Pneumoviridae family, were used. Results were compared to Chan Zuckerberg ID (CZ ID), a reference cloud-based mNGS pipeline. We show that Microseek reliably identifies known viral sequences and performs well for the detection of distant pseudoviral sequences, especially in complex samples such as in human plasma, while minimizing non-relevant hits. MDPI 2022-09-08 /pmc/articles/PMC9500916/ /pubmed/36146797 http://dx.doi.org/10.3390/v14091990 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Pérot, Philippe
Bigot, Thomas
Temmam, Sarah
Regnault, Béatrice
Eloit, Marc
Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
title Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
title_full Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
title_fullStr Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
title_full_unstemmed Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
title_short Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
title_sort microseek: a protein-based metagenomic pipeline for virus diagnostic and discovery
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9500916/
https://www.ncbi.nlm.nih.gov/pubmed/36146797
http://dx.doi.org/10.3390/v14091990
work_keys_str_mv AT perotphilippe microseekaproteinbasedmetagenomicpipelineforvirusdiagnosticanddiscovery
AT bigotthomas microseekaproteinbasedmetagenomicpipelineforvirusdiagnosticanddiscovery
AT temmamsarah microseekaproteinbasedmetagenomicpipelineforvirusdiagnosticanddiscovery
AT regnaultbeatrice microseekaproteinbasedmetagenomicpipelineforvirusdiagnosticanddiscovery
AT eloitmarc microseekaproteinbasedmetagenomicpipelineforvirusdiagnosticanddiscovery