Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery
We present Microseek, a pipeline for virus identification and discovery based on RVDB-prot, a comprehensive, curated and regularly updated database of viral proteins. Microseek analyzes metagenomic Next Generation Sequencing (mNGS) raw data by performing quality steps, de novo assembly, and by scori...
Main authors: | Pérot, Philippe; Bigot, Thomas; Temmam, Sarah; Regnault, Béatrice; Eloit, Marc |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9500916/ https://www.ncbi.nlm.nih.gov/pubmed/36146797 http://dx.doi.org/10.3390/v14091990 |
_version_ | 1784795340888604672 |
---|---|
author | Pérot, Philippe Bigot, Thomas Temmam, Sarah Regnault, Béatrice Eloit, Marc |
author_facet | Pérot, Philippe Bigot, Thomas Temmam, Sarah Regnault, Béatrice Eloit, Marc |
author_sort | Pérot, Philippe |
collection | PubMed |
description | We present Microseek, a pipeline for virus identification and discovery based on RVDB-prot, a comprehensive, curated and regularly updated database of viral proteins. Microseek analyzes metagenomic Next Generation Sequencing (mNGS) raw data by performing quality steps, de novo assembly, and by scoring the Lowest Common Ancestor (LCA) from translated reads and contigs. Microseek runs on a local computer. The outcome of the pipeline is displayed through a user-friendly and dynamic graphical interface. Based on two representative mNGS datasets derived from human tissue and plasma specimens, we illustrate how Microseek works, and we report its performances. In silico spikes of known viral sequences, but also spikes of fake Neopneumovirus viral sequences generated with variable evolutionary distances from known members of the Pneumoviridae family, were used. Results were compared to Chan Zuckerberg ID (CZ ID), a reference cloud-based mNGS pipeline. We show that Microseek reliably identifies known viral sequences and performs well for the detection of distant pseudoviral sequences, especially in complex samples such as in human plasma, while minimizing non-relevant hits. |
format | Online Article Text |
id | pubmed-9500916 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9500916 2022-09-24 Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery Pérot, Philippe Bigot, Thomas Temmam, Sarah Regnault, Béatrice Eloit, Marc Viruses Article We present Microseek, a pipeline for virus identification and discovery based on RVDB-prot, a comprehensive, curated and regularly updated database of viral proteins. Microseek analyzes metagenomic Next Generation Sequencing (mNGS) raw data by performing quality steps, de novo assembly, and by scoring the Lowest Common Ancestor (LCA) from translated reads and contigs. Microseek runs on a local computer. The outcome of the pipeline is displayed through a user-friendly and dynamic graphical interface. Based on two representative mNGS datasets derived from human tissue and plasma specimens, we illustrate how Microseek works, and we report its performances. In silico spikes of known viral sequences, but also spikes of fake Neopneumovirus viral sequences generated with variable evolutionary distances from known members of the Pneumoviridae family, were used. Results were compared to Chan Zuckerberg ID (CZ ID), a reference cloud-based mNGS pipeline. We show that Microseek reliably identifies known viral sequences and performs well for the detection of distant pseudoviral sequences, especially in complex samples such as in human plasma, while minimizing non-relevant hits. MDPI 2022-09-08 /pmc/articles/PMC9500916/ /pubmed/36146797 http://dx.doi.org/10.3390/v14091990 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Pérot, Philippe Bigot, Thomas Temmam, Sarah Regnault, Béatrice Eloit, Marc Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery |
title | Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery |
title_full | Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery |
title_fullStr | Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery |
title_full_unstemmed | Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery |
title_short | Microseek: A Protein-Based Metagenomic Pipeline for Virus Diagnostic and Discovery |
title_sort | microseek: a protein-based metagenomic pipeline for virus diagnostic and discovery |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9500916/ https://www.ncbi.nlm.nih.gov/pubmed/36146797 http://dx.doi.org/10.3390/v14091990 |