ProViDE: A software tool for accurate estimation of viral diversity in metagenomic samples

Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values resu...

Bibliographic Details
Main authors: Ghosh, Tarini Shankar; Mohammed, Monzoorul Haque; Komanduri, Dinakar; Mande, Sharmila Shekhar
Format: Text
Language: English
Published: Biomedical Informatics 2011
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3082859/
https://www.ncbi.nlm.nih.gov/pubmed/21544173
_version_ 1782202341167988736
author Ghosh, Tarini Shankar
Mohammed, Monzoorul Haque
Komanduri, Dinakar
Mande, Sharmila Shekhar
collection PubMed
description Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values results in most sequences remaining unclassified. Conversely, using less stringent E-values results in a high number of incorrect taxonomic assignments. The SOrt-ITEMS algorithm provides an approach to address these issues. Based on alignment parameters, SOrt-ITEMS follows an elaborate workflow for assigning reads originating from hitherto unknown archaeal/bacterial genomes. In SOrt-ITEMS, alignment parameter thresholds were generated by observing patterns of sequence divergence within and across various taxonomic groups belonging to the bacterial and archaeal kingdoms. However, many taxonomic groups within the viral kingdom lack a typical Linnaean-like taxonomic hierarchy. In this paper, we present ProViDE (Program for Viral Diversity Estimation), an algorithm that uses a customized set of alignment parameter thresholds specifically suited for viral metagenomic sequences. These thresholds capture the pattern of sequence divergence and the non-uniform taxonomic hierarchy observed within/across various taxonomic groups of the viral kingdom. Validation results indicate that the percentage of ‘correct’ assignments by ProViDE is around 1.7 to 3 times higher than that by the widely used similarity-based method MEGAN. The misclassification rate of ProViDE is around 3 to 19% (as compared to 5 to 42% by MEGAN), indicating significantly better assignment accuracy. ProViDE software and a supplementary file (containing supplementary figures and tables referred to in this article) are available for download from http://metagenomics.atc.tcs.com/binning/ProViDE/
format Text
id pubmed-3082859
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3082859 2011-05-04 ProViDE: A software tool for accurate estimation of viral diversity in metagenomic samples Ghosh, Tarini Shankar Mohammed, Monzoorul Haque Komanduri, Dinakar Mande, Sharmila Shekhar Bioinformation Software Biomedical Informatics 2011-03-26 /pmc/articles/PMC3082859/ /pubmed/21544173 Text en © 2011 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
title ProViDE: A software tool for accurate estimation of viral diversity in metagenomic samples
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3082859/
https://www.ncbi.nlm.nih.gov/pubmed/21544173