GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data

This work reports the development of GenSeed-HMM, a program that implements seed-driven progressive assembly, an approach to reconstruct specific sequences from unassembled data, starting from short nucleotide or protein seed sequences or profile Hidden Markov Models (HMM). The program can use any o...

Bibliographic Details
Main authors: Alves, João M. P., de Oliveira, André L., Sandberg, Tatiana O. M., Moreno-Gallego, Jaime L., de Toledo, Marcelo A. F., de Moura, Elisabeth M. M., Oliveira, Liliane S., Durham, Alan M., Mehnert, Dolores U., Zanotto, Paolo M. de A., Reyes, Alejandro, Gruber, Arthur
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4777721/
https://www.ncbi.nlm.nih.gov/pubmed/26973638
http://dx.doi.org/10.3389/fmicb.2016.00269
author Alves, João M. P.
de Oliveira, André L.
Sandberg, Tatiana O. M.
Moreno-Gallego, Jaime L.
de Toledo, Marcelo A. F.
de Moura, Elisabeth M. M.
Oliveira, Liliane S.
Durham, Alan M.
Mehnert, Dolores U.
Zanotto, Paolo M. de A.
Reyes, Alejandro
Gruber, Arthur
collection PubMed
description This work reports the development of GenSeed-HMM, a program that implements seed-driven progressive assembly, an approach to reconstruct specific sequences from unassembled data, starting from short nucleotide or protein seed sequences or profile Hidden Markov Models (HMMs). The program can use any one of a number of sequence assemblers. Assembly is performed in multiple steps and relatively few reads are used in each cycle; consequently, the program demands few computational resources. As a proof of concept and to demonstrate the power of HMM-driven progressive assemblies, GenSeed-HMM was applied to metagenomic datasets in the search for diverse ssDNA bacteriophages from the recently described Alpavirinae subfamily. Profile HMMs were built from Alpavirinae-specific regions of multiple sequence alignments (MSAs) of either viral protein 1 (VP1; major capsid protein) or VP4 (genome replication initiation protein). These profile HMMs were used by GenSeed-HMM (running the Newbler assembler) as seeds to reconstruct viral genomes from sequencing datasets of human fecal samples. All contigs obtained were annotated and taxonomically classified using similarity searches and phylogenetic analyses. The most specific profile HMM seed enabled the reconstruction of 45 partial or complete Alpavirinae genomic sequences. A comparison with conventional (global) assembly of the same original dataset, using Newbler in a standalone execution, revealed that GenSeed-HMM outperformed global genomic assembly in several of the metrics employed. This approach is capable of detecting organisms that were not used in the construction of the profile HMM, which opens up the possibility of diagnosing novel viruses without previous specific information, constituting a de novo diagnosis. Additional applications include, but are not limited to, the specific assembly of extrachromosomal elements such as plastid and mitochondrial genomes from metagenomic data. Profile HMM seeds can also be used to reconstruct specific protein-coding genes for gene diversity studies, and to determine all possible gene variants present in a metagenomic sample. Such surveys could be useful for detecting the emergence of drug-resistance variants in sensitive environments such as hospitals and animal production facilities, where antibiotics are regularly used. Finally, GenSeed-HMM can be used as an adjunct for gap closure in assembly finishing projects, by using multiple contig ends as anchored seeds.
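The seed-driven progressive assembly described in the abstract can be illustrated with a toy sketch. It is not GenSeed-HMM's code: the function names (`extend_right`, `extend_left`, `progressive_assembly`) and the `min_overlap` and `max_rounds` parameters are hypothetical, the seed is a plain nucleotide string rather than a profile HMM, and both read selection and the third-party assembler are replaced by exact suffix-prefix string matching. Reverse-complement reads and sequencing errors are ignored. The only point is the iterative loop: each round uses just the few reads that overlap the current contig ends, and the contig grows until no read extends it further.

```python
"""Toy seed-driven progressive assembly (illustrative only)."""


def extend_right(contig, reads, min_overlap):
    """Extend the contig's right end with the read that adds the most bases.

    A read qualifies when one of its prefixes, at least min_overlap long,
    exactly matches a suffix of the contig.
    """
    best_extension = ""
    for read in reads:
        # Try the longest overlap first; the read must add at least one base.
        longest = min(len(contig), len(read) - 1)
        for k in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                if len(read) - k > len(best_extension):
                    best_extension = read[k:]
                break
    return contig + best_extension


def extend_left(contig, reads, min_overlap):
    """Extend the left end by mirroring the strings and reusing extend_right."""
    mirrored_reads = [read[::-1] for read in reads]
    mirrored = extend_right(contig[::-1], mirrored_reads, min_overlap)
    return mirrored[::-1]


def progressive_assembly(seed, reads, min_overlap=5, max_rounds=100):
    """Grow a contig outward from the seed, one extension round at a time."""
    contig = seed
    for _ in range(max_rounds):
        extended = extend_right(contig, reads, min_overlap)
        extended = extend_left(extended, reads, min_overlap)
        if extended == contig:  # no read extends either end: stop
            break
        contig = extended
    return contig


if __name__ == "__main__":
    # Hypothetical target sequence and error-free reads, for illustration.
    genome = "ACGTTGCAAGGCTCATGACCGTATCGGA"
    reads = [genome[i:i + 10] for i in range(0, 19, 3)]
    reads.append("TTTTTTTTTT")  # unrelated read that should never be used
    seed = genome[10:18]
    print(progressive_assembly(seed, reads))
```

Because only reads overlapping the current contig ends are touched in each round, the unrelated read is never incorporated, which mirrors the targeted, low-resource behavior the abstract attributes to the approach.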
format Online
Article
Text
id pubmed-4777721
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4777721 2016-03-11
Front Microbiol
Microbiology
Frontiers Media S.A. 2016-03-04
/pmc/articles/PMC4777721/ /pubmed/26973638 http://dx.doi.org/10.3389/fmicb.2016.00269
Text en
Copyright © 2016 Alves, de Oliveira, Sandberg, Moreno-Gallego, de Toledo, de Moura, Oliveira, Durham, Mehnert, Zanotto, Reyes and Gruber.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data
topic Microbiology