
SRAssembler: Selective Recursive local Assembly of homologous genomic regions


Bibliographic Details
Main authors: McCarthy, Thomas W.; Chou, Hsien-chao; Brendel, Volker P.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6604332/
https://www.ncbi.nlm.nih.gov/pubmed/31266441
http://dx.doi.org/10.1186/s12859-019-2949-4
author McCarthy, Thomas W.
Chou, Hsien-chao
Brendel, Volker P.
collection PubMed
description BACKGROUND: The falling cost of next-generation sequencing technology has allowed deep sequencing across related species and of individuals within species. Whole genome assemblies from these data remain time- and resource-intensive computational tasks, particularly if best solutions are sought using different assembly strategies and parameter sets. However, in many cases, the underlying research questions are not genome-wide but rather target specific genes or sets of genes. We describe a novel assembly tool, SRAssembler, that efficiently assembles only contigs containing potential homologs of a gene or protein query, thus enabling gene-specific genome studies over large numbers of short read samples. RESULTS: We demonstrate the functionality of SRAssembler with examples largely drawn from plant genomics. The workflow implements a recursive strategy by which relevant reads are successively pulled from the input sets based on overlapping significant matches, resulting in virtual chromosome walking. The typical workflow behavior is illustrated with assembly of simulated reads. Applications to real data show that SRAssembler produces homologous contigs of equivalent quality to whole genome assemblies. Settings can be chosen to assemble not only presumed orthologs but also paralogous gene loci in distinct contigs. A key application is assembly of the same locus in many individuals from population genome data, which provides assessment of structural variation beyond what can be inferred from read mapping to a reference genome alone. SRAssembler can be run on modest computing resources or in parallel on high performance computing clusters (most easily by invoking a dedicated Singularity image). CONCLUSIONS: SRAssembler offers an efficient tool to complement whole genome assembly software. It can be used to solve gene-specific research questions based on large genomic read samples from multiple sources and would be an expedient choice when whole genome assembly from the reads is either not feasible, too costly, or unnecessary. The program can also aid decision making on the depth of sequencing in an ongoing novel genome sequencing project or with respect to ultimate whole genome assembly strategies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2949-4) contains supplementary material, which is available to authorized users.
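The recursive read recruitment described in the abstract ("virtual chromosome walking") can be illustrated with a toy sketch. This is not SRAssembler's implementation: the function names `kmers` and `recruit_reads`, the parameters `k` and `max_rounds`, and the use of exact shared k-mers as a stand-in for "significant matches" are all assumptions made for illustration, and no assembly step is included.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(query, reads, k=5, max_rounds=10):
    """Toy recursive recruitment (hypothetical helper, not the tool's API).

    Round 1 pulls reads sharing a k-mer with the query. Each later round
    pulls reads sharing a k-mer with anything recruited so far, so the
    recruited region grows outward from the query.
    """
    bait = kmers(query, k)   # k-mers that count as a "match" (assumption)
    recruited = set()        # indices of reads pulled so far
    for _ in range(max_rounds):
        new = {
            i for i, read in enumerate(reads)
            if i not in recruited and kmers(read, k) & bait
        }
        if not new:          # nothing new matched: the walk has stopped
            break
        recruited |= new
        for i in new:        # newly recruited reads become bait next round
            bait |= kmers(reads[i], k)
    return [reads[i] for i in sorted(recruited)]


# Made-up example data: overlapping 12-base windows of a short sequence,
# plus one unrelated read that should never be recruited.
genome = "ACGTTGCAAGGCTTAACCGGATCTGAGTCCATGCTAGGACTTCA"
locus_reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 6)]
all_reads = locus_reads + ["TTTTTTTTTTTT"]
query = genome[18:30]

one_round = recruit_reads(query, all_reads, max_rounds=1)
full_walk = recruit_reads(query, all_reads)
print(len(one_round), "reads after one round;", len(full_walk), "after walking")
```

In this sketch a single round recruits only the reads that overlap the query directly, while repeated rounds extend the recruited set in both directions until no further reads match, which is the behavior the abstract attributes to the recursive strategy.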
format Online
Article
Text
id pubmed-6604332
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6604332 2019-07-12 SRAssembler: Selective Recursive local Assembly of homologous genomic regions McCarthy, Thomas W. Chou, Hsien-chao Brendel, Volker P. BMC Bioinformatics Software BioMed Central 2019-07-02 /pmc/articles/PMC6604332/ /pubmed/31266441 http://dx.doi.org/10.1186/s12859-019-2949-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SRAssembler: Selective Recursive local Assembly of homologous genomic regions
topic Software