VARUS: sampling complementary RNA reads from the sequence read archive

BACKGROUND: Vast amounts of next generation sequencing RNA data has been deposited in archives, accompanying very diverse original studies. The data is readily available also for other purposes such as genome annotation or transcriptome assembly. However, selecting a subset of available experiments,...

Bibliographic Details
Main Authors: Stanke, Mario; Bruhn, Willy; Becker, Felix; Hoff, Katharina J.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6842140/
https://www.ncbi.nlm.nih.gov/pubmed/31703556
http://dx.doi.org/10.1186/s12859-019-3182-x
_version_ 1783467988281720832
author Stanke, Mario
Bruhn, Willy
Becker, Felix
Hoff, Katharina J.
author_facet Stanke, Mario
Bruhn, Willy
Becker, Felix
Hoff, Katharina J.
author_sort Stanke, Mario
collection PubMed
description BACKGROUND: Vast amounts of next generation sequencing RNA data has been deposited in archives, accompanying very diverse original studies. The data is readily available also for other purposes such as genome annotation or transcriptome assembly. However, selecting a subset of available experiments, sequencing runs and reads for this purpose is a nontrivial task and complicated by the inhomogeneity of the data. RESULTS: This article presents the software VARUS that selects, downloads and aligns reads from NCBI’s Sequence Read Archive, given only the species’ binomial name and genome. VARUS automatically chooses runs from among all archived runs to randomly select subsets of reads. The objective of its online algorithm is to cover a large number of transcripts adequately when network bandwidth and computing resources are limited. For most tested species VARUS achieved both a higher sensitivity and specificity with a lower number of downloaded reads than when runs were manually selected. At the example of twelve eukaryotic genomes, we show that RNA-Seq that was sampled with VARUS is well-suited for fully-automatic genome annotation with BRAKER. CONCLUSIONS: With VARUS, genome annotation can be automatized to the extent that not even the selection and quality control of RNA-Seq has to be done manually. This introduces the possibility to have fully automatized genome annotation loops over potentially many species without incurring a loss of accuracy over a manually supervised annotation process.
format Online
Article
Text
id pubmed-6842140
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-68421402019-11-14 VARUS: sampling complementary RNA reads from the sequence read archive Stanke, Mario Bruhn, Willy Becker, Felix Hoff, Katharina J. BMC Bioinformatics Software BACKGROUND: Vast amounts of next generation sequencing RNA data has been deposited in archives, accompanying very diverse original studies. The data is readily available also for other purposes such as genome annotation or transcriptome assembly. However, selecting a subset of available experiments, sequencing runs and reads for this purpose is a nontrivial task and complicated by the inhomogeneity of the data. RESULTS: This article presents the software VARUS that selects, downloads and aligns reads from NCBI’s Sequence Read Archive, given only the species’ binomial name and genome. VARUS automatically chooses runs from among all archived runs to randomly select subsets of reads. The objective of its online algorithm is to cover a large number of transcripts adequately when network bandwidth and computing resources are limited. For most tested species VARUS achieved both a higher sensitivity and specificity with a lower number of downloaded reads than when runs were manually selected. At the example of twelve eukaryotic genomes, we show that RNA-Seq that was sampled with VARUS is well-suited for fully-automatic genome annotation with BRAKER. CONCLUSIONS: With VARUS, genome annotation can be automatized to the extent that not even the selection and quality control of RNA-Seq has to be done manually. This introduces the possibility to have fully automatized genome annotation loops over potentially many species without incurring a loss of accuracy over a manually supervised annotation process. 
BioMed Central 2019-11-08 /pmc/articles/PMC6842140/ /pubmed/31703556 http://dx.doi.org/10.1186/s12859-019-3182-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Stanke, Mario
Bruhn, Willy
Becker, Felix
Hoff, Katharina J.
VARUS: sampling complementary RNA reads from the sequence read archive
title VARUS: sampling complementary RNA reads from the sequence read archive
title_full VARUS: sampling complementary RNA reads from the sequence read archive
title_fullStr VARUS: sampling complementary RNA reads from the sequence read archive
title_full_unstemmed VARUS: sampling complementary RNA reads from the sequence read archive
title_short VARUS: sampling complementary RNA reads from the sequence read archive
title_sort varus: sampling complementary rna reads from the sequence read archive
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6842140/
https://www.ncbi.nlm.nih.gov/pubmed/31703556
http://dx.doi.org/10.1186/s12859-019-3182-x
work_keys_str_mv AT stankemario varussamplingcomplementaryrnareadsfromthesequencereadarchive
AT bruhnwilly varussamplingcomplementaryrnareadsfromthesequencereadarchive
AT beckerfelix varussamplingcomplementaryrnareadsfromthesequencereadarchive
AT hoffkatharinaj varussamplingcomplementaryrnareadsfromthesequencereadarchive