SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data

Bibliographic Details
Main Authors: Peng, Yan; Maxwell, Andrew S; Barker, Natalie D; Laird, Jennifer G; Kennedy, Alan J; Wang, Nan; Zhang, Chaoyang; Gong, Ping
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4251038/
https://www.ncbi.nlm.nih.gov/pubmed/25349885
http://dx.doi.org/10.1186/1471-2105-15-S11-S10
author Peng, Yan
Maxwell, Andrew S
Barker, Natalie D
Laird, Jennifer G
Kennedy, Alan J
Wang, Nan
Zhang, Chaoyang
Gong, Ping
collection PubMed
description BACKGROUND: While next-generation sequencing (NGS) technologies are rapidly advancing, an area that lags behind is the development of efficient and user-friendly tools for preliminary analysis of massive NGS data. As an effort to fill this gap to keep up with the fast pace of technological advancement and to accelerate data-to-results turnaround, we developed a novel software package named SeqAssist ("Sequencing Assistant" or SA). RESULTS: SeqAssist takes NGS-generated FASTQ files as the input, employs the BWA-MEM aligner for sequence alignment, and aims to provide a quick overview and basic statistics of NGS data. It consists of three separate workflows: (1) the SA_RunStats workflow generates basic statistics about an NGS dataset, including numbers of raw, cleaned, redundant and unique reads, redundancy rate, and a list of unique sequences with length and read count; (2) the SA_Run2Ref workflow estimates the breadth, depth and evenness of genome-wide coverage of the NGS dataset at a nucleotide resolution; and (3) the SA_Run2Run workflow compares two NGS datasets to determine the redundancy (overlapping rate) between the two NGS runs. Statistics produced by SeqAssist or derived from SeqAssist output files are designed to inform the user: whether, what percentage, how many times and how evenly a genomic locus (i.e., gene, scaffold, chromosome or genome) is covered by sequencing reads, how redundant the sequencing reads are in a single run or between two runs. These statistics can guide the user in evaluating the quality of a DNA library prepared for RNA-Seq or genome (re-)sequencing and in deciding the number of sequencing runs required for the library. We have tested SeqAssist using a synthetic dataset and demonstrated its main features using multiple NGS datasets generated from genome re-sequencing experiments. 
CONCLUSIONS: SeqAssist is a useful and informative tool that can serve as a valuable "assistant" to a broad range of investigators who conduct genome re-sequencing, RNA-Seq, or de novo genome sequencing and assembly experiments.
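The statistics named in the abstract (unique and redundant read counts, redundancy rate, breadth, depth and evenness of coverage, and run-to-run overlap) can be illustrated with a minimal Python sketch. This is not SeqAssist's code. The abstract gives no formulas, so every definition below is an assumption: redundancy rate is taken as redundant reads divided by total reads, evenness as the coefficient of variation of per-base depth, and overlap as shared unique sequences divided by the union of unique sequences. The function names are hypothetical, and the inputs are plain in-memory lists rather than FASTQ files or BWA-MEM alignments.

```python
from collections import Counter
from statistics import mean, pstdev


def run_stats(reads):
    """Count total, unique and redundant reads in a single run."""
    counts = Counter(reads)
    total = sum(counts.values())
    unique = len(counts)
    redundant = total - unique  # assumption: every copy beyond the first is redundant
    return {
        "total": total,
        "unique": unique,
        "redundant": redundant,
        # assumption: redundancy rate = redundant reads / total reads
        "redundancy_rate": redundant / total if total else 0.0,
        # (sequence, length, read count), most frequent first
        "unique_sequences": [(s, len(s), n) for s, n in counts.most_common()],
    }


def coverage_stats(depths):
    """Summarise per-base read depths for one locus."""
    if not depths:
        return {"breadth": 0.0, "mean_depth": 0.0, "cv": 0.0}
    covered = sum(1 for d in depths if d > 0)
    avg = mean(depths)
    return {
        "breadth": covered / len(depths),  # fraction of bases with depth > 0
        "mean_depth": avg,
        # assumption: evenness reported as coefficient of variation (lower is more even)
        "cv": pstdev(depths) / avg if avg else 0.0,
    }


def run_overlap(reads_a, reads_b):
    """Share of unique sequences found in both runs (assumed definition)."""
    a, b = set(reads_a), set(reads_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(run_stats(["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]))
    print(coverage_stats([0, 2, 4, 2, 0, 4]))
    print(run_overlap(["A", "B", "C"], ["B", "C", "D"]))
```

In a real pipeline the per-base depths would come from an alignment of the reads to a reference, and the read lists from parsed FASTQ records. The sketch only shows how the summary numbers relate to those inputs.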
format Online
Article
Text
id pubmed-4251038
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4251038 2014-12-02 SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data. Peng, Yan; Maxwell, Andrew S; Barker, Natalie D; Laird, Jennifer G; Kennedy, Alan J; Wang, Nan; Zhang, Chaoyang; Gong, Ping. BMC Bioinformatics. Proceedings. BioMed Central 2014-10-21. /pmc/articles/PMC4251038/ /pubmed/25349885 http://dx.doi.org/10.1186/1471-2105-15-S11-S10 Text en
Copyright © 2014 Peng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4251038/
https://www.ncbi.nlm.nih.gov/pubmed/25349885
http://dx.doi.org/10.1186/1471-2105-15-S11-S10