SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies

BACKGROUND: With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are currently in progress or available as drafts, but not made available as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance not only...

Bibliographic Details
Main authors: Yang, Li-An; Chang, Yu-Jung; Chen, Shu-Hwa; Lin, Chung-Yen; Ho, Jan-Ming
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402383/
https://www.ncbi.nlm.nih.gov/pubmed/30999844
http://dx.doi.org/10.1186/s12864-019-5445-3
_version_ 1783566746795376640
author Yang, Li-An
Chang, Yu-Jung
Chen, Shu-Hwa
Lin, Chung-Yen
Ho, Jan-Ming
collection PubMed
description BACKGROUND: With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are in progress or available only as drafts rather than as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance not only for those who perform the assembly or re-assembly, but also for those who use assemblies as maps in downstream analyses. Recent studies on quality control and quality assessment of genome assemblies have focused either on quality control of reads before assembly or on evaluation of assemblies with respect to their contiguity and correctness. However, correctness assessment depends on a reference and is therefore not applicable to de novo assembly projects. Hence, methods that provide both pre-assembly and post-assembly quality assessment reports for examining the quality and correctness of de novo assemblies and their input reads are worth developing. RESULTS: We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and presents the analysis in a portable HTML report whose interface visualizes the distribution of high- and poor-quality reads. The post-assembly module of SQUAT provides read mapping analytics, also in HTML format. Reads are categorized as uniquely mapped, multiply mapped, or unmapped; uniquely mapped reads are further categorized as perfectly matched, with substitutions, containing clips, or other. Poorly mapped (PM) reads are carefully defined across several groups to prevent the underestimation of unmapped reads; a high PM% is a sign of a poor assembly that requires researchers’ attention for further examination or improvement before the assembly is used. Finally, we evaluate SQUAT on six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful, detailed information for assessing the quality of assemblies and reads. AVAILABILITY: The SQUAT software, with links to both its Docker image and the online manual, is freely available at https://github.com/luke831215/SQUAT. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5445-3) contains supplementary material, which is available to authorized users.
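The read categories named in the description can be illustrated with a small sketch. This is not SQUAT's implementation: the function names, the use of MAPQ 0 as a stand-in for "multiply mapped", and the choice of which categories count toward PM% are all assumptions made for illustration, since the record does not give the tool's actual definitions. The inputs are the standard SAM fields FLAG, MAPQ, CIGAR, and the NM (edit distance) tag.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "90M" or "10S".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_read(flag, mapq, cigar, nm):
    """Assign one alignment to a category (hypothetical rules, not SQUAT's)."""
    if flag & 0x4:                     # SAM FLAG bit 0x4: segment unmapped
        return "unmapped"
    if mapq == 0:                      # assumption: MAPQ 0 marks a multi-mapper
        return "multiply_mapped"
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    if not ops:                        # mapped but no usable CIGAR
        return "unique_other"
    if ops & {"S", "H"}:               # soft or hard clipping present
        return "unique_clipped"
    if ops <= {"M", "=", "X"}:         # no indels, no clips
        return "unique_perfect" if nm == 0 else "unique_substitutions"
    return "unique_other"              # indels, skips, padding


# Assumption: these categories count as poorly mapped (PM).
DEFAULT_POOR = frozenset({"unmapped", "unique_clipped", "unique_other"})


def pm_percent(labels, poor=DEFAULT_POOR):
    """Percentage of reads whose category is in `poor`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[label] for label in poor) / total


if __name__ == "__main__":
    # (FLAG, MAPQ, CIGAR, NM) tuples for six made-up reads.
    reads = [
        (4, 0, "*", 0),
        (0, 0, "100M", 0),
        (0, 60, "100M", 0),
        (0, 60, "100M", 2),
        (0, 60, "10S90M", 0),
        (0, 60, "50M2I48M", 2),
    ]
    labels = [classify_read(*r) for r in reads]
    print(Counter(labels))
    print(f"PM% = {pm_percent(labels):.1f}")
```

Keeping the set of "poor" categories as a parameter makes it easy to swap in whatever PM definition a given analysis uses.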
format Online
Article
Text
id pubmed-7402383
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7402383 2020-08-07 SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies Yang, Li-An Chang, Yu-Jung Chen, Shu-Hwa Lin, Chung-Yen Ho, Jan-Ming BMC Genomics Research BioMed Central 2019-04-18 /pmc/articles/PMC7402383/ /pubmed/30999844 http://dx.doi.org/10.1186/s12864-019-5445-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402383/
https://www.ncbi.nlm.nih.gov/pubmed/30999844
http://dx.doi.org/10.1186/s12864-019-5445-3