KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies
MOTIVATION: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better und...
Main authors: | Mapleson, Daniel; Garcia Accinelli, Gonzalo; Kettleborough, George; Wright, Jonathan; Clavijo, Bernardo J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408915/ https://www.ncbi.nlm.nih.gov/pubmed/27797770 http://dx.doi.org/10.1093/bioinformatics/btw663 |
_version_ | 1783232383514836992 |
---|---|
author | Mapleson, Daniel; Garcia Accinelli, Gonzalo; Kettleborough, George; Wright, Jonathan; Clavijo, Bernardo J |
author_facet | Mapleson, Daniel; Garcia Accinelli, Gonzalo; Kettleborough, George; Wright, Jonathan; Clavijo, Bernardo J |
author_sort | Mapleson, Daniel |
collection | PubMed |
description | MOTIVATION: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. RESULTS: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT’s ability to provide valuable insights into assembly composition and quality of genome assemblies through pairwise comparison of k-mers present in both input reads and the assemblies. AVAILABILITY AND IMPLEMENTATION: KAT is available under the GPLv3 license at: https://github.com/TGAC/KAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-5408915 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-54089152017-05-03 KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies Mapleson, Daniel Garcia Accinelli, Gonzalo Kettleborough, George Wright, Jonathan Clavijo, Bernardo J Bioinformatics Applications Notes MOTIVATION: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. RESULTS: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT’s ability to provide valuable insights into assembly composition and quality of genome assemblies through pairwise comparison of k-mers present in both input reads and the assemblies. AVAILABILITY AND IMPLEMENTATION: KAT is available under the GPLv3 license at: https://github.com/TGAC/KAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-02-15 2016-11-28 /pmc/articles/PMC5408915/ /pubmed/27797770 http://dx.doi.org/10.1093/bioinformatics/btw663 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Mapleson, Daniel Garcia Accinelli, Gonzalo Kettleborough, George Wright, Jonathan Clavijo, Bernardo J KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies |
title | KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies |
title_full | KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies |
title_fullStr | KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies |
title_full_unstemmed | KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies |
title_short | KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies |
title_sort | kat: a k-mer analysis toolkit to quality control ngs datasets and genome assemblies |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408915/ https://www.ncbi.nlm.nih.gov/pubmed/27797770 http://dx.doi.org/10.1093/bioinformatics/btw663 |
work_keys_str_mv | AT maplesondaniel katakmeranalysistoolkittoqualitycontrolngsdatasetsandgenomeassemblies AT garciaaccinelligonzalo katakmeranalysistoolkittoqualitycontrolngsdatasetsandgenomeassemblies AT kettleboroughgeorge katakmeranalysistoolkittoqualitycontrolngsdatasetsandgenomeassemblies AT wrightjonathan katakmeranalysistoolkittoqualitycontrolngsdatasetsandgenomeassemblies AT clavijobernardoj katakmeranalysistoolkittoqualitycontrolngsdatasetsandgenomeassemblies |
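
The description above mentions k-mer frequencies, GC composition, and pairwise comparison of k-mers present in both the input reads and the assembly. The following is a minimal Python sketch of those ideas, not KAT's own code or interface: the function names (`canonical`, `count_kmers`, `gc_fraction`, `copy_number_spectrum`) are hypothetical, and counting canonical k-mers (a k-mer and its reverse complement treated as one) is an assumption made for illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(sequences, k):
    """Count canonical k-mers across sequences, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous (A, C, G, T) bases of a sequence."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous


def copy_number_spectrum(read_counts, assembly_counts):
    """Map (read frequency, assembly copy number) to the number of distinct k-mers.

    K-mers seen in the reads but absent from the assembly get copy number 0.
    """
    spectrum = Counter()
    for kmer, frequency in read_counts.items():
        spectrum[(frequency, assembly_counts.get(kmer, 0))] += 1
    return spectrum


if __name__ == "__main__":
    # Toy data, invented for this example only.
    reads = ["ACGTAC", "ACGTAC", "TTTTGA"]
    assembly = ["ACGTAC"]
    spectrum = copy_number_spectrum(count_kmers(reads, 3), count_kmers(assembly, 3))
    for (frequency, copies), n_kmers in sorted(spectrum.items()):
        print(f"read frequency {frequency}, assembly copies {copies}: {n_kmers} distinct k-mers")
```

In a table like this, k-mers with a high read frequency but assembly copy number 0 would suggest content missing from the assembly, while k-mers with a low read frequency and copy number 0 are more likely sequencing errors. A plain-Python counter such as this one is only practical for small inputs.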