
KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies

MOTIVATION: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better und...


Bibliographic Details
Main authors: Mapleson, Daniel; Garcia Accinelli, Gonzalo; Kettleborough, George; Wright, Jonathan; Clavijo, Bernardo J
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Applications Notes
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408915/
https://www.ncbi.nlm.nih.gov/pubmed/27797770
http://dx.doi.org/10.1093/bioinformatics/btw663
author Mapleson, Daniel
Garcia Accinelli, Gonzalo
Kettleborough, George
Wright, Jonathan
Clavijo, Bernardo J
author_sort Mapleson, Daniel
collection PubMed
description MOTIVATION: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. RESULTS: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT’s ability to provide valuable insights into assembly composition and quality of genome assemblies through pairwise comparison of k-mers present in both input reads and the assemblies. AVAILABILITY AND IMPLEMENTATION: KAT is available under the GPLv3 license at: https://github.com/TGAC/KAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5408915
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5408915 2017-05-03 Bioinformatics Applications Notes Oxford University Press 2017-02-15 2016-11-28 /pmc/articles/PMC5408915/ /pubmed/27797770 http://dx.doi.org/10.1093/bioinformatics/btw663 Text en © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies
topic Applications Notes