Determining the quality and complexity of next-generation sequencing data without a reference genome

Bibliographic Details
Main authors: Anvar, Seyed Yahya; Khachatryan, Lusine; Vermaat, Martijn; van Galen, Michiel; Pulyakhina, Irina; Ariyurek, Yavuz; Kraaijeveld, Ken; den Dunnen, Johan T; de Knijff, Peter; ’t Hoen, Peter AC; Laros, Jeroen FJ
Format: Online article (text)
Language: English
Published: BioMed Central, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298064/
https://www.ncbi.nlm.nih.gov/pubmed/25514851
http://dx.doi.org/10.1186/s13059-014-0555-3

Abstract: We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0555-3) contains supplementary material, which is available to authorized users.

Journal: Genome Biology (Genome Biol); article type: Method
Publication date: 2014-12-17
Record: pubmed-4298064 (MEDLINE/PubMed record from the PubMed collection, National Center for Biotechnology Information)
License: © Anvar et al.; licensee BioMed Central. 2014. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
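
The record's abstract rests on one idea: two sequencing libraries can be compared through their k-mer frequencies, without aligning reads to a reference. A minimal Python sketch of that idea follows. It is not kPAL's code or API. The helper names (`kmer_profile`, `frequencies`, `profile_distance`), the default k, and the use of a plain Euclidean distance between relative-frequency profiles are all illustrative assumptions. kPAL's own preprocessing and distance measures may differ.

```python
from collections import Counter
from itertools import product
import math

# Illustrative sketch only: hypothetical helper names, not kPAL's code or API.
BASES = "ACGT"


def kmer_profile(reads, k):
    """Count overlapping k-mers across all reads.

    Reads are upper-cased first; any k-mer containing a symbol other than
    A, C, G or T (for example an ambiguous N) is skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if all(base in BASES for base in kmer):
                counts[kmer] += 1
    return counts


def frequencies(counts, k):
    """Relative frequency of each of the 4**k possible k-mers, in a fixed order."""
    total = sum(counts.values())
    return [
        counts["".join(p)] / total if total else 0.0
        for p in product(BASES, repeat=k)
    ]


def profile_distance(reads_a, reads_b, k=3):
    """Euclidean distance between two normalized k-mer profiles.

    Assumption: a plain Euclidean distance stands in for whatever measure
    a real tool would use; 0.0 means identical k-mer composition.
    """
    fa = frequencies(kmer_profile(reads_a, k), k)
    fb = frequencies(kmer_profile(reads_b, k), k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    library_a = ["ACGTACGTAC", "TTGACGTACG"]
    library_b = ["AAAAAAAAAC", "AAAAAAAAGT"]
    print(profile_distance(library_a, library_a, k=2))  # identical input: 0.0
    print(profile_distance(library_a, library_b, k=2))
```

Dividing by the total count lets libraries of different sequencing depth be compared directly. A larger distance only says that the k-mer composition differs. Deciding whether that difference reflects contamination, duplication or a protocol change is where a dedicated tool such as kPAL comes in.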