Determining the quality and complexity of next-generation sequencing data without a reference genome
We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differenc...
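Main authors: | Anvar, Seyed Yahya; Khachatryan, Lusine; Vermaat, Martijn; van Galen, Michiel; Pulyakhina, Irina; Ariyurek, Yavuz; Kraaijeveld, Ken; den Dunnen, Johan T; de Knijff, Peter; ’t Hoen, Peter AC; Laros, Jeroen FJ |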
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298064/ https://www.ncbi.nlm.nih.gov/pubmed/25514851 http://dx.doi.org/10.1186/s13059-014-0555-3 |
_version_ | 1782353215743852544 |
---|---|
author | Anvar, Seyed Yahya Khachatryan, Lusine Vermaat, Martijn van Galen, Michiel Pulyakhina, Irina Ariyurek, Yavuz Kraaijeveld, Ken den Dunnen, Johan T de Knijff, Peter ’t Hoen, Peter AC Laros, Jeroen FJ |
author_sort | Anvar, Seyed Yahya |
collection | PubMed |
description | We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-014-0555-3) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4298064 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-42980642015-02-03 Determining the quality and complexity of next-generation sequencing data without a reference genome Anvar, Seyed Yahya Khachatryan, Lusine Vermaat, Martijn van Galen, Michiel Pulyakhina, Irina Ariyurek, Yavuz Kraaijeveld, Ken den Dunnen, Johan T de Knijff, Peter ’t Hoen, Peter AC Laros, Jeroen FJ Genome Biol Method We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-014-0555-3) contains supplementary material, which is available to authorized users. BioMed Central 2014-12-17 2014 /pmc/articles/PMC4298064/ /pubmed/25514851 http://dx.doi.org/10.1186/s13059-014-0555-3 Text en © Anvar et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Determining the quality and complexity of next-generation sequencing data without a reference genome |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298064/ https://www.ncbi.nlm.nih.gov/pubmed/25514851 http://dx.doi.org/10.1186/s13059-014-0555-3 |