Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers


Bibliographic Details
Main Authors: Girardot, Charles, Scholtalbers, Jelle, Sauer, Sajoscha, Su, Shu-Yi, Furlong, Eileen E.M.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5055726/
https://www.ncbi.nlm.nih.gov/pubmed/27717304
http://dx.doi.org/10.1186/s12859-016-1284-2
_version_ 1782458804568326144
author Girardot, Charles
Scholtalbers, Jelle
Sauer, Sajoscha
Su, Shu-Yi
Furlong, Eileen E.M.
author_facet Girardot, Charles
Scholtalbers, Jelle
Sauer, Sajoscha
Su, Shu-Yi
Furlong, Eileen E.M.
author_sort Girardot, Charles
collection PubMed
description BACKGROUND: The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish PCR duplicates from true transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end either to increase the number of multiplexed samples in the library or to use one of the barcodes as a UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account by identifying UMIs. RESULTS: Existing tools do not support these complex barcoding configurations, and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36% compared to when UMIs are ignored. CONCLUSIONS: Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je. Je can also be easily installed in Galaxy through the Galaxy toolshed. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1284-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5055726
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5055726 2016-10-19 Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers Girardot, Charles Scholtalbers, Jelle Sauer, Sajoscha Su, Shu-Yi Furlong, Eileen E.M. BMC Bioinformatics Software BACKGROUND: The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish PCR duplicates from true transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end either to increase the number of multiplexed samples in the library or to use one of the barcodes as a UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account by identifying UMIs. RESULTS: Existing tools do not support these complex barcoding configurations, and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36% compared to when UMIs are ignored. CONCLUSIONS: Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je. Je can also be easily installed in Galaxy through the Galaxy toolshed. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1284-2) contains supplementary material, which is available to authorized users.
BioMed Central 2016-10-08 /pmc/articles/PMC5055726/ /pubmed/27717304 http://dx.doi.org/10.1186/s12859-016-1284-2 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Girardot, Charles
Scholtalbers, Jelle
Sauer, Sajoscha
Su, Shu-Yi
Furlong, Eileen E.M.
Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers
title Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5055726/
https://www.ncbi.nlm.nih.gov/pubmed/27717304
http://dx.doi.org/10.1186/s12859-016-1284-2
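
The abstract states that filtering read duplicates with UMIs taken into account yields more unique reads than ignoring UMIs. The following is a minimal sketch of that idea only, not Je's actual implementation: the `Read` record, the `count_unique` function, the toy reads, and the use of exact UMI matching are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical minimal read record; real aligned reads carry far more fields.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def count_unique(reads, use_umi):
    """Count reads left after duplicate removal.

    Without UMIs, reads sharing chromosome, position and strand collapse
    into one. With UMIs, they collapse only if the UMI also matches exactly
    (exact matching is an assumption of this sketch).
    """
    seen = set()
    for r in reads:
        if use_umi:
            key = (r.chrom, r.pos, r.strand, r.umi)
        else:
            key = (r.chrom, r.pos, r.strand)
        seen.add(key)
    return len(seen)


# Toy data, invented for illustration.
reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),  # same position, same UMI: PCR duplicate
    Read("r3", "chr1", 100, "+", "TTAG"),  # same position, different UMI: distinct molecule
    Read("r4", "chr2", 500, "-", "ACGT"),
]

without_umi = count_unique(reads, use_umi=False)
with_umi = count_unique(reads, use_umi=True)
print(without_umi, with_umi)  # prints "2 3"
```

In this toy input, position-only filtering discards `r3` as a duplicate, while the UMI-aware key retains it as a separate molecule, which is the effect the abstract reports as an increase in unique reads.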