
Fast lightweight accurate xenograft sorting

MOTIVATION: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads...


Bibliographic Details
Main authors:	Zentgraf, Jens, Rahmann, Sven
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8017614/ https://www.ncbi.nlm.nih.gov/pubmed/33810805 http://dx.doi.org/10.1186/s13015-021-00181-w

_version_	1783674083880206336
author	Zentgraf, Jens Rahmann, Sven
author_facet	Zentgraf, Jens Rahmann, Sven
author_sort	Zentgraf, Jens
collection	PubMed
description	MOTIVATION: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. RESULTS: We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further. AVAILABILITY: Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.
format	Online Article Text
id	pubmed-8017614
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8017614 2021-04-02 Fast lightweight accurate xenograft sorting Zentgraf, Jens Rahmann, Sven Algorithms Mol Biol Research
BioMed Central 2021-04-02 /pmc/articles/PMC8017614/ /pubmed/33810805 http://dx.doi.org/10.1186/s13015-021-00181-w Text en © The Author(s) 2021 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Zentgraf, Jens Rahmann, Sven Fast lightweight accurate xenograft sorting
title	Fast lightweight accurate xenograft sorting
title_full	Fast lightweight accurate xenograft sorting
title_fullStr	Fast lightweight accurate xenograft sorting
title_full_unstemmed	Fast lightweight accurate xenograft sorting
title_short	Fast lightweight accurate xenograft sorting
title_sort	fast lightweight accurate xenograft sorting
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8017614/ https://www.ncbi.nlm.nih.gov/pubmed/33810805 http://dx.doi.org/10.1186/s13015-021-00181-w
work_keys_str_mv	AT zentgrafjens fastlightweightaccuratexenograftsorting AT rahmannsven fastlightweightaccuratexenograftsorting
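
The description field above summarizes an alignment-free approach: reads are assigned to host or graft by looking up their k-mers in a precomputed table rather than by aligning them. The following is a minimal sketch of that general idea only, not the xengsort implementation. A plain Python dict stands in for the three-way bucketed quotiented Cuckoo hash table, and the k-mer length, the function names, and the simple majority decision rule are all assumptions made for illustration.

```python
from collections import Counter

K = 5  # toy k-mer length (assumption); real tools use a much larger k

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq, k=K):
    """Yield canonical k-mers of seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


def build_index(host_seq, graft_seq, k=K):
    """Map each k-mer to 'host', 'graft', or 'both'.

    A dict is used here as a stand-in for a space-efficient hash table.
    """
    index = {}
    for km in kmers(host_seq, k):
        index[km] = "host"
    for km in kmers(graft_seq, k):
        index[km] = "both" if index.get(km) in ("host", "both") else "graft"
    return index


def classify_read(read, index, k=K):
    """Classify one read with a hypothetical majority rule over k-mer labels."""
    counts = Counter(index.get(km, "neither") for km in kmers(read, k))
    host, graft = counts["host"], counts["graft"]
    if host == 0 and graft == 0:
        # No species-specific k-mer was found in the read.
        return "both" if counts["both"] > 0 else "neither"
    if host > graft:
        return "host"
    if graft > host:
        return "graft"
    return "ambiguous"


if __name__ == "__main__":
    # Toy reference sequences, invented for this example.
    index = build_index(host_seq="AAAAAAAAAA", graft_seq="CCCCCCCCCC")
    for read in ("AAAAAAA", "GGGGGGG", "ACACACA", "AAAAACCCCC"):
        print(read, classify_read(read, index))
```

Using canonical k-mers means a read is classified the same way regardless of which strand it was sequenced from, which is why the all-G read in the usage example is matched against the all-C graft sequence.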
