
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing ident...


Bibliographic Details
Main authors: Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504710/
https://www.ncbi.nlm.nih.gov/pubmed/26182406
http://dx.doi.org/10.1371/journal.pone.0132868
_version_ 1782381511780073472
author Herzeel, Charlotte
Costanza, Pascal
Decap, Dries
Fostier, Jan
Reumers, Joke
collection PubMed
description elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.
format Online
Article
Text
id pubmed-4504710
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4504710 2015-07-17 elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke PLoS One Research Article
Public Library of Science 2015-07-16 /pmc/articles/PMC4504710/ /pubmed/26182406 http://dx.doi.org/10.1371/journal.pone.0132868 Text en © 2015 Herzeel et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling
topic Research Article