QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing

Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, they also raise major challenges for their use in daily clinical prac...

Bibliographic Details
Main authors: Jarlier, Frédéric; Joly, Nicolas; Fedy, Nicolas; Magalhaes, Thomas; Sirotti, Leonor; Paganiban, Paul; Martin, Firmin; McManus, Michael; Hupé, Philippe
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7429925/
https://www.ncbi.nlm.nih.gov/pubmed/32913637
http://dx.doi.org/10.12688/f1000research.22954.3
author Jarlier, Frédéric
Joly, Nicolas
Fedy, Nicolas
Magalhaes, Thomas
Sirotti, Leonor
Paganiban, Paul
Martin, Firmin
McManus, Michael
Hupé, Philippe
collection PubMed
description Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, their ever-growing size raises major challenges for their use in daily clinical practice, care and diagnosis. We therefore implemented software to reduce the time to delivery for the alignment and sorting of high-throughput sequencing data. Our solution is implemented using the Message Passing Interface (MPI) and is intended for high-performance computing architectures. The software scales linearly with respect to the size of the data and ensures full reproducibility with respect to the traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up using multi-core and multi-node parallelization.
format Online
Article
Text
id pubmed-7429925
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7429925 2020-09-09 QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing. F1000Res. Software Tool Article. F1000 Research Limited 2020-10-08 /pmc/articles/PMC7429925/ /pubmed/32913637 http://dx.doi.org/10.12688/f1000research.22954.3 Text en Copyright: © 2020 Jarlier F et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7429925/
https://www.ncbi.nlm.nih.gov/pubmed/32913637
http://dx.doi.org/10.12688/f1000research.22954.3