Fast-HBR: Fast hash based duplicate read remover

Bibliographic Details
Main Authors: Altayyar, Sami; Artoli, Abdel Monim
Format: Online Article Text
Language: English
Published: Biomedical Informatics, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/
https://www.ncbi.nlm.nih.gov/pubmed/35815196
http://dx.doi.org/10.6026/97320630018036
Collection: PubMed
Abstract: Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features of environmental samples. These data contain multiple duplicate reads, which affect the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads de novo, without a reference genome. It uses hash tables to represent reads as integer values, minimizing memory usage and allowing faster manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
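The abstract describes de novo duplicate removal by storing reads as integer values in a hash table. The Python sketch below illustrates that general idea only; it is not Fast-HBR's code. The 2-bit base encoding, the sentinel bit, the string fallback for reads containing N, and the names `encode_read` and `remove_duplicates` are all hypothetical choices made for this example. It handles exact single-end duplicates only and does not attempt to reproduce the tool's treatment of paired-end reads, quality scores, or any other details.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical 2-bit code per nucleotide (an assumption, not Fast-HBR's scheme).
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_read(seq: str) -> Optional[int]:
    """Pack a DNA sequence into one Python int, 2 bits per base.

    The value starts at 1 (a sentinel bit) so that sequences of different
    lengths stay distinct: "A" -> 0b100 and "AA" -> 0b10000.
    Returns None if the read contains a symbol outside ACGT (e.g. N).
    """
    value = 1
    for base in seq.upper():
        code = _BASE_CODE.get(base)
        if code is None:
            return None
        value = (value << 2) | code
    return value


def remove_duplicates(reads: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs, keeping only the first copy of each sequence.

    No reference genome is used: reads are compared only with each other.
    """
    seen = set()  # hash table of integer keys (string keys for the fallback case)
    for read_id, seq in reads:
        key = encode_read(seq)
        if key is None:
            # Hypothetical fallback: reads with N are keyed by their string.
            # An int never compares equal to a str, so the two key kinds cannot clash.
            key = seq.upper()
        if key in seen:
            continue  # exact duplicate of a read already kept
        seen.add(key)
        yield read_id, seq


if __name__ == "__main__":
    reads = [
        ("r1", "ACGTACGT"),
        ("r2", "ACGTACGT"),  # exact duplicate of r1
        ("r3", "ACGTACGA"),
        ("r4", "ACGNACGT"),  # contains N, so it uses the string fallback
        ("r5", "acgtacgt"),  # same sequence as r1 in lower case
    ]
    kept = list(remove_duplicates(reads))
    print([rid for rid, _ in kept])  # ['r1', 'r3', 'r4']
```

Because Python integers have arbitrary precision, the encoding works for reads of any length. In CPython an integer key is typically more compact than the equivalent string, which is the kind of saving the abstract points to, although the exact figures depend on read length and interpreter details.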
Institution: National Center for Biotechnology Information
Journal: Bioinformation
Article type: Research Article
Publication date: 2022-01-31
License: © 2022 Biomedical Informatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.