Fast-HBR: Fast hash based duplicate read remover

Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features in environmental samples. These data contain many duplicate reads, which affect the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient duplicate rea...

Bibliographic Details
Main Authors:	Altayyar, Sami, Artoli, Abdel Monim
Format:	Online Article Text
Language:	English
Published:	Biomedical Informatics 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://www.ncbi.nlm.nih.gov/pubmed/35815196 http://dx.doi.org/10.6026/97320630018036

_version_	1784728100375887872
author	Altayyar, Sami Artoli, Abdel Monim
author_facet	Altayyar, Sami Artoli, Abdel Monim
author_sort	Altayyar, Sami
collection	PubMed
description	Next-Generation Sequencing (NGS) platforms produce massive amounts of data for analyzing various features in environmental samples. These data contain many duplicate reads, which affect the efficiency and accuracy of the analysis. We describe Fast-HBR, a fast and memory-efficient tool that removes duplicate reads without a reference genome, using de novo principles. It uses hash tables to represent reads as integer values, which minimizes memory usage and speeds up manipulation. Fast-HBR is faster and has a smaller memory footprint than state-of-the-art de novo duplicate-removal tools. Fast-HBR is implemented in Python 3 and is available at https://github.com/Sami-Altayyar/Fast-HBR.
format	Online Article Text
id	pubmed-9200608
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Biomedical Informatics
record_format	MEDLINE/PubMed
spelling	pubmed-92006082022-07-07 Fast-HBR: Fast hash based duplicate read remover Altayyar, Sami Artoli, Abdel Monim Bioinformation Research Article The Next-Generation Sequencing (NGS) platforms produce massive amounts of data to analyze various features in environmental samples. These data contain multiple duplicate reads which impact the analyzing process efficiency and accuracy. We describe Fast-HBR, a fast and memory-efficient duplicate reads removing tool without a reference genome using de-novo principles. It uses hash tables to represent reads in integer value to minimize memory usage for faster manipulation. Fast-HBR is faster and has less memory footprint when compared with the state of the art De-novo duplicate removing tools. Fast-HBR implemented in Python 3 is available at https://github.com/Sami-Altayyar/Fast-HBR. Biomedical Informatics 2022-01-31 /pmc/articles/PMC9200608/ /pubmed/35815196 http://dx.doi.org/10.6026/97320630018036 Text en © 2022 Biomedical Informatics https://creativecommons.org/licenses/by/3.0/This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
spellingShingle	Research Article Altayyar, Sami Artoli, Abdel Monim Fast-HBR: Fast hash based duplicate read remover
title	Fast-HBR: Fast hash based duplicate read remover
title_full	Fast-HBR: Fast hash based duplicate read remover
title_fullStr	Fast-HBR: Fast hash based duplicate read remover
title_full_unstemmed	Fast-HBR: Fast hash based duplicate read remover
title_short	Fast-HBR: Fast hash based duplicate read remover
title_sort	fast-hbr: fast hash based duplicate read remover
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200608/ https://www.ncbi.nlm.nih.gov/pubmed/35815196 http://dx.doi.org/10.6026/97320630018036
work_keys_str_mv	AT altayyarsami fasthbrfasthashbasedduplicatereadremover AT artoliabdelmonim fasthbrfasthashbasedduplicatereadremover
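
The description above says that reads are represented as integer values in a hash table. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes four-line FASTQ records and exact-match duplicates only. The names `read_key`, `parse_fastq`, and `remove_duplicates` are hypothetical, and the use of a 64-bit BLAKE2b digest is an assumption made for illustration.

```python
import hashlib
from itertools import islice
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, quality): one four-line FASTQ record.
Record = Tuple[str, str, str, str]


def read_key(sequence: str) -> int:
    """Map a read sequence to a 64-bit integer (hypothetical helper).

    Storing the integer instead of the full sequence keeps the set small.
    Assumption: a 64-bit digest is treated as unique; a collision would
    wrongly drop a distinct read.
    """
    digest = hashlib.blake2b(sequence.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield FASTQ records, reading four lines at a time."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("incomplete FASTQ record: expected 4 lines")
        yield chunk[0], chunk[1], chunk[2], chunk[3]


def remove_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only the first record seen for each distinct sequence."""
    seen = set()  # integer keys of sequences already emitted
    for record in records:
        key = read_key(record[1])
        if key in seen:
            continue  # exact duplicate of an earlier read: skip it
        seen.add(key)
        yield record


# Small in-memory demonstration; a real input would be an open FASTQ file.
demo_lines = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "TTGA", "+", "IIII",
    "@r3", "ACGT", "+", "IIII",  # same sequence as @r1
]
unique_records = list(remove_duplicates(parse_fastq(demo_lines)))
unique_headers = [record[0] for record in unique_records]
print(unique_headers)
```

This sketch keeps only one integer per distinct read in memory. It does not handle reverse-complement duplicates or paired-end reads; whether and how Fast-HBR handles those cases is not stated in this record.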
