UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction

A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR)...

Bibliographic Details
Main authors: Tsagiopoulou, Maria, Maniou, Maria Christina, Pechlivanis, Nikolaos, Togkousidis, Anastasis, Kotrová, Michaela, Hutzenlaub, Tobias, Kappas, Ilias, Chatzidimitriou, Anastasia, Psomopoulos, Fotis
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8193862/
https://www.ncbi.nlm.nih.gov/pubmed/34122513
http://dx.doi.org/10.3389/fgene.2021.660366
author Tsagiopoulou, Maria
Maniou, Maria Christina
Pechlivanis, Nikolaos
Togkousidis, Anastasis
Kotrová, Michaela
Hutzenlaub, Tobias
Kappas, Ilias
Chatzidimitriou, Anastasia
Psomopoulos, Fotis
author_sort Tsagiopoulou, Maria
collection PubMed
description A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, thus reducing the bias of this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files, deduplicating and correcting reads by building a consensus sequence from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
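The idea of collapsing reads that share a UMI into one consensus sequence can be illustrated with a minimal Python sketch. This is not UMIc's implementation (UMIc is written in R, and its actual rules are not given in this record); the function names `hamming`, `consensus`, and `deduplicate`, the `max_umi_distance` parameter, the Phred+33 encoding, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def consensus(reads, offset=33):
    """Collapse (sequence, quality) pairs into one consensus sequence.

    Assumed rule: per position, the most frequent base wins; ties go to
    the base with the larger summed Phred score (Phred+33 assumed).
    Reads are truncated to the shortest sequence/quality string.
    """
    length = min(min(len(seq), len(qual)) for seq, qual in reads)
    out = []
    for i in range(length):
        count = Counter()
        qual_sum = Counter()
        for seq, qual in reads:
            count[seq[i]] += 1
            qual_sum[seq[i]] += ord(qual[i]) - offset
        out.append(max(count, key=lambda b: (count[b], qual_sum[b])))
    return "".join(out)


def deduplicate(records, max_umi_distance=1):
    """Group (umi, sequence, quality) records by UMI; return {umi: consensus}.

    Assumed rule: UMIs are visited from most to least abundant, and a UMI
    within max_umi_distance (Hamming) of an already kept UMI is merged
    into it. All UMIs are assumed to have the same length.
    """
    groups = defaultdict(list)
    for umi, seq, qual in records:
        groups[umi].append((seq, qual))

    merged = {}
    for umi in sorted(groups, key=lambda u: (-len(groups[u]), u)):
        target = next(
            (kept for kept in merged if hamming(kept, umi) <= max_umi_distance),
            None,
        )
        key = umi if target is None else target
        merged.setdefault(key, []).extend(groups[umi])

    return {umi: consensus(reads) for umi, reads in merged.items()}
```

Calling `deduplicate` on a list of `(umi, sequence, quality)` tuples parsed from a FASTQ file returns one consensus sequence per retained UMI. The sketch uses only UMI-to-UMI distance; a fuller treatment would presumably also compare the read sequences themselves, as the description mentions.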
format Online
Article
Text
id pubmed-8193862
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8193862 2021-06-12 UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction Tsagiopoulou, Maria Maniou, Maria Christina Pechlivanis, Nikolaos Togkousidis, Anastasis Kotrová, Michaela Hutzenlaub, Tobias Kappas, Ilias Chatzidimitriou, Anastasia Psomopoulos, Fotis Front Genet Genetics A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, thus reducing the bias of this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files, deduplicating and correcting reads by building a consensus sequence from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc. Frontiers Media S.A. 2021-05-28 /pmc/articles/PMC8193862/ /pubmed/34122513 http://dx.doi.org/10.3389/fgene.2021.660366 Text en Copyright © 2021 Tsagiopoulou, Maniou, Pechlivanis, Togkousidis, Kotrová, Hutzenlaub, Kappas, Chatzidimitriou and Psomopoulos. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8193862/
https://www.ncbi.nlm.nih.gov/pubmed/34122513
http://dx.doi.org/10.3389/fgene.2021.660366