UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction
A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, during the library preparation steps. A UMI adds a unique identity to each DNA/RNA input molecule through polymerase chain reaction (PCR)...
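The basic idea behind UMI deduplication can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each read begins with a fixed-length UMI followed by the insert, and that UMIs are error-free. `group_reads_by_umi` and the example reads are hypothetical and are not part of UMIc, which is implemented in R. Reads that share a UMI are treated as PCR copies of one input molecule.

```python
from collections import defaultdict


def group_reads_by_umi(reads, umi_length):
    """Group read inserts by their UMI.

    Hypothetical layout (an assumption, not UMIc's input format): the UMI
    occupies the first `umi_length` bases of each read, and the rest of
    the read is the insert sequence.
    """
    groups = defaultdict(list)
    for read in reads:
        umi, insert = read[:umi_length], read[umi_length:]
        groups[umi].append(insert)
    return dict(groups)


reads = [
    "ACGTTTGACCA",
    "ACGTTTGACCA",  # PCR duplicate of the first read (same UMI)
    "ACGTTTGACGA",  # same UMI, insert differs by one base (e.g. a sequencing error)
    "GGCATTGACCA",  # different UMI, so a different input molecule
]
groups = group_reads_by_umi(reads, umi_length=4)
for umi, inserts in groups.items():
    print(umi, len(inserts))
```

Here four reads collapse into two UMI groups, so under these idealized assumptions the data reflect two input molecules rather than four. Real UMIs and inserts can carry sequencing errors, which is why a correction step that builds one consensus per UMI is useful.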
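Main authors: | Tsagiopoulou, Maria; Maniou, Maria Christina; Pechlivanis, Nikolaos; Togkousidis, Anastasis; Kotrová, Michaela; Hutzenlaub, Tobias; Kappas, Ilias; Chatzidimitriou, Anastasia; Psomopoulos, Fotis |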
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8193862/ https://www.ncbi.nlm.nih.gov/pubmed/34122513 http://dx.doi.org/10.3389/fgene.2021.660366 |
field | value |
---|---|
author | Tsagiopoulou, Maria; Maniou, Maria Christina; Pechlivanis, Nikolaos; Togkousidis, Anastasis; Kotrová, Michaela; Hutzenlaub, Tobias; Kappas, Ilias; Chatzidimitriou, Anastasia; Psomopoulos, Fotis |
collection | PubMed |
description | A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, during the library preparation steps. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, thus reducing the bias introduced by this step. Here, we propose an alignment-free framework, called UMIc, that serves as a preprocessing step for FASTQ files and deduplicates and corrects reads by building a consensus sequence for each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides, as well as the distances between the UMIs and the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with broad applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc. |
format | Online Article Text |
id | pubmed-8193862 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Genet |
publication_date | 2021-05-28 |
rights | Copyright © 2021 Tsagiopoulou, Maniou, Pechlivanis, Togkousidis, Kotrová, Hutzenlaub, Kappas, Chatzidimitriou and Psomopoulos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY, https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction |
topic | Genetics |
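The consensus step summarized in the description can be illustrated with a Python sketch that combines nucleotide frequency with Phred quality and merges UMIs that lie within a small Hamming distance of each other. This is not UMIc's implementation. The scoring rule (summing Phred scores per candidate base), the greedy Hamming-distance merge, the Phred+33 quality encoding, the equal-length reads, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integers."""
    return [ord(char) - offset for char in quality]


def consensus(reads):
    """Build a consensus from equal-length (sequence, quality) pairs.

    Each base observed at a position scores the sum of the Phred values
    of the reads supporting it, so frequency and quality both count;
    the highest-scoring base is kept.
    """
    decoded = [(seq, phred_scores(qual)) for seq, qual in reads]
    result = []
    for i in range(len(decoded[0][0])):
        weights = defaultdict(int)
        for seq, scores in decoded:
            weights[seq[i]] += scores[i]
        result.append(max(weights, key=weights.get))
    return "".join(result)


def hamming(a, b):
    """Count mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def merge_similar_umis(groups, max_distance=1):
    """Greedily merge UMI groups whose UMIs differ by <= max_distance.

    Larger groups are visited first and absorb smaller neighbours, on the
    assumption that a rare UMI close to an abundant one is more likely a
    sequencing error than a distinct molecule.
    """
    merged = {}
    for umi in sorted(groups, key=lambda u: len(groups[u]), reverse=True):
        for kept in merged:
            if hamming(umi, kept) <= max_distance:
                merged[kept].extend(groups[umi])
                break
        else:
            merged[umi] = list(groups[umi])
    return merged


# Hypothetical reads already grouped by UMI: (insert sequence, quality).
groups = {
    "ACGT": [("TTGACCA", "IIIIIII"), ("TTGACCA", "IIIIIII"),
             ("TTGACGA", "IIIII#I")],  # low-quality G at 0-based position 5
    "ACGA": [("TTGACCA", "IIIIIII")],  # one mismatch from ACGT
    "GGCA": [("TTGACCA", "IIIIIII")],  # distinct molecule
}
merged = merge_similar_umis(groups)
for umi, reads in merged.items():
    print(umi, len(reads), consensus(reads))
# ACGT 4 TTGACCA
# GGCA 1 TTGACCA
```

Summing qualities means a base wins when it is supported by many reads or by high-quality calls, so `consensus([("A", "I"), ("C", "#")])` keeps the Q40 `A` over the Q2 `C`. Visiting larger groups first lets abundant UMIs absorb rare one-mismatch variants, a common heuristic for UMI error correction; other tools use different clustering rules.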