
UMI-Gen: A UMI-based read simulator for variant calling evaluation in paired-end sequencing NGS libraries

Bibliographic Details
Main authors: Sater, Vincent; Viailly, Pierre-Julien; Lecroq, Thierry; Ruminy, Philippe; Bérard, Caroline; Prieur-Gaston, Élise; Jardin, Fabrice
Format: Online; Article; Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7484502/
https://www.ncbi.nlm.nih.gov/pubmed/32952940
http://dx.doi.org/10.1016/j.csbj.2020.08.011
Collection: PubMed

Abstract:
MOTIVATION: With Next Generation Sequencing (NGS) becoming more affordable every year, NGS technologies have established themselves as the fastest and most reliable way to detect Single Nucleotide Variants (SNVs) and Copy Number Variations (CNVs) in cancer patients. These technologies can sequence DNA at very high depths, which allows the detection of abnormalities present in tumor cells at very low frequencies. Multiple variant callers are publicly available and are usually efficient at calling variants. However, when frequencies drop below 1%, the specificity of these tools suffers greatly, because true variants at very low frequencies are easily confused with sequencing or PCR artifacts. The recent use of Unique Molecular Identifiers (UMIs) in NGS experiments has offered a way to accurately separate true variants from artifacts, and UMI-based variant callers are slowly replacing raw-read-based variant callers as the standard method for accurately detecting variants at very low frequencies. However, the benchmarks reported in these tools' publications are usually performed on real biological data in which the true variants are not known, making it difficult to assess the tools' accuracy.

RESULTS: We present UMI-Gen, a UMI-based read simulator for paired-end targeted sequencing data. UMI-Gen generates reference reads covering the targeted regions at a user-customizable depth. Then, using a number of control files, it estimates the background error rate at each position and modifies the generated reads to mimic real biological data. Finally, it inserts real variants, taken from a user-provided list, into the reads.

AVAILABILITY: The entire pipeline is available at https://gitlab.com/vincent-sater/umigen under the MIT license.
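
The three steps listed under RESULTS (reference reads, position-specific background errors, variant insertion) can be illustrated with a minimal Python sketch. It is not UMI-Gen's implementation: the function names (`estimate_error_rates`, `simulate_umi_reads`), the control-pileup input format, pooling mismatches over depth to estimate the error rate, the random 8-base UMIs and the fixed number of PCR copies per molecule are all hypothetical choices. The sketch is single-end and substitution-only, and it ignores paired-end structure, indels, base qualities and FASTQ/BAM output. It only shows why UMIs help: a variant is assigned per original molecule and shared by every read of that UMI family, whereas background errors are drawn independently per read.

```python
import random

BASES = "ACGT"


def estimate_error_rates(control_pileups):
    """Per-position background error rate pooled over control samples.

    Hypothetical input format: each control pileup is a list of
    (mismatch_count, depth) tuples, one per position of the target region.
    """
    n_pos = len(control_pileups[0])
    rates = []
    for pos in range(n_pos):
        mismatches = sum(pileup[pos][0] for pileup in control_pileups)
        depth = sum(pileup[pos][1] for pileup in control_pileups)
        rates.append(mismatches / depth if depth else 0.0)
    return rates


def substitute(base, rng):
    """Return a random base different from `base` (substitution error)."""
    return rng.choice([b for b in BASES if b != base])


def simulate_umi_reads(ref, n_molecules, copies, error_rates, variants,
                       umi_len=8, seed=0):
    """Toy single-end, substitution-only UMI read simulator for one region.

    variants: list of (position, alt_base, allele_frequency) tuples.
    Returns a list of (umi, read_sequence) tuples, one UMI family after another.
    """
    rng = random.Random(seed)

    # Step 1: reference reads. Each original molecule gets a random UMI and
    # `copies` identical PCR duplicates of the reference sequence.
    families = []
    for _ in range(n_molecules):
        umi = "".join(rng.choice(BASES) for _ in range(umi_len))
        families.append((umi, [list(ref) for _ in range(copies)]))

    # Step 2: background noise. Errors are drawn independently for every
    # read, so an error is rarely shared by all reads of one UMI family.
    for _, family in families:
        for read in family:
            for pos, rate in enumerate(error_rates):
                if rng.random() < rate:
                    read[pos] = substitute(read[pos], rng)

    # Step 3: variant insertion. A variant is assigned per molecule, so all
    # reads sharing that UMI carry it, unlike the per-read errors above.
    for pos, alt, freq in variants:
        for _, family in families:
            if rng.random() < freq:
                for read in family:
                    read[pos] = alt

    return [(umi, "".join(read)) for umi, family in families for read in family]


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    # Two hypothetical controls: 1/1000 and 3/1000 mismatches at each position.
    controls = [[(1, 1000)] * len(ref), [(3, 1000)] * len(ref)]
    rates = estimate_error_rates(controls)  # 4/2000 = 0.002 at every position
    reads = simulate_umi_reads(ref, n_molecules=500, copies=3,
                               error_rates=rates, variants=[(4, "G", 0.01)])
    print(len(reads))  # → 1500
```

Because a true variant is present in every read of a family while a simulated error usually is not, a UMI-aware caller that collapses each family (for example by majority vote) can keep the former and discard most of the latter.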
Source institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Comput Struct Biotechnol J (Research Article), published 2020-08-27 by the Research Network of Computational and Structural Biotechnology.
Copyright: © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).