Improvement of large copy number variant detection by whole genome nanopore sequencing


Bibliographic Details
Main authors: Cuenca-Guardiola, Javier; de la Morena-Barrio, Belén; García, Juan L.; Sanchis-Juan, Alba; Corral, Javier; Fernández-Breis, Jesualdo T.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403694/
https://www.ncbi.nlm.nih.gov/pubmed/36323370
http://dx.doi.org/10.1016/j.jare.2022.10.012
collection PubMed
description INTRODUCTION: Whole-genome sequencing using nanopore technologies can uncover structural variants, which are DNA rearrangements larger than 50 base pairs. Nanopore technologies can also characterize their boundaries with single-base accuracy, owing to the kilobase-long reads that encompass either full variants or their junctions. Other methods, such as next-generation short read sequencing or PCR assays, are limited in their capabilities to detect or characterize structural variants. However, the existing software for nanopore sequencing data analysis still reports incomplete variant sets, which also contain erroneous calls, a considerable obstacle for the molecular diagnosis or accurate genotyping of populations.
METHODS: We compared multiple factors affecting variant calling, such as reference genome version, aligner (minimap2, NGMLR, and lra) choice, and variant caller combinations (Sniffles, CuteSV, SVIM, and NanoVar), to find the optimal group of tools for calling large (>50 kb) deletions and duplications, using data from seven patients exhibiting gross gene defects on SERPINC1 and from a reference variant set as the control. The goal was to obtain the most complete, yet reasonably specific group of large variants using a single cell of PromethION sequencing, which yielded lower depth coverage than short-read sequencing. We also used a custom method for the statistical analysis of the coverage value to refine the resulting datasets.
RESULTS: We found that for large deletions and duplications (>50 kb), the existing software performed worse than for smaller ones, in terms of both sensitivity and specificity, and newer tools had not improved this. Our novel software, disCoverage, could polish variant callers’ results, improving specificity by up to 62% and sensitivity by 15%, the latter requiring other data or samples.
CONCLUSION: We analyzed the current situation of >50-kb copy number variants with nanopore sequencing, which could be improved. The methods presented in this work could help to identify the known deletions and duplications in a set of patients, while also helping to filter out erroneous calls for these variants, which might aid the efforts to characterize a not-yet well-known fraction of genetic variability in the human genome.
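The abstract mentions refining variant calls through a statistical analysis of coverage but does not describe the procedure. The following is only a minimal sketch of the general idea of checking a deletion or duplication call against read depth, not the disCoverage algorithm. The function names, the flank-based baseline, and the thresholds 0.75 and 1.25 are all hypothetical choices made for illustration; they assume a diploid region in which a heterozygous deletion roughly halves the depth and a heterozygous duplication raises it by about half.

```python
from statistics import median


def coverage_ratio(depths, start, end, flank):
    """Median depth inside [start, end) divided by median depth in the flanks.

    `depths` is a list of per-bin read depths; `flank` is the number of bins
    taken on each side of the call as the local baseline (an assumption of
    this sketch, not something stated in the abstract).
    """
    inside = depths[start:end]
    flanks = depths[max(0, start - flank):start] + depths[end:end + flank]
    if not inside or not flanks:
        raise ValueError("call or flanks fall outside the depth array")
    baseline = median(flanks)
    if baseline == 0:
        raise ValueError("flanking depth is zero; ratio is undefined")
    return median(inside) / baseline


def supports_call(sv_type, ratio, del_max=0.75, dup_min=1.25):
    """Return True if the depth ratio is consistent with the called type.

    The cut-offs are hypothetical illustration values, not published ones.
    """
    if sv_type == "DEL":
        return ratio <= del_max
    if sv_type == "DUP":
        return ratio >= dup_min
    raise ValueError(f"unsupported variant type: {sv_type}")


# Toy example: depth drops from 20x to 10x across bins 10..19.
depths = [20] * 10 + [10] * 10 + [20] * 10
ratio = coverage_ratio(depths, start=10, end=20, flank=5)
print(ratio, supports_call("DEL", ratio), supports_call("DUP", ratio))
```

A filter of this kind can only discard calls whose depth contradicts them, which matches the abstract's framing of specificity gains; recovering missed variants would need additional evidence, as the abstract itself notes for sensitivity.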
id pubmed-10403694
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10403694 2023-08-06 J Adv Res Original Article Elsevier 2022-10-30 /pmc/articles/PMC10403694/ /pubmed/36323370 http://dx.doi.org/10.1016/j.jare.2022.10.012 Text en © 2023 The Authors. Published by Elsevier B.V. on behalf of Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Original Article