SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph

MOTIVATION: Structural variation (SV) is a class of genetic diversity whose importance is increasingly revealed by genome resequencing, especially with long-read technologies. One crucial problem when analyzing and comparing SVs in several individuals is their accurate genotyping, that is determinin...

Bibliographic Details
Main Authors: Romain, Sandra, Lemaitre, Claire
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311344/
https://www.ncbi.nlm.nih.gov/pubmed/37387169
http://dx.doi.org/10.1093/bioinformatics/btad237
author Romain, Sandra
Lemaitre, Claire
collection PubMed
description MOTIVATION: Structural variation (SV) is a class of genetic diversity whose importance is increasingly revealed by genome resequencing, especially with long-read technologies. One crucial problem when analyzing and comparing SVs in several individuals is their accurate genotyping, that is, determining whether a described SV is present or absent in one sequenced individual, and if present, in how many copies. There are only a few methods dedicated to SV genotyping with long-read data, and all either suffer from a bias toward the reference allele by not representing all alleles equally, or have difficulties genotyping close or overlapping SVs due to a linear representation of the alleles. RESULTS: We present SVJedi-graph, a novel method for SV genotyping that relies on a variation graph to represent all alleles of a set of SVs in a single data structure. The long reads are mapped on the variation graph, and the resulting alignments that cover allele-specific edges in the graph are used to estimate the most likely genotype for each SV. Running SVJedi-graph on simulated sets of close and overlapping deletions showed that this graph model prevents the bias toward the reference alleles and maintains high genotyping accuracy regardless of SV proximity, contrary to other state-of-the-art genotypers. On the human gold standard HG002 dataset, SVJedi-graph achieved the best performance, genotyping 99.5% of the high-confidence SV callset with an accuracy of 95% in less than 30 min. AVAILABILITY AND IMPLEMENTATION: SVJedi-graph is distributed under an AGPL license and available on GitHub at https://github.com/SandraLouise/SVJedi-graph and as a BioConda package.
format Online
Article
Text
id pubmed-10311344
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10311344 2023-07-01 SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph Romain, Sandra Lemaitre, Claire Bioinformatics Genome Sequence Analysis
Oxford University Press 2023-06-30 /pmc/articles/PMC10311344/ /pubmed/37387169 http://dx.doi.org/10.1093/bioinformatics/btad237 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph
topic Genome Sequence Analysis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311344/
https://www.ncbi.nlm.nih.gov/pubmed/37387169
http://dx.doi.org/10.1093/bioinformatics/btad237