
Comprehensive de novo mutation discovery with HiFi long-read sequencing

BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing make...


Bibliographic Details
Main authors: Kucuk, Erdi; van der Sanden, Bart P. G. H.; O’Gorman, Luke; Kwint, Michael; Derks, Ronny; Wenger, Aaron M.; Lambert, Christine; Chakraborty, Shreyasee; Baybayan, Primo; Rowell, William J.; Brunner, Han G.; Vissers, Lisenka E. L. M.; Hoischen, Alexander; Gilissen, Christian
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10169305/
https://www.ncbi.nlm.nih.gov/pubmed/37158973
http://dx.doi.org/10.1186/s13073-023-01183-6
author Kucuk, Erdi
van der Sanden, Bart P. G. H.
O’Gorman, Luke
Kwint, Michael
Derks, Ronny
Wenger, Aaron M.
Lambert, Christine
Chakraborty, Shreyasee
Baybayan, Primo
Rowell, William J.
Brunner, Han G.
Vissers, Lisenka E. L. M.
Hoischen, Alexander
Gilissen, Christian
collection PubMed
description BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS has made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited to detecting small variants. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease.
METHODS: We sequenced the genomes of eight parent–child trios using high-coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing.
RESULTS: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS, respectively. For the small variants, there was a 92% and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6% and 0.8%, and 4% and 100%, respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs, of which 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNMs. Validation was possible for 19 of the 23 LRS-unique candidate SVs, of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
CONCLUSIONS: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs at all variant levels and also allows for phasing, which helps to distinguish true-positive from false-positive DNMs.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13073-023-01183-6.
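The confirmation percentages quoted in the RESULTS can be recomputed from the counts given in the abstract. The following is a minimal sketch, not the authors' analysis code: the names `VALIDATION_COUNTS`, `confirmation_rate` and `summarize` are hypothetical, and the STR entry assumes that all 18 LRS-unique STR calls were successfully tested.

```python
# Counts taken from the abstract: (confirmed as true de novo, successfully validated).
# The dictionary and function names are illustrative assumptions, not from the paper.
VALIDATION_COUNTS = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique STRs": (0, 18),   # assumes all 18 calls could be tested
    "LRS-unique SVs": (10, 19),
}


def confirmation_rate(confirmed, validated):
    """Return the percentage of validated calls confirmed as true de novo."""
    if validated <= 0:
        raise ValueError("number of validated calls must be positive")
    return 100.0 * confirmed / validated


def summarize(counts):
    """Map each category to its confirmation rate, rounded to one decimal."""
    return {
        label: round(confirmation_rate(confirmed, validated), 1)
        for label, (confirmed, validated) in counts.items()
    }


if __name__ == "__main__":
    for label, pct in summarize(VALIDATION_COUNTS).items():
        print(f"{label}: {pct}%")
```

Rounded to whole percentages, the small-variant rates match the 41% and 19% reported in the abstract, and the SV rate matches the reported 52.6%.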
format Online
Article
Text
id pubmed-10169305
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10169305 2023-05-11 Genome Med Research BioMed Central 2023-05-08 /pmc/articles/PMC10169305/ /pubmed/37158973 http://dx.doi.org/10.1186/s13073-023-01183-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comprehensive de novo mutation discovery with HiFi long-read sequencing
topic Research