Comprehensive de novo mutation discovery with HiFi long-read sequencing
BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing make...
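Main Authors: | Kucuk, Erdi; van der Sanden, Bart P. G. H.; O’Gorman, Luke; Kwint, Michael; Derks, Ronny; Wenger, Aaron M.; Lambert, Christine; Chakraborty, Shreyasee; Baybayan, Primo; Rowell, William J.; Brunner, Han G.; Vissers, Lisenka E. L. M.; Hoischen, Alexander; Gilissen, Christian |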
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10169305/ https://www.ncbi.nlm.nih.gov/pubmed/37158973 http://dx.doi.org/10.1186/s13073-023-01183-6 |
_version_ | 1785039027293913088 |
---|---|
author | Kucuk, Erdi van der Sanden, Bart P. G. H. O’Gorman, Luke Kwint, Michael Derks, Ronny Wenger, Aaron M. Lambert, Christine Chakraborty, Shreyasee Baybayan, Primo Rowell, William J. Brunner, Han G. Vissers, Lisenka E. L. M. Hoischen, Alexander Gilissen, Christian |
author_facet | Kucuk, Erdi van der Sanden, Bart P. G. H. O’Gorman, Luke Kwint, Michael Derks, Ronny Wenger, Aaron M. Lambert, Christine Chakraborty, Shreyasee Baybayan, Primo Rowell, William J. Brunner, Han G. Vissers, Lisenka E. L. M. Hoischen, Alexander Gilissen, Christian |
author_sort | Kucuk, Erdi |
collection | PubMed |
description | BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease. METHODS: We sequenced the genomes of eight parent–child trios using high-coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing. RESULTS: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS, respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100%, respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs, and 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNMs. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs, of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
CONCLUSIONS: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13073-023-01183-6. |
format | Online Article Text |
id | pubmed-10169305 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10169305 2023-05-11 Comprehensive de novo mutation discovery with HiFi long-read sequencing Kucuk, Erdi van der Sanden, Bart P. G. H. O’Gorman, Luke Kwint, Michael Derks, Ronny Wenger, Aaron M. Lambert, Christine Chakraborty, Shreyasee Baybayan, Primo Rowell, William J. Brunner, Han G. Vissers, Lisenka E. L. M. Hoischen, Alexander Gilissen, Christian Genome Med Research BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease. METHODS: We sequenced the genomes of eight parent–child trios using high-coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing. RESULTS: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS, respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100%, respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs, and 8 (19%) were confirmed as true de novo events.
Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNMs. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs, of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data. CONCLUSIONS: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13073-023-01183-6. BioMed Central 2023-05-08 /pmc/articles/PMC10169305/ /pubmed/37158973 http://dx.doi.org/10.1186/s13073-023-01183-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Kucuk, Erdi van der Sanden, Bart P. G. H. O’Gorman, Luke Kwint, Michael Derks, Ronny Wenger, Aaron M. Lambert, Christine Chakraborty, Shreyasee Baybayan, Primo Rowell, William J. Brunner, Han G. Vissers, Lisenka E. L. M. Hoischen, Alexander Gilissen, Christian Comprehensive de novo mutation discovery with HiFi long-read sequencing |
title | Comprehensive de novo mutation discovery with HiFi long-read sequencing |
title_full | Comprehensive de novo mutation discovery with HiFi long-read sequencing |
title_fullStr | Comprehensive de novo mutation discovery with HiFi long-read sequencing |
title_full_unstemmed | Comprehensive de novo mutation discovery with HiFi long-read sequencing |
title_short | Comprehensive de novo mutation discovery with HiFi long-read sequencing |
title_sort | comprehensive de novo mutation discovery with hifi long-read sequencing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10169305/ https://www.ncbi.nlm.nih.gov/pubmed/37158973 http://dx.doi.org/10.1186/s13073-023-01183-6 |