Detection of simple and complex de novo mutations with multiple reference sequences
The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which sho...
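Main authors: | Garimella, Kiran V.; Iqbal, Zamin; Krause, Michael A.; Campino, Susana; Kekre, Mihir; Drury, Eleanor; Kwiatkowski, Dominic; Sá, Juliana M.; Wellems, Thomas E.; McVean, Gil |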
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2020 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7462078/ https://www.ncbi.nlm.nih.gov/pubmed/32817236 http://dx.doi.org/10.1101/gr.255505.119 |
_version_ | 1783576850802409472 |
---|---|
author | Garimella, Kiran V. Iqbal, Zamin Krause, Michael A. Campino, Susana Kekre, Mihir Drury, Eleanor Kwiatkowski, Dominic Sá, Juliana M. Wellems, Thomas E. McVean, Gil |
author_sort | Garimella, Kiran V. |
collection | PubMed |
description | The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read–derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events. |
format | Online Article Text |
id | pubmed-7462078 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7462078 2020-09-11 Genome Res Method Cold Spring Harbor Laboratory Press 2020-08 /pmc/articles/PMC7462078/ /pubmed/32817236 http://dx.doi.org/10.1101/gr.255505.119 Text en © 2020 Garimella et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/. |
title | Detection of simple and complex de novo mutations with multiple reference sequences |
title_sort | detection of simple and complex de novo mutations with multiple reference sequences |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7462078/ https://www.ncbi.nlm.nih.gov/pubmed/32817236 http://dx.doi.org/10.1101/gr.255505.119 |
work_keys_str_mv | AT garimellakiranv detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT iqbalzamin detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT krausemichaela detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT campinosusana detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT kekremihir detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT druryeleanor detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT kwiatkowskidominic detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT sajulianam detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT wellemsthomase detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences AT mcveangil detectionofsimpleandcomplexdenovomutationswithmultiplereferencesequences |
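
The description above mentions multisample, colored de Bruijn graphs built from short reads. As a rough illustration of that data structure only, and not of Corticall's actual implementation, the following Python sketch records which samples ("colors") contain each k-mer and lists k-mers seen in one sample alone. The function names, the toy reads, and the choice of k = 4 are all assumptions made for this example; real data would need a much larger k, error handling for sequencing noise, and reverse-complement handling.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_graph(samples, k):
    """Map each k-mer to the set of sample names (colors) whose reads contain it.

    `samples` is a dict of sample name -> list of read strings (hypothetical input format).
    """
    colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                colors[kmer].add(name)
    return colors


def successors(colors, kmer):
    """Return k-mers present in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in colors]


def private_kmers(colors, sample):
    """Return, in sorted order, the k-mers observed only in `sample`."""
    return sorted(km for km, seen in colors.items() if seen == {sample})


# Toy reads (invented for illustration): the child differs from both parents at one base.
samples = {
    "parent1": ["ACGTACGA"],
    "parent2": ["ACGTTCGA"],
    "child": ["ACGTGCGA"],
}
graph = build_colored_graph(samples, k=4)
novel = private_kmers(graph, "child")
print(novel)
print(successors(graph, "ACGT"))
```

In this toy setting, k-mers carried only by the child mark the region around its private base change, while the shared k-mer `ACGT` branches into one successor per sample. Calling variants from such a graph, as the abstract describes, additionally requires path finding and an alignment/recombination model, which this sketch does not attempt.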