
Detection of simple and complex de novo mutations with multiple reference sequences

The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which sho...


Bibliographic Details
Main authors: Garimella, Kiran V., Iqbal, Zamin, Krause, Michael A., Campino, Susana, Kekre, Mihir, Drury, Eleanor, Kwiatkowski, Dominic, Sá, Juliana M., Wellems, Thomas E., McVean, Gil
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7462078/
https://www.ncbi.nlm.nih.gov/pubmed/32817236
http://dx.doi.org/10.1101/gr.255505.119
author Garimella, Kiran V.
Iqbal, Zamin
Krause, Michael A.
Campino, Susana
Kekre, Mihir
Drury, Eleanor
Kwiatkowski, Dominic
Sá, Juliana M.
Wellems, Thomas E.
McVean, Gil
collection PubMed
description The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read–derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.
format Online
Article
Text
id pubmed-7462078
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-7462078 2020-09-11 Genome Res Method
Cold Spring Harbor Laboratory Press 2020-08 /pmc/articles/PMC7462078/ /pubmed/32817236 http://dx.doi.org/10.1101/gr.255505.119 Text en © 2020 Garimella et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
title Detection of simple and complex de novo mutations with multiple reference sequences
topic Method