
Whole-Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum

The accurate detection of induced mutations is critical for both forward and reverse genetics studies. Experimental chemical mutagenesis induces relatively few single base changes per individual. In a complex eukaryotic genome, false positive detection of mutations can occur at or above this mutagen...


Bibliographic Details
Main authors: Addo-Quaye, Charles; Tuinstra, Mitch; Carraro, Nicola; Weil, Clifford; Dilkes, Brian P.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5844295/
https://www.ncbi.nlm.nih.gov/pubmed/29378822
http://dx.doi.org/10.1534/g3.117.300301
_version_ 1783305227782324224
author Addo-Quaye, Charles
Tuinstra, Mitch
Carraro, Nicola
Weil, Clifford
Dilkes, Brian P.
collection PubMed
description The accurate detection of induced mutations is critical for both forward and reverse genetics studies. Experimental chemical mutagenesis induces relatively few single base changes per individual. In a complex eukaryotic genome, false positive detection of mutations can occur at or above this mutagenesis rate. We demonstrate here, using a population of ethyl methanesulfonate (EMS)-treated Sorghum bicolor BTx623 individuals, that using replication to detect false positive-induced variants in next-generation sequencing (NGS) data permits higher throughput variant detection with greater accuracy. We used a lower sequence coverage depth (average of 7×) from 586 independently mutagenized individuals and detected 5,399,493 homozygous single nucleotide polymorphisms (SNPs). Of these, 76% originated from only 57,872 genomic positions prone to false positive variant calling. These positions are characterized by high copy number paralogs where the error-prone SNP positions are at copies containing a variant at the SNP position. The ability of short stretches of homology to generate these error-prone positions suggests that incompletely assembled or poorly mapped repeated sequences are one driver of these error-prone positions. Removal of these false positives left 1,275,872 homozygous and 477,531 heterozygous EMS-induced SNPs, which, congruent with the mutagenic mechanism of EMS, were >98% G:C to A:T transitions. Through this analysis, we generated a collection of sequence indexed mutants of sorghum. This collection contains 4035 high-impact homozygous mutations in 3637 genes and 56,514 homozygous missense mutations in 23,227 genes. Each line contains, on average, 2177 annotated homozygous SNPs per genome, including seven likely gene knockouts and 96 missense mutations. The number of mutations in a transcript was linearly correlated with the transcript length and also the G+C count, but not with the GC/AT ratio. 
Analysis of the detected mutagenized positions identified CG-rich patches, and flanking sequences strongly influenced EMS-induced mutation rates. This method for detecting false positive-induced mutations is generally applicable to any organism, is independent of the choice of in silico variant-calling algorithm, and is most valuable when the true mutation rate is likely to be low, such as in laboratory-induced mutations or somatic mutation detection in medicine.
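The replication idea in the description above can be illustrated with a minimal sketch. It assumes that each independently mutagenized line has already been reduced to a list of (chromosome, position, alternate allele) calls, and that a site called in more than one independent line is treated as error-prone. The function names, the data layout, and the `max_individuals` threshold are hypothetical choices for illustration, not the authors' pipeline or parameters.

```python
from collections import Counter


def find_recurrent_sites(calls_by_individual, max_individuals=1):
    """Return sites called in more than `max_individuals` independent lines.

    `max_individuals` is an assumed threshold, not a value from the article.
    """
    counts = Counter()
    for calls in calls_by_individual.values():
        # Count each site at most once per individual.
        counts.update({(chrom, pos) for chrom, pos, _alt in calls})
    return {site for site, n in counts.items() if n > max_individuals}


def filter_calls(calls_by_individual, recurrent_sites):
    """Drop every call that falls on a recurrent (likely error-prone) site."""
    return {
        ind: [c for c in calls if (c[0], c[1]) not in recurrent_sites]
        for ind, calls in calls_by_individual.items()
    }


# Toy data (made up): three independent lines share one suspicious site.
calls = {
    "line_A": [("Chr01", 1200, "A"), ("Chr02", 500, "T")],
    "line_B": [("Chr01", 1200, "A"), ("Chr03", 42, "A")],
    "line_C": [("Chr01", 1200, "A"), ("Chr05", 9000, "T")],
}
recurrent = find_recurrent_sites(calls, max_individuals=1)
filtered = filter_calls(calls, recurrent)
print(sorted(recurrent))
print(filtered["line_A"])
```

Because the lines were mutagenized independently, the same induced change is unlikely to recur across many of them, so recurrence is a usable signal of a calling artifact regardless of which variant caller produced the input.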
format Online
Article
Text
id pubmed-5844295
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5844295 2018-03-22 Whole-Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum Addo-Quaye, Charles Tuinstra, Mitch Carraro, Nicola Weil, Clifford Dilkes, Brian P. G3 (Bethesda) Investigations Genetics Society of America 2018-01-25 /pmc/articles/PMC5844295/ /pubmed/29378822 http://dx.doi.org/10.1534/g3.117.300301 Text en Copyright © 2018 Addo-Quaye et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Whole-Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum
topic Investigations