Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity
Main authors: | Jaegle, Benjamin; Pisupati, Rahul; Soto-Jiménez, Luz Mayela; Burns, Robin; Rabanal, Fernando A.; Nordborg, Magnus |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9999624/ https://www.ncbi.nlm.nih.gov/pubmed/36895055 http://dx.doi.org/10.1186/s13059-023-02875-3 |
_version_ | 1784903696410214400 |
---|---|
author | Jaegle, Benjamin Pisupati, Rahul Soto-Jiménez, Luz Mayela Burns, Robin Rabanal, Fernando A. Nordborg, Magnus |
collection | PubMed |
description | BACKGROUND: It is apparent that genomes harbor much structural variation that is largely undetected for technical reasons. Such variation can cause artifacts when short-read sequencing data are mapped to a reference genome. Spurious SNPs may result from mapping of reads to unrecognized duplicated regions. Calling SNPs using the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million (44%) heterozygous SNPs. Given that Arabidopsis thaliana (A. thaliana) is highly selfing, and that extensively heterozygous individuals have been removed, we hypothesize that these SNPs reflect cryptic copy number variation. RESULTS: The heterozygosity we observe consists of particular SNPs being heterozygous across individuals in a manner that strongly suggests it reflects shared segregating duplications rather than random tracts of residual heterozygosity due to occasional outcrossing. Focusing on such pseudo-heterozygosity in annotated genes, we use genome-wide association to map the positions of the duplicates. We identify 2500 putatively duplicated genes and validate them using de novo genome assemblies from six lines. Specific examples include an annotated gene and a nearby transposon that transpose together. We also demonstrate that cryptic structural variation produces highly inaccurate estimates of DNA methylation polymorphism. CONCLUSIONS: Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts and suggests that great caution is needed when analyzing SNP data from short-read sequencing. The finding that 10% of annotated genes exhibit copy-number variation, together with the realization that neither gene nor transposon annotation necessarily tells us what is actually mobile in the genome, suggests that future analyses based on independently assembled genomes will be very informative. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-02875-3. |
format | Online Article Text |
id | pubmed-9999624 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9999624 (2023-03-11). Genome Biol, Research. BioMed Central, 2023-03-09. /pmc/articles/PMC9999624/ /pubmed/36895055 http://dx.doi.org/10.1186/s13059-023-02875-3. Text, en. © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Research |