The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes

Bibliographic Details
Main authors: Glick, Lior; Mayrose, Itay
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10340445/
https://www.ncbi.nlm.nih.gov/pubmed/37401440
http://dx.doi.org/10.1093/gbe/evad121
author Glick, Lior
Mayrose, Itay
collection PubMed
description Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence–absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies.
format Online
Article
Text
id pubmed-10340445
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10340445 2023-07-14 The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes Glick, Lior Mayrose, Itay Genome Biol Evol Article
Oxford University Press 2023-07-04 /pmc/articles/PMC10340445/ /pubmed/37401440 http://dx.doi.org/10.1093/gbe/evad121 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes
topic Article