The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes
Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple geno...
Main Authors: | Glick, Lior; Mayrose, Itay |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10340445/ https://www.ncbi.nlm.nih.gov/pubmed/37401440 http://dx.doi.org/10.1093/gbe/evad121 |
_version_ | 1785072081408360448 |
---|---|
author | Glick, Lior Mayrose, Itay |
author_facet | Glick, Lior Mayrose, Itay |
author_sort | Glick, Lior |
collection | PubMed |
description | Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence–absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies. |
format | Online Article Text |
id | pubmed-10340445 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-103404452023-07-14 The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes Glick, Lior Mayrose, Itay Genome Biol Evol Article Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence–absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies. 
Oxford University Press 2023-07-04 /pmc/articles/PMC10340445/ /pubmed/37401440 http://dx.doi.org/10.1093/gbe/evad121 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Article Glick, Lior Mayrose, Itay The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes |
title | The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes |
title_full | The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes |
title_fullStr | The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes |
title_full_unstemmed | The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes |
title_short | The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes |
title_sort | effect of methodological considerations on the construction of gene-based plant pan-genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10340445/ https://www.ncbi.nlm.nih.gov/pubmed/37401440 http://dx.doi.org/10.1093/gbe/evad121 |
work_keys_str_mv | AT glicklior theeffectofmethodologicalconsiderationsontheconstructionofgenebasedplantpangenomes AT mayroseitay theeffectofmethodologicalconsiderationsontheconstructionofgenebasedplantpangenomes AT glicklior effectofmethodologicalconsiderationsontheconstructionofgenebasedplantpangenomes AT mayroseitay effectofmethodologicalconsiderationsontheconstructionofgenebasedplantpangenomes |