Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plasti...
Main authors: | Giorgashvili, Eka; Reichel, Katja; Caswara, Calvinna; Kerimov, Vuqar; Borsch, Thomas; Gruenstaeudl, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Plant Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9296850/ https://www.ncbi.nlm.nih.gov/pubmed/35874012 http://dx.doi.org/10.3389/fpls.2022.779830 |
_version_ | 1784750350307164160 |
---|---|
author | Giorgashvili, Eka Reichel, Katja Caswara, Calvinna Kerimov, Vuqar Borsch, Thomas Gruenstaeudl, Michael |
author_facet | Giorgashvili, Eka Reichel, Katja Caswara, Calvinna Kerimov, Vuqar Borsch, Thomas Gruenstaeudl, Michael |
author_sort | Giorgashvili, Eka |
collection | PubMed |
description | Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice. |
format | Online Article Text |
id | pubmed-9296850 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9296850 2022-07-21 Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense Giorgashvili, Eka Reichel, Katja Caswara, Calvinna Kerimov, Vuqar Borsch, Thomas Gruenstaeudl, Michael Front Plant Sci Plant Science Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice. Frontiers Media S.A. 2022-07-06 /pmc/articles/PMC9296850/ /pubmed/35874012 http://dx.doi.org/10.3389/fpls.2022.779830 Text en Copyright © 2022 Giorgashvili, Reichel, Caswara, Kerimov, Borsch and Gruenstaeudl. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Plant Science Giorgashvili, Eka Reichel, Katja Caswara, Calvinna Kerimov, Vuqar Borsch, Thomas Gruenstaeudl, Michael Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense |
title | Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense |
title_full | Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense |
title_fullStr | Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense |
title_full_unstemmed | Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense |
title_short | Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense |
title_sort | software choice and sequencing coverage can impact plastid genome assembly–a case study in the narrow endemic calligonum bakuense |
topic | Plant Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9296850/ https://www.ncbi.nlm.nih.gov/pubmed/35874012 http://dx.doi.org/10.3389/fpls.2022.779830 |