Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense

Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plasti...

Bibliographic Details
Main Authors: Giorgashvili, Eka; Reichel, Katja; Caswara, Calvinna; Kerimov, Vuqar; Borsch, Thomas; Gruenstaeudl, Michael
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Plant Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9296850/
https://www.ncbi.nlm.nih.gov/pubmed/35874012
http://dx.doi.org/10.3389/fpls.2022.779830
author Giorgashvili, Eka
Reichel, Katja
Caswara, Calvinna
Kerimov, Vuqar
Borsch, Thomas
Gruenstaeudl, Michael
collection PubMed
description Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice.
format Online
Article
Text
id pubmed-9296850
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9296850 2022-07-21 Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense Giorgashvili, Eka Reichel, Katja Caswara, Calvinna Kerimov, Vuqar Borsch, Thomas Gruenstaeudl, Michael Front Plant Sci Plant Science Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice. Frontiers Media S.A. 2022-07-06 /pmc/articles/PMC9296850/ /pubmed/35874012 http://dx.doi.org/10.3389/fpls.2022.779830 Text en Copyright © 2022 Giorgashvili, Reichel, Caswara, Kerimov, Borsch and Gruenstaeudl. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly–A Case Study in the Narrow Endemic Calligonum bakuense
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9296850/
https://www.ncbi.nlm.nih.gov/pubmed/35874012
http://dx.doi.org/10.3389/fpls.2022.779830