
Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures


Bibliographic Details
Main authors: Neubert, Kerstin; Zuchantke, Eric; Leidenfrost, Robert Maximilian; Wuenschiers, Roebbe; Grützke, Josephine; Malorny, Burkhard; Brendebach, Holger; Al Dahouk, Sascha; Homeier, Timo; Hotzel, Helmut; Reinert, Knut; Tomaso, Herbert; Busch, Anne
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590783/
https://www.ncbi.nlm.nih.gov/pubmed/34773979
http://dx.doi.org/10.1186/s12864-021-08115-x
Description
Summary: BACKGROUND: We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods. RESULTS: We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of eight short-read assembly methods, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach. CONCLUSIONS: Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conserved F. tularensis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-08115-x.