
Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim

BACKGROUND: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of...


Bibliographic Details
Main authors: Yang, Chen; Lo, Theodora; Nip, Ka Ming; Hafezqorani, Saber; Warren, René L; Birol, Inanc
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025935/
https://www.ncbi.nlm.nih.gov/pubmed/36939007
http://dx.doi.org/10.1093/gigascience/giad013
author Yang, Chen
Lo, Theodora
Nip, Ka Ming
Hafezqorani, Saber
Warren, René L
Birol, Inanc
collection PubMed
description BACKGROUND: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS: The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.
format Online
Article
Text
id pubmed-10025935
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10025935 2023-03-21 Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim Yang, Chen; Lo, Theodora; Nip, Ka Ming; Hafezqorani, Saber; Warren, René L; Birol, Inanc GigaScience Technical Note
Oxford University Press 2023-03-20 /pmc/articles/PMC10025935/ /pubmed/36939007 http://dx.doi.org/10.1093/gigascience/giad013 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim
topic Technical Note