
A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing

Bibliographic Details
Main Authors: Andreu-Sánchez, Sergio, Chen, Lianmin, Wang, Daoming, Augustijn, Hannah E., Zhernakova, Alexandra, Fu, Jingyuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8141913/
https://www.ncbi.nlm.nih.gov/pubmed/34040632
http://dx.doi.org/10.3389/fgene.2021.648229
collection PubMed
description Microbes live in complex communities that are of major importance for environmental ecology, public health, and animal physiology and pathology. Short-read metagenomic shotgun sequencing is currently the state-of-the-art technique for exploring these communities. With the aid of metagenomics, our understanding of the microbiome is moving from composition toward functionality, even down to the genetic variant level. While the exploration of single-nucleotide variation in a genome is a standard procedure in genomics, and many sophisticated tools exist to perform this task, identification of genetic variation in metagenomes remains challenging. Major factors that hamper the widespread application of variant-calling analysis include low-depth sequencing of individual genomes (which is especially significant for the microorganisms present in low abundance), the existence of large genomic variation even within the same species, the absence of comprehensive reference genomes, and the noise introduced by next-generation sequencing errors. Some bioinformatics tools, such as metaSNV or InStrain, have been created to identify genetic variants in metagenomes, but the performance of these tools has not been systematically assessed or compared with the variant callers commonly used on single or pooled genomes. In this study, we benchmark seven bioinformatic tools for genetic variant calling in metagenomics data and assess their performance. To do so, we simulated metagenomic reads to mimic human microbial composition, sequencing errors, and genetic variability. We also simulated different conditions, including low and high depth of coverage and unique or multiple strains per species. Our analysis of the simulated data shows that probabilistic method-based tools such as HaplotypeCaller and Mutect2 from the GATK toolset show the best performance. 
By applying these tools to longitudinal gut microbiome data from the Human Microbiome Project, we show that the genetic similarity between longitudinal samples from the same individuals is significantly greater than the similarity between samples from different individuals. Our benchmark shows that probabilistic tools can be used to call metagenomes, and we recommend the use of GATK’s tools as reliable variant callers for metagenomic samples.
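As a rough illustration of how variant calls can be scored when the true variants are known from simulation, the sketch below compares a caller's output with a simulated truth set. The record does not say which metrics or matching rules the authors used; this sketch assumes exact matching on contig, 1-based position, REF and ALT alleles, and reports precision, recall and F1. The `Variant` type, the `score_calls` function and the toy sites are hypothetical.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A biallelic site keyed by contig, 1-based position and alleles (hypothetical type)."""
    contig: str
    pos: int
    ref: str
    alt: str


def score_calls(truth: Iterable[Variant], calls: Iterable[Variant]) -> dict:
    """Score a call set against simulated truth, assuming exact site + allele matching."""
    truth_set = set(truth)
    call_set = set(calls)
    tp = len(truth_set & call_set)  # simulated variants that were recovered
    fp = len(call_set - truth_set)  # calls with no matching simulated variant
    fn = len(truth_set - call_set)  # simulated variants the caller missed
    # tp + fp == len(call_set) and tp + fn == len(truth_set), so guard on emptiness.
    precision = tp / len(call_set) if call_set else 0.0
    recall = tp / len(truth_set) if truth_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data: contig names and positions are made up for illustration.
    truth = [Variant("genomeA", 100, "A", "G"),
             Variant("genomeA", 250, "C", "T"),
             Variant("genomeB", 42, "G", "A")]
    calls = [Variant("genomeA", 100, "A", "G"),  # matches truth
             Variant("genomeA", 250, "C", "A"),  # wrong ALT: one FP, and the true site stays an FN
             Variant("genomeB", 42, "G", "A"),   # matches truth
             Variant("genomeB", 77, "T", "C")]   # not in the truth set
    print(score_calls(truth, calls))
```

For the toy data, the function reports 2 true positives, 2 false positives and 1 false negative (precision 0.5, recall 2/3). In practice, calls from tools such as HaplotypeCaller or Mutect2 would first be read from VCF files and normalized (for example, left-aligned and split into biallelic records) before matching, because the same indel can be written in more than one way.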
id pubmed-8141913
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8141913, 2021-05-25. Front Genet, Genetics. Frontiers Media S.A., 2021-05-10.
Copyright © 2021 Andreu-Sánchez, Chen, Wang, Augustijn, Zhernakova and Fu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics