A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing
Microbes live in complex communities that are of major importance for environmental ecology, public health, and animal physiology and pathology. Short-read metagenomic shotgun sequencing is currently the state-of-the-art technique for exploring these communities. With the aid of metagenomics, our un...
Main Authors: | Andreu-Sánchez, Sergio; Chen, Lianmin; Wang, Daoming; Augustijn, Hannah E.; Zhernakova, Alexandra; Fu, Jingyuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8141913/ https://www.ncbi.nlm.nih.gov/pubmed/34040632 http://dx.doi.org/10.3389/fgene.2021.648229 |
_version_ | 1783696467825786880 |
---|---|
author | Andreu-Sánchez, Sergio Chen, Lianmin Wang, Daoming Augustijn, Hannah E. Zhernakova, Alexandra Fu, Jingyuan |
collection | PubMed |
description | Microbes live in complex communities that are of major importance for environmental ecology, public health, and animal physiology and pathology. Short-read metagenomic shotgun sequencing is currently the state-of-the-art technique for exploring these communities. With the aid of metagenomics, our understanding of the microbiome is moving from composition toward functionality, even down to the genetic variant level. While the exploration of single-nucleotide variation in a genome is a standard procedure in genomics, and many sophisticated tools exist to perform this task, identification of genetic variation in metagenomes remains challenging. Major factors that hamper the widespread application of variant-calling analysis include low-depth sequencing of individual genomes (which is especially significant for the microorganisms present in low abundance), the existence of large genomic variation even within the same species, the absence of comprehensive reference genomes, and the noise introduced by next-generation sequencing errors. Some bioinformatics tools, such as metaSNV or InStrain, have been created to identify genetic variants in metagenomes, but the performance of these tools has not been systematically assessed or compared with the variant callers commonly used on single or pooled genomes. In this study, we benchmark seven bioinformatic tools for genetic variant calling in metagenomics data and assess their performance. To do so, we simulated metagenomic reads to mimic human microbial composition, sequencing errors, and genetic variability. We also simulated different conditions, including low and high depth of coverage and unique or multiple strains per species. Our analysis of the simulated data shows that probabilistic method-based tools such as HaplotypeCaller and Mutect2 from the GATK toolset show the best performance. By applying these tools to longitudinal gut microbiome data from the Human Microbiome Project, we show that the genetic similarity between longitudinal samples from the same individuals is significantly greater than the similarity between samples from different individuals. Our benchmark shows that probabilistic tools can be used to call metagenomes, and we recommend the use of GATK’s tools as reliable variant callers for metagenomic samples. |
format | Online Article Text |
id | pubmed-8141913 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8141913 2021-05-25 A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing Andreu-Sánchez, Sergio Chen, Lianmin Wang, Daoming Augustijn, Hannah E. Zhernakova, Alexandra Fu, Jingyuan Front Genet Genetics Frontiers Media S.A. 2021-05-10 /pmc/articles/PMC8141913/ /pubmed/34040632 http://dx.doi.org/10.3389/fgene.2021.648229 Text en Copyright © 2021 Andreu-Sánchez, Chen, Wang, Augustijn, Zhernakova and Fu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8141913/ https://www.ncbi.nlm.nih.gov/pubmed/34040632 http://dx.doi.org/10.3389/fgene.2021.648229 |
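The description above says the benchmark assesses variant callers on simulated reads, where the true variants are known. The record does not state which metrics the authors used, so the following is only a minimal sketch of one common way to score a caller against a simulated truth set, assuming precision, recall, and F1 over exact matches. The `Variant` tuple layout, the `Scores` type, and the `score_calls` function are hypothetical names introduced for illustration and do not come from the article or from any of the tools it mentions.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (contig, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(truth: Set[Variant], called: Set[Variant]) -> Scores:
    """Compare called variants with a simulated truth set by exact match."""
    tp = len(truth & called)   # called and truly present
    fp = len(called - truth)   # called but not simulated
    fn = len(truth - called)   # simulated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Toy data, invented for this example only.
    truth = {("contig1", 10, "A", "G"), ("contig1", 50, "C", "T")}
    called = {("contig1", 10, "A", "G"), ("contig2", 99, "T", "C")}
    print(score_calls(truth, called))
```

Exact matching on position and alleles is the simplest choice; a real comparison would usually also need to normalize variant representation before matching, which this sketch leaves out.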