Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens

Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies and population genetics analyses. Next-generation sequencing (NGS) has become convenient, and many SNP-calling pipelines have been developed for human NGS data. A knowledge gap remains in selecting...

Bibliographic Details
Main authors: Liu, Jing; Shen, Qingmiao; Bao, Haigang
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803190/
https://www.ncbi.nlm.nih.gov/pubmed/35100292
http://dx.doi.org/10.1371/journal.pone.0262574
author Liu, Jing
Shen, Qingmiao
Bao, Haigang
collection PubMed
description Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies and population genetics analyses. Next-generation sequencing (NGS) has become convenient, and many SNP-calling pipelines have been developed for human NGS data. A knowledge gap remains in selecting an appropriate SNP-calling pipeline to handle high-throughput NGS data. To fill this gap, we compared seven SNP-calling pipelines: 16GT, the Genome Analysis Toolkit (GATK), Bcftools-single (Bcftools in single-sample mode), Bcftools-multiple (Bcftools in multiple-sample mode), VarScan2-single (VarScan2 in single-sample mode), VarScan2-multiple (VarScan2 in multiple-sample mode), and Freebayes, using 96 NGS datasets at depth gradients of approximately 5X, 10X, 20X, 30X, 40X, and 50X coverage from 16 Rhode Island Red chickens. The 16 chickens were also genotyped with a 50K SNP array, and the sensitivity and specificity of each pipeline were assessed by comparison with the SNP array results. For every pipeline except Freebayes, the number of detected SNPs increased with input read depth. 16GT, followed by Bcftools-multiple, detected the most SNPs when the input coverage exceeded 10X, and Bcftools-multiple detected the most at 5X and 10X. The sensitivity and specificity of each pipeline increased with input depth. Bcftools-multiple had the numerically highest sensitivity at inputs from 5X to 30X, and 16GT had the highest sensitivity at 40X and 50X. Bcftools-multiple also had the highest specificity, followed by GATK, at almost all input levels. For most pipelines, SNP numbers, sensitivities, and specificities showed no obvious changes beyond 20X. In conclusion, (1) if only SNPs are to be detected, the sequencing depth does not need to exceed 20X; (2) Bcftools-multiple may be the best choice for detecting SNPs from chicken NGS data, but for a single sample or a sequencing depth greater than 20X, 16GT is recommended. Our findings provide a reference for researchers selecting suitable pipelines to obtain SNPs from the NGS data of chickens or other nonhuman animals.
format Online
Article
Text
id pubmed-8803190
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8803190 2022-02-01 Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens Liu, Jing Shen, Qingmiao Bao, Haigang PLoS One Research Article Public Library of Science 2022-01-31 /pmc/articles/PMC8803190/ /pubmed/35100292 http://dx.doi.org/10.1371/journal.pone.0262574 Text en © 2022 Liu et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens
topic Research Article