Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens
Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies and population genetics analyses. Next-generation sequencing (NGS) has become convenient, and many SNP-calling pipelines have been developed for human NGS data. However, there is a knowledge gap in selecting...
Main authors: | Liu, Jing; Shen, Qingmiao; Bao, Haigang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803190/ https://www.ncbi.nlm.nih.gov/pubmed/35100292 http://dx.doi.org/10.1371/journal.pone.0262574 |
_version_ | 1784642820088266752 |
---|---|
author | Liu, Jing Shen, Qingmiao Bao, Haigang |
collection | PubMed |
description | Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies and population genetics analyses. Next-generation sequencing (NGS) has become convenient, and many SNP-calling pipelines have been developed for human NGS data. However, there is a knowledge gap in selecting an appropriate SNP calling pipeline for handling high-throughput NGS data. To fill this gap, we studied and compared seven SNP calling pipelines: 16GT, Genome Analysis Toolkit (GATK), Bcftools-single (Bcftools single-sample mode), Bcftools-multiple (Bcftools multiple-sample mode), VarScan2-single (VarScan2 single-sample mode), VarScan2-multiple (VarScan2 multiple-sample mode), and Freebayes, using 96 NGS datasets at depth gradients of approximately 5X, 10X, 20X, 30X, 40X, and 50X coverage from 16 Rhode Island Red chickens. The 16 chickens were also genotyped with a 50K SNP array, and the sensitivity and specificity of each pipeline were assessed by comparison with the SNP array results. For each pipeline except Freebayes, the number of detected SNPs increased as the input read depth increased. Among the pipelines, 16GT, followed by Bcftools-multiple, obtained the most SNPs when the input coverage exceeded 10X, and Bcftools-multiple obtained the most when the input was 5X and 10X. The sensitivity and specificity of each pipeline increased with increasing input. Bcftools-multiple had the numerically highest sensitivity when the input ranged from 5X to 30X, and 16GT showed the highest sensitivity when the input was 40X and 50X. Bcftools-multiple also had the highest specificity, followed by GATK, at almost all input levels. For most calling pipelines, there were no obvious changes in SNP numbers, sensitivities, or specificities beyond 20X. In conclusion, (1) if only SNPs are to be detected, the sequencing depth does not need to exceed 20X; (2) Bcftools-multiple may be the best choice for detecting SNPs from chicken NGS data, but for a single sample or a sequencing depth greater than 20X, 16GT is recommended. Our findings provide a reference for researchers selecting suitable pipelines to obtain SNPs from the NGS data of chickens or other nonhuman animals. |
format | Online Article Text |
id | pubmed-8803190 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8803190 2022-02-01 Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens Liu, Jing Shen, Qingmiao Bao, Haigang PLoS One Research Article Public Library of Science 2022-01-31 /pmc/articles/PMC8803190/ /pubmed/35100292 http://dx.doi.org/10.1371/journal.pone.0262574 Text en © 2022 Liu et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803190/ https://www.ncbi.nlm.nih.gov/pubmed/35100292 http://dx.doi.org/10.1371/journal.pone.0262574 |
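
The abstract above reports the sensitivity and specificity of each pipeline relative to the 50K SNP array, but this record does not give the exact formulas. The following is a minimal sketch under assumed, conventional definitions: sensitivity is the fraction of array sites carrying a non-reference genotype that the pipeline also called, and specificity is the fraction of array sites genotyped as homozygous reference that the pipeline did not call. The function name, the `(chromosome, position)` site keys, and the genotype strings are hypothetical and chosen only for illustration; the authors' actual definitions may differ.

```python
from typing import Dict, Set, Tuple

# A site is identified by (chromosome, 1-based position). Assumed representation.
Site = Tuple[str, int]


def sensitivity_specificity(
    array_genotypes: Dict[Site, str],
    called_sites: Set[Site],
) -> Tuple[float, float]:
    """Compare pipeline SNP calls with array genotypes at the array sites.

    array_genotypes maps each array site to a VCF-style genotype string
    ("0/0", "0/1", "1/1", or "./." for a missing call).
    called_sites holds the sites at which the pipeline reported a SNP.
    """
    tp = fn = tn = fp = 0
    for site, genotype in array_genotypes.items():
        if genotype == "./.":
            continue  # skip sites the array failed to genotype
        array_variant = genotype != "0/0"
        pipeline_variant = site in called_sites
        if array_variant and pipeline_variant:
            tp += 1
        elif array_variant:
            fn += 1
        elif pipeline_variant:
            fp += 1
        else:
            tn += 1
    # Guard against empty denominators instead of raising ZeroDivisionError.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data for one sample; these values are invented for illustration.
array = {
    ("1", 100): "0/1",
    ("1", 200): "1/1",
    ("1", 300): "0/0",
    ("2", 50): "0/0",
    ("2", 75): "0/1",
    ("2", 90): "./.",
}
calls = {("1", 100), ("1", 200), ("2", 50)}

sens, spec = sensitivity_specificity(array, calls)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Running the same function once per pipeline and per depth level (5X to 50X) would yield the kind of comparison grid the abstract summarizes, provided the array and call sets use the same reference coordinates.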