Calculation of Tajima’s D and other neutrality test statistics from low depth next-generation sequencing data

BACKGROUND: A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima’s D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) dat...

Bibliographic Details
Main authors: Korneliussen, Thorfinn Sand; Moltke, Ida; Albrechtsen, Anders; Nielsen, Rasmus
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015034/
https://www.ncbi.nlm.nih.gov/pubmed/24088262
http://dx.doi.org/10.1186/1471-2105-14-289
author Korneliussen, Thorfinn Sand
Moltke, Ida
Albrechtsen, Anders
Nielsen, Rasmus
collection PubMed
description BACKGROUND: A number of different statistics are used for detecting natural selection from DNA sequencing data, including summaries of the frequency spectrum such as Tajima’s D. These statistics are now often applied in the analysis of next-generation sequencing (NGS) data. However, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology-dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions. RESULTS: We have developed an approach that accommodates the uncertainty of the data when calculating site-frequency-based neutrality test statistics. A salient feature of this approach is that it implicitly solves the problems of varying sequencing depth and missing data, and it avoids the need to infer variable sites for the analysis, thereby avoiding the ascertainment problems introduced by a SNP discovery process. CONCLUSION: Using an empirical Bayes approach for fast computations, we show that this method produces results for low-coverage NGS data comparable to those achieved when the genotypes are known without uncertainty. We also validate the method in an analysis of data from the 1000 Genomes Project. The method is implemented in a fast framework that enables researchers to perform these neutrality tests on a genome-wide scale.
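For readers unfamiliar with how Tajima’s D summarizes the frequency spectrum, the following is a minimal sketch of the classical formula applied to an unfolded site frequency spectrum. It is not the authors' implementation: the function name `tajimas_d` and its argument layout are assumptions made for illustration, and the sketch does not model genotype uncertainty. It only accepts fractional counts, so an expected spectrum could in principle be passed in place of observed counts.

```python
import math


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded site frequency spectrum (hypothetical helper).

    sfs[i - 1] is the (possibly fractional) number of sites at which the
    derived allele is seen i times, for i = 1 .. n - 1.
    n is the number of sampled chromosomes.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")

    s = sum(sfs)  # number of segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))

    # Watterson's estimator and the pairwise-difference estimator of theta.
    theta_w = s / a1
    theta_pi = sum(
        count * 2.0 * i * (n - i) / (n * (n - 1))
        for i, count in enumerate(sfs, start=1)
    )

    # Constants from Tajima (1989) for the variance of theta_pi - theta_w.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return float("nan")  # undefined, e.g. no segregating sites
    return (theta_pi - theta_w) / math.sqrt(variance)
```

As a usage note, a spectrum dominated by singletons such as `tajimas_d([10, 0, 0, 0], 5)` gives a negative value, while a spectrum proportional to 1/i (the neutral expectation) gives a value close to zero.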
format Online
Article
Text
id pubmed-4015034
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4015034 2014-05-23 Calculation of Tajima’s D and other neutrality test statistics from low depth next-generation sequencing data. Korneliussen, Thorfinn Sand; Moltke, Ida; Albrechtsen, Anders; Nielsen, Rasmus. BMC Bioinformatics. Methodology Article. BioMed Central 2013-10-02. /pmc/articles/PMC4015034/ /pubmed/24088262 http://dx.doi.org/10.1186/1471-2105-14-289 Text en. Copyright © 2013 Korneliussen et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article