4P: fast computing of population genetics statistics from large DNA polymorphism panels

Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the explora...

Bibliographic Details
Main Authors: Benazzo, Andrea, Panziera, Alex, Bertorelle, Giorgio
Format: Online Article Text
Language: English
Published: BlackWell Publishing Ltd 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298444/
https://www.ncbi.nlm.nih.gov/pubmed/25628874
http://dx.doi.org/10.1002/ece3.1261
_version_ 1782353269833596928
author Benazzo, Andrea
Panziera, Alex
Bertorelle, Giorgio
author_facet Benazzo, Andrea
Panziera, Alex
Bertorelle, Giorgio
author_sort Benazzo, Andrea
collection PubMed
description Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.
format Online
Article
Text
id pubmed-4298444
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BlackWell Publishing Ltd
record_format MEDLINE/PubMed
spelling pubmed-42984442015-01-27 4P: fast computing of population genetics statistics from large DNA polymorphism panels Benazzo, Andrea Panziera, Alex Bertorelle, Giorgio Ecol Evol Original Research Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations. BlackWell Publishing Ltd 2015-01 2014-12-11 /pmc/articles/PMC4298444/ /pubmed/25628874 http://dx.doi.org/10.1002/ece3.1261 Text en © 2014 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. 
http://creativecommons.org/licenses/by/3.0/ This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Research
Benazzo, Andrea
Panziera, Alex
Bertorelle, Giorgio
4P: fast computing of population genetics statistics from large DNA polymorphism panels
title 4P: fast computing of population genetics statistics from large DNA polymorphism panels
title_full 4P: fast computing of population genetics statistics from large DNA polymorphism panels
title_fullStr 4P: fast computing of population genetics statistics from large DNA polymorphism panels
title_full_unstemmed 4P: fast computing of population genetics statistics from large DNA polymorphism panels
title_short 4P: fast computing of population genetics statistics from large DNA polymorphism panels
title_sort 4p: fast computing of population genetics statistics from large dna polymorphism panels
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298444/
https://www.ncbi.nlm.nih.gov/pubmed/25628874
http://dx.doi.org/10.1002/ece3.1261
work_keys_str_mv AT benazzoandrea 4pfastcomputingofpopulationgeneticsstatisticsfromlargednapolymorphismpanels
AT panzieraalex 4pfastcomputingofpopulationgeneticsstatisticsfromlargednapolymorphismpanels
AT bertorellegiorgio 4pfastcomputingofpopulationgeneticsstatisticsfromlargednapolymorphismpanels