New Software for the Fast Estimation of Population Recombination Rates (FastEPRR) in the Genomic Era

Bibliographic Details
Main Authors: Gao, Feng; Ming, Chen; Hu, Wangjie; Li, Haipeng
Format: Online, Article, Text
Language: English
Published: Genetics Society of America, 2016
Subjects: Investigations
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889653/
https://www.ncbi.nlm.nih.gov/pubmed/27172192
http://dx.doi.org/10.1534/g3.116.028233
Abstract: Genetic recombination is an important evolutionary mechanism that mixes parental haplotypes and produces new raw material for organismal evolution. As a result, information on recombination rates is critical for biological research. In this paper, we introduce a new, extremely fast, open-source software package (FastEPRR) that uses machine learning to estimate the population recombination rate $\rho$ ($= 4N_e r$) from intraspecific DNA polymorphism data. When [Formula: see text] and the number of sampled diploid individuals is large enough ([Formula: see text]), the variance of $\rho_{\mathrm{FastEPRR}}$ remains slightly smaller than that of $\rho_{\mathrm{LDhat}}$. A new estimate, calculated by averaging $\rho_{\mathrm{FastEPRR}}$ and $\rho_{\mathrm{LDhat}}$, has the smallest variance of all cases. When estimating $\rho_{\mathrm{FastEPRR}}$, the finite-site model was employed to analyze cases with a high rate of recurrent mutation, and an additional method is proposed to account for variable recombination rates within windows. Simulations across a wide range of parameters indicate that evolutionary factors such as demography and selection may not increase the false-positive rate of recombination hotspot detection. Overall, the accuracy of FastEPRR is similar to that of the well-known method LDhat, but FastEPRR requires far less computation time. Genetic maps for each human population (YRI, CEU, and CHB), based on data extracted from the 1000 Genomes OMNI data set, were obtained in less than 3 days using a single CPU core. The pairwise Pearson correlation coefficient between the $\rho_{\mathrm{FastEPRR}}$ and $\rho_{\mathrm{LDhat}}$ maps is very high, ranging from 0.929 to 0.987 at a 5-Mb scale.
Because sample sizes for these kinds of data are increasing dramatically with advances in next-generation sequencing technologies, FastEPRR (freely available at http://www.picb.ac.cn/evolgen/) is expected to become a widely used tool for constructing genetic maps and studying recombination hotspots in the population genomic era.
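The abstract relies on a few simple computations: the standard diploid definition $\rho = 4N_e r$, averaging two per-window estimates of $\rho$ (presumably those of FastEPRR and LDhat, the two methods it compares), and a pairwise Pearson correlation between maps at a 5-Mb scale. The Python sketch below illustrates these under stated assumptions. Maps are taken to be lists of hypothetical `(start_bp, rho)` windows, windows are collapsed into 5-Mb bins by an unweighted mean, and the function names (`combine_estimates`, `bin_map`, `map_correlation`) are illustrative only. This is not FastEPRR's code or interface, and the paper's own binning procedure may differ.

```python
import math
from collections import defaultdict

BIN_SIZE_BP = 5_000_000  # the 5-Mb comparison scale quoted in the abstract


def population_recombination_rate(ne, r):
    """Standard diploid definition: rho = 4 * Ne * r."""
    return 4 * ne * r


def combine_estimates(rho_a, rho_b):
    """Average two per-window rho estimates (e.g. from two different methods)."""
    if len(rho_a) != len(rho_b):
        raise ValueError("both estimates must cover the same windows")
    return [(a + b) / 2 for a, b in zip(rho_a, rho_b)]


def bin_map(windows, bin_size=BIN_SIZE_BP):
    """Collapse (start_bp, rho) windows into one mean rho per fixed-size bin.

    Assumption (hypothetical): a window belongs to the bin containing its
    start coordinate, and the bin value is the unweighted mean of its windows.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start, rho in windows:
        b = start // bin_size
        sums[b] += rho
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


def map_correlation(map_a, map_b, bin_size=BIN_SIZE_BP):
    """Pearson r between two maps, compared only on the bins both cover."""
    binned_a = bin_map(map_a, bin_size)
    binned_b = bin_map(map_b, bin_size)
    shared = sorted(binned_a.keys() & binned_b.keys())
    return pearson([binned_a[b] for b in shared], [binned_b[b] for b in shared])


if __name__ == "__main__":
    # Toy input, not real data: two maps given as (start_bp, rho) windows.
    map_x = [(0, 1.0), (2_000_000, 1.4), (5_000_000, 2.0), (10_000_000, 0.5)]
    map_y = [(0, 1.1), (2_000_000, 1.3), (5_000_000, 2.2), (10_000_000, 0.6)]
    print(map_correlation(map_x, map_y))
```

As general intuition for why averaging can help (the abstract's own claim rests on the authors' simulations): for two estimators $a$ and $b$, $\mathrm{Var}\big((a+b)/2\big) = \big(\sigma_a^2 + \sigma_b^2 + 2\,\mathrm{Cov}(a,b)\big)/4$. With equal variances $\sigma^2$ and correlation $c$, this is $\sigma^2(1+c)/2$, which is smaller than $\sigma^2$ whenever $c < 1$.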
Record ID: pubmed-4889653
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: G3 (Bethesda); section: Investigations
Publication date: 2016-03-29
Copyright © 2016 Gao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.