New Software for the Fast Estimation of Population Recombination Rates (FastEPRR) in the Genomic Era
Genetic recombination is a very important evolutionary mechanism that mixes parental haplotypes and produces new raw material for organismal evolution. As a result, information on recombination rates is critical for biological research. In this paper, we introduce a new extremely fast open-source so...
Main authors: | Gao, Feng; Ming, Chen; Hu, Wangjie; Li, Haipeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Genetics Society of America
2016
|
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889653/ https://www.ncbi.nlm.nih.gov/pubmed/27172192 http://dx.doi.org/10.1534/g3.116.028233 |
_version_ | 1782434996672266240 |
---|---|
author | Gao, Feng Ming, Chen Hu, Wangjie Li, Haipeng |
collection | PubMed |
description | Genetic recombination is a very important evolutionary mechanism that mixes parental haplotypes and produces new raw material for organismal evolution. As a result, information on recombination rates is critical for biological research. In this paper, we introduce a new, extremely fast open-source software package (FastEPRR) that uses machine learning to estimate recombination rate [Formula: see text] (= [Formula: see text]) from intraspecific DNA polymorphism data. When [Formula: see text] and the number of sampled diploid individuals is large enough ([Formula: see text]), the variance of [Formula: see text] remains slightly smaller than that of [Formula: see text]. The new estimate [Formula: see text] (calculated by averaging [Formula: see text] and [Formula: see text]) has the smallest variance in all cases. When estimating [Formula: see text], the finite-site model was employed to analyze cases with a high rate of recurrent mutations, and an additional method is proposed to account for the effect of variable recombination rates within windows. Simulations encompassing a wide range of parameters demonstrate that different evolutionary factors, such as demography and selection, may not increase the false positive rate of recombination hotspots. Overall, FastEPRR is similar in accuracy to the well-known method LDhat but requires far less computation time. Genetic maps for each human population (YRI, CEU, and CHB) extracted from the 1000 Genomes OMNI data set were obtained in less than 3 days using just a single CPU core. The pairwise Pearson correlation coefficient between the [Formula: see text] and [Formula: see text] maps is very high, ranging between 0.929 and 0.987 at a 5-Mb scale. 
Considering that sample sizes for these kinds of data are increasing dramatically with advances in next-generation sequencing technologies, FastEPRR (freely available at http://www.picb.ac.cn/evolgen/) is expected to become a widely used tool for establishing genetic maps and studying recombination hotspots in the population genomic era. |
format | Online Article Text |
id | pubmed-4889653 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-4889653 2016-06-02 New Software for the Fast Estimation of Population Recombination Rates (FastEPRR) in the Genomic Era Gao, Feng Ming, Chen Hu, Wangjie Li, Haipeng G3 (Bethesda) Investigations Genetics Society of America 2016-03-29 /pmc/articles/PMC4889653/ /pubmed/27172192 http://dx.doi.org/10.1534/g3.116.028233 Text en Copyright © 2016 Gao et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | New Software for the Fast Estimation of Population Recombination Rates (FastEPRR) in the Genomic Era |
topic | Investigations |