
A novel statistical method to estimate the effective SNP size in vertebrate genomes and categorized genomic regions

BACKGROUND: The local environment of single nucleotide polymorphisms (SNPs) contains abundant genetic information for the study of mechanisms of mutation, genome evolution, and causes of diseases. Recent studies revealed that neighboring-nucleotide biases on SNPs were strong and the genome-wide bias...


Bibliographic Details
Main Authors: Seo, Daekwan, Jiang, Cizhong, Zhao, Zhongming
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1769377/
https://www.ncbi.nlm.nih.gov/pubmed/17196097
http://dx.doi.org/10.1186/1471-2164-7-329
author Seo, Daekwan
Jiang, Cizhong
Zhao, Zhongming
collection PubMed
description BACKGROUND: The local environment of single nucleotide polymorphisms (SNPs) contains abundant genetic information for the study of mechanisms of mutation, genome evolution, and causes of diseases. Recent studies revealed that neighboring-nucleotide biases on SNPs were strong and that the genome-wide bias patterns could be represented by a small subset of the total SNPs. Estimating the effective SNP size, the number of SNPs sufficient to represent the bias patterns observed from the whole SNP data, remains an unsolved problem. RESULTS: To estimate the effective SNP size, we developed a novel statistical method, SNPKS, which considers both statistical and biological significance. SNPKS consists of two major steps: obtaining an initial effective size by the Kolmogorov-Smirnov test (KS test) and finding an intermediate effective size by interval evaluation. The SNPKS algorithm was implemented in computer programs and applied to real SNP data. The effective SNP size was estimated to be 38,200, 39,300, 38,000, and 38,700 in the human, chimpanzee, dog, and mouse genomes, respectively, and 39,100, 39,600, 39,200, and 42,200 in human intergenic, genic, intronic, and CpG island regions, respectively. CONCLUSION: SNPKS is the first statistical method to estimate the effective SNP size. It runs efficiently and greatly outperforms the algorithm implemented in SNPNB. The application of SNPKS to real SNP data revealed similarly small effective SNP sizes (38,000 – 42,200) in the human, chimpanzee, dog, and mouse genomes as well as in human genomic regions. The findings suggest a strong influence of genetic factors across vertebrate genomes.
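The abstract names the Kolmogorov-Smirnov test as the first step of SNPKS but does not give the procedure. As a rough illustration of the test itself only, the sketch below computes a two-sample KS statistic between a subset and the full data and compares it with the standard large-sample critical value. It is not the authors' implementation: the function names, the numeric encoding of the data, and the significance level are all assumptions made for this example, and the interval-evaluation step is not covered.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS test at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def subset_matches_whole(subset, whole, alpha=0.05):
    """Hypothetical helper: True if the KS test does not reject at level alpha.

    Assumption: both inputs are numeric summaries (for example, per-SNP bias
    scores); the source does not say which quantity SNPKS actually tests.
    """
    d = ks_statistic(subset, whole)
    return d <= ks_critical_value(len(subset), len(whole), alpha)
```

A subset that passes such a check has a distribution that the test cannot distinguish from the whole data at the chosen level; how SNPKS turns this into an initial effective size is described in the article, not here.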
format Text
id pubmed-1769377
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1769377 2007-01-16 A novel statistical method to estimate the effective SNP size in vertebrate genomes and categorized genomic regions Seo, Daekwan Jiang, Cizhong Zhao, Zhongming BMC Genomics Methodology Article
BioMed Central 2006-12-29 /pmc/articles/PMC1769377/ /pubmed/17196097 http://dx.doi.org/10.1186/1471-2164-7-329 Text en Copyright © 2006 Seo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel statistical method to estimate the effective SNP size in vertebrate genomes and categorized genomic regions
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1769377/
https://www.ncbi.nlm.nih.gov/pubmed/17196097
http://dx.doi.org/10.1186/1471-2164-7-329