
A Model of the Statistical Power of Comparative Genome Sequence Analysis

Bibliographic Details
Main Author:	Eddy, Sean R
Format:	Text
Language:	English
Published:	Public Library of Science 2005
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC539325/ https://www.ncbi.nlm.nih.gov/pubmed/15660152 http://dx.doi.org/10.1371/journal.pbio.0030010

author	Eddy, Sean R
collection	PubMed
description	Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.
format	Text
id	pubmed-539325
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-539325 2005-01-04 A Model of the Statistical Power of Comparative Genome Sequence Analysis Eddy, Sean R PLoS Biol Research Article Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features. Public Library of Science 2005-01 2005-01-04 /pmc/articles/PMC539325/ /pubmed/15660152 http://dx.doi.org/10.1371/journal.pbio.0030010 Text en Copyright: © 2005 Sean R. Eddy. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	A Model of the Statistical Power of Comparative Genome Sequence Analysis
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC539325/ https://www.ncbi.nlm.nih.gov/pubmed/15660152 http://dx.doi.org/10.1371/journal.pbio.0030010
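The two scaling rules in the description can be made concrete with a toy calculation. This is a simplified sketch, not the model from the paper: it assumes Jukes-Cantor neutral substitution, a star phylogeny with every comparative genome at the same distance d (substitutions per site) from the reference, a feature of L sites that is perfectly conserved, and "detection" meaning that a neutral segment of the same length looks fully conserved in all N genomes with probability at most p. If a neutral site stays identical with probability α, that requirement is α^(LN) ≤ p, so N ≥ ln p / (L ln α); for small d, ln α ≈ −d, giving N ≈ −ln p / (L d). The function names and the default threshold p = 1e-4 are hypothetical.

```python
import math


def neutral_identity_prob(d: float) -> float:
    """Assumed Jukes-Cantor probability that a neutral site is unchanged
    between the reference and a genome at distance d (subs/site)."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def min_genomes(length: int, d: float, p: float = 1e-4) -> float:
    """Real-valued N solving alpha**(length * N) == p (hypothetical helper).

    Under the assumptions above, a neutral segment of `length` sites looks
    perfectly conserved in all N comparative genomes with probability
    alpha**(length * N); we ask for that to be at most p.
    """
    if d <= 0:
        raise ValueError("distance d must be positive")
    alpha = neutral_identity_prob(d)
    # Both logs are negative, so the ratio is positive.
    return math.log(p) / (length * math.log(alpha))


def genomes_needed(length: int, d: float, p: float = 1e-4) -> int:
    """Smallest whole number of comparative genomes meeting the threshold."""
    return math.ceil(min_genomes(length, d, p))


if __name__ == "__main__":
    # Feature size at fixed distance: N scales roughly as 1/L.
    for L in (10, 20, 40):
        print("L =", L, "d = 0.1 ->", genomes_needed(L, 0.1))
    # Distance at fixed feature size: N scales roughly as 1/d when d is small.
    for d in (0.05, 0.1, 0.2):
        print("L = 10 d =", d, "->", genomes_needed(10, d))
```

With p = 1e-4 and d = 0.1, this toy model asks for 10, 5 and 3 genomes for features of 10, 20 and 40 sites; the rounding up to whole genomes is why the last step is not an exact halving. The unrounded `min_genomes` value scales exactly as 1/L, and approaches 1/d only as d gets small, which mirrors the "at short evolutionary distances" qualifier in the description.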
