Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model

It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Her...

Bibliographic Details
Main authors:	Lunter, Gerton; Ponting, Chris P; Hein, Jotun
Format:	Text
Language:	English
Published:	Public Library of Science 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1326222/ https://www.ncbi.nlm.nih.gov/pubmed/16410828 http://dx.doi.org/10.1371/journal.pcbi.0020005

author	Lunter, Gerton; Ponting, Chris P; Hein, Jotun
collection	PubMed
description	It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
format	Text
id	pubmed-1326222
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-1326222 2006-01-13 Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model Lunter, Gerton; Ponting, Chris P; Hein, Jotun PLoS Comput Biol Research Article Public Library of Science 2006-01 2006-01-13 /pmc/articles/PMC1326222/ /pubmed/16410828 http://dx.doi.org/10.1371/journal.pcbi.0020005 Text en © 2006 Lunter et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1326222/ https://www.ncbi.nlm.nih.gov/pubmed/16410828 http://dx.doi.org/10.1371/journal.pcbi.0020005
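
The percentages quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that all annotated protein-coding sequence falls within the sequence under indel-purifying selection. It computes the non-protein-coding share of selected sequence at both ends of the 2.56% to 3.25% range, and the amount of the 90 Mb of identified sequence expected to be genuine at the stated 10% false-discovery rate.

```python
# Figures quoted in the abstract (percent of human euchromatin).
CODING_PCT = 1.2            # annotated protein-coding genes
SELECTED_PCT_LOW = 2.56     # lower estimate, under indel-purifying selection
SELECTED_PCT_HIGH = 3.25    # upper estimate, under indel-purifying selection

# Figures quoted for the human, mouse, and dog analysis.
IDENTIFIED_MB = 90.0        # sequence identified as under selection
FALSE_DISCOVERY_RATE = 0.10


def noncoding_share(selected_pct, coding_pct=CODING_PCT):
    """Fraction of selected sequence that is not protein-coding.

    Assumption (not stated in the record): every annotated coding base
    is counted among the bases under indel-purifying selection.
    """
    return 1.0 - coding_pct / selected_pct


def expected_true_positive_mb(identified_mb=IDENTIFIED_MB,
                              fdr=FALSE_DISCOVERY_RATE):
    """Identified sequence expected to be truly selected, in Mb."""
    return identified_mb * (1.0 - fdr)


if __name__ == "__main__":
    for pct in (SELECTED_PCT_LOW, SELECTED_PCT_HIGH):
        share = noncoding_share(pct)
        print(f"selected = {pct}% of euchromatin: "
              f"non-coding share = {share:.1%}")
    print(f"expected true positives: {expected_true_positive_mb():.0f} Mb")
```

Under that assumption the non-coding share comes out at roughly 53% to 63%, consistent with the abstract's statement that more than half of the functional complement is non-protein-coding, and about 81 Mb of the 90 Mb identified would be expected to be true positives.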