
A new measurement of sequence conservation

BACKGROUND: Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional...


Bibliographic Details
Main authors: Cai, Xiaohui, Hu, Haiyan, Li, Xiaoman
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2807881/
https://www.ncbi.nlm.nih.gov/pubmed/20028539
http://dx.doi.org/10.1186/1471-2164-10-623
_version_ 1782176438936403968
author Cai, Xiaohui
Hu, Haiyan
Li, Xiaoman
author_facet Cai, Xiaohui
Hu, Haiyan
Li, Xiaoman
author_sort Cai, Xiaohui
collection PubMed
description BACKGROUND: Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously conserved regions and discontiguously conserved regions can be detected based on this new measurement. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified similarity threshold with their homologous regions in other species. That is, conserved regions are good candidates for functional regions but may not always be functional. Moreover, conserved regions may contain long and divergent segments. RESULTS: To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. By defining conserved segments using the local alignment tool CHAOS, under the new measurement, we analyzed the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation in at least 11% of these functional regions could be missed by the current conservation analysis methods. We also found that 72% of the mouse homologous regions identified based on the new measurement are more similar to the human functional sequences than the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method. We found that our method picks up many more conserved segments than BLAST and discontiguous MegaBLAST in these regions.
CONCLUSIONS: It is critical to have a new measurement of sequence conservation that is based only on the conserved segments in one region. Such a new measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes [1].
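The contrast drawn in the abstract, between scoring every position of a contiguous region and scoring only its conserved segments, can be illustrated with a small sketch. This is not the authors' formula or pipeline: the paper defines conserved segments with the local alignment tool CHAOS, whereas here the segment coordinates are supplied by hand, the score is plain fractional identity, and the names `count_matches`, `contiguous_conservation`, `segment_conservation`, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates within a pre-aligned pair of sequences.
# ASSUMPTION: segments are given by hand; the paper derives them with CHAOS.
Segment = Tuple[int, int]


def count_matches(a: str, b: str) -> int:
    """Count aligned columns where both sequences carry the same base.

    Gap characters ('-') never count as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def contiguous_conservation(a: str, b: str) -> float:
    """Fractional identity over every position of the region."""
    return count_matches(a, b) / len(a) if a else 0.0


def segment_conservation(a: str, b: str, segments: List[Segment]) -> float:
    """Fractional identity over the conserved segments only.

    Positions outside the listed segments (the divergent spacers) are
    ignored entirely. ASSUMPTION: this simple ratio stands in for the
    paper's measurement, which is not spelled out in the abstract.
    """
    total = sum(end - start for start, end in segments)
    if total == 0:
        return 0.0
    matches = sum(count_matches(a[s:e], b[s:e]) for s, e in segments)
    return matches / total


# Toy example: two conserved 8-base blocks separated by a 12-base
# divergent spacer (made-up sequences, not real genomic data).
human = "ACGTACGT" + "AAAAAAAAAAAA" + "TTGGCCAA"
mouse = "ACGTACGT" + "CCCCCCCCCCCC" + "TTGGCCAA"
conserved = [(0, 8), (20, 28)]

THRESHOLD = 0.7  # hypothetical similarity threshold

whole_region_score = contiguous_conservation(human, mouse)
segment_score = segment_conservation(human, mouse, conserved)

print(f"every position:     {whole_region_score:.3f}")
print(f"conserved segments: {segment_score:.3f}")
print("detected (every position):    ", whole_region_score >= THRESHOLD)
print("detected (conserved segments):", segment_score >= THRESHOLD)
```

In this toy case the divergent spacer pulls the every-position score below the threshold, while the segment-based score stays above it, which is the kind of discontiguously conserved region the abstract says position-by-position methods can miss.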
format Text
id pubmed-2807881
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28078812010-01-19 A new measurement of sequence conservation Cai, Xiaohui Hu, Haiyan Li, Xiaoman BMC Genomics Research article BACKGROUND: Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously conserved regions and discontiguously conserved regions can be detected based on this new measurement. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified similarity threshold with their homologous regions in other species. That is, conserved regions are good candidates for functional regions but may not always be functional. Moreover, conserved regions may contain long and divergent segments. RESULTS: To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. By defining conserved segments using the local alignment tool CHAOS, under the new measurement, we analyzed the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation in at least 11% of these functional regions could be missed by the current conservation analysis methods. We also found that 72% of the mouse homologous regions identified based on the new measurement are more similar to the human functional sequences than the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method. We found that our method picks up many more conserved segments than BLAST and discontiguous MegaBLAST in these regions. CONCLUSIONS: It is critical to have a new measurement of sequence conservation that is based only on the conserved segments in one region. Such a new measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes [1]. BioMed Central 2009-12-22 /pmc/articles/PMC2807881/ /pubmed/20028539 http://dx.doi.org/10.1186/1471-2164-10-623 Text en Copyright ©2009 Cai et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research article
Cai, Xiaohui
Hu, Haiyan
Li, Xiaoman
A new measurement of sequence conservation
title A new measurement of sequence conservation
title_full A new measurement of sequence conservation
title_fullStr A new measurement of sequence conservation
title_full_unstemmed A new measurement of sequence conservation
title_short A new measurement of sequence conservation
title_sort new measurement of sequence conservation
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2807881/
https://www.ncbi.nlm.nih.gov/pubmed/20028539
http://dx.doi.org/10.1186/1471-2164-10-623
work_keys_str_mv AT caixiaohui anewmeasurementofsequenceconservation
AT huhaiyan anewmeasurementofsequenceconservation
AT lixiaoman anewmeasurementofsequenceconservation
AT caixiaohui newmeasurementofsequenceconservation
AT huhaiyan newmeasurementofsequenceconservation
AT lixiaoman newmeasurementofsequenceconservation