
Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index

BACKGROUND: The identification of functional regions contained in a given multiple sequence alignment constitutes one of the major challenges of comparative genomics. Several studies have focused on the identification of conserved regions and motifs. However, most existing methods ignore the rela...


Bibliographic Details
Main authors: Badescu, Dunarel; Boc, Alix; Diallo, Abdoulaye Baniré; Makarenkov, Vladimir
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3271671/
https://www.ncbi.nlm.nih.gov/pubmed/22151279
http://dx.doi.org/10.1186/1471-2105-12-S9-S9
author Badescu, Dunarel
Boc, Alix
Diallo, Abdoulaye Baniré
Makarenkov, Vladimir
collection PubMed
description BACKGROUND: The identification of functional regions contained in a given multiple sequence alignment constitutes one of the major challenges of comparative genomics. Several studies have focused on the identification of conserved regions and motifs. However, most existing methods ignore the relationship between the functional genomic regions and the external evidence associated with the considered group of species (e.g., carcinogenicity of human papilloma virus). In the past, we proposed a method that takes into account prior knowledge of external evidence (e.g., carcinogenicity or invasivity of the considered organisms) and identifies genomic regions related to a specific disease. RESULTS AND CONCLUSION: We present a new algorithm for detecting genomic regions that may be associated with a disease. Two new variability functions and a bipartition optimization procedure are described. We validate and weigh our results using the Adjusted Rand Index (ARI), and thus assess to what extent the selected regions are related to carcinogenicity, invasivity, or any other species classification given as input. The predictive power of different hit region detection functions was assessed on synthetic and real data. Our simulation results suggest that no single function provides the best results in all practical situations (e.g., monophyletic or polyphyletic evolution, and positive or negative selection), and that at least three different functions might be useful. The proposed hit region identification functions that do not benefit from prior knowledge (i.e., carcinogenicity or invasivity of the involved organisms) can provide results equivalent to those of the existing functions that take advantage of such prior knowledge.
Using the new algorithm, we examined the Neisseria meningitidis FrpB gene product for invasivity and immunologic activity, and the human papilloma virus (HPV) E6 oncoprotein for carcinogenicity, and confirmed some well-known molecular features, including surface-exposed loops for N. meningitidis and the PDZ domain for HPV.
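The abstract states that results are validated and weighed with the Adjusted Rand Index (ARI), which measures how well a bipartition of species derived from a candidate region agrees with a known classification such as carcinogenicity. The following is a minimal sketch of the standard ARI formula only, not the authors' implementation; the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard Adjusted Rand Index between two labelings of the same items.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: agreement is trivial

    # Contingency table cells and its row/column marginals.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs placed together, per cell and per marginal.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy data (assumed): 1 = carcinogenic, 0 = non-carcinogenic for six species.
known_classes = [1, 1, 1, 0, 0, 0]
# Bipartition suggested by a hypothetical candidate region of the alignment.
region_split = [1, 1, 0, 0, 0, 0]

print(round(adjusted_rand_index(known_classes, region_split), 4))
```

An ARI of 1 indicates that the region's bipartition matches the known classification exactly (up to renaming of the two groups), while values near 0 indicate agreement no better than chance.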
format Online
Article
Text
id pubmed-3271671
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3271671 2012-02-04 Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index Badescu, Dunarel Boc, Alix Diallo, Abdoulaye Baniré Makarenkov, Vladimir BMC Bioinformatics Proceedings BACKGROUND: The identification of functional regions contained in a given multiple sequence alignment constitutes one of the major challenges of comparative genomics. Several studies have focused on the identification of conserved regions and motifs. However, most existing methods ignore the relationship between the functional genomic regions and the external evidence associated with the considered group of species (e.g., carcinogenicity of human papilloma virus). In the past, we proposed a method that takes into account prior knowledge of external evidence (e.g., carcinogenicity or invasivity of the considered organisms) and identifies genomic regions related to a specific disease. RESULTS AND CONCLUSION: We present a new algorithm for detecting genomic regions that may be associated with a disease. Two new variability functions and a bipartition optimization procedure are described. We validate and weigh our results using the Adjusted Rand Index (ARI), and thus assess to what extent the selected regions are related to carcinogenicity, invasivity, or any other species classification given as input. The predictive power of different hit region detection functions was assessed on synthetic and real data. Our simulation results suggest that no single function provides the best results in all practical situations (e.g., monophyletic or polyphyletic evolution, and positive or negative selection), and that at least three different functions might be useful.
The proposed hit region identification functions that do not benefit from prior knowledge (i.e., carcinogenicity or invasivity of the involved organisms) can provide results equivalent to those of the existing functions that take advantage of such prior knowledge. Using the new algorithm, we examined the Neisseria meningitidis FrpB gene product for invasivity and immunologic activity, and the human papilloma virus (HPV) E6 oncoprotein for carcinogenicity, and confirmed some well-known molecular features, including surface-exposed loops for N. meningitidis and the PDZ domain for HPV. BioMed Central 2011-10-05 /pmc/articles/PMC3271671/ /pubmed/22151279 http://dx.doi.org/10.1186/1471-2105-12-S9-S9 Text en Copyright ©2011 Badescu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3271671/
https://www.ncbi.nlm.nih.gov/pubmed/22151279
http://dx.doi.org/10.1186/1471-2105-12-S9-S9