
Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties

Bibliographic Details
Main authors: Boekhorst, Jos; Snel, Berend
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2048517/
https://www.ncbi.nlm.nih.gov/pubmed/17888146
http://dx.doi.org/10.1186/1471-2105-8-356
author Boekhorst, Jos
Snel, Berend
collection PubMed
description BACKGROUND: Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields such as the functional annotation of protein sequences and the identification of taxon-specific genes. Basic homology searches are still frequently performed with pairwise search methods such as BLAST. Vast improvements have been made in the identification of homologous proteins by more advanced methods that use sequence profiles. However, additional improvement could be made by exploiting sources of genomic information other than the primary sequence or tertiary structure. RESULTS: We test the hypothesis that two extrinsic gene properties, gene length and gene order, can help differentiate spurious sequence similarity from homology in the gray zone. Shared gene order and similarity in size dramatically increase the chance that a query-hit pair is homologous: gray zone query-hit pairs of similar size and with conserved gene order are homologous in 99% of all cases, whereas query-hit pairs without gene order conservation and with different sizes are homologous in only 55% of cases. CONCLUSION: We have shown that using gene length and gene order drastically improves the detection of homologs within the BLAST gray zone. Our findings suggest that the use of such extrinsic gene properties can also improve the performance of homology detection by more advanced methods, and our study thereby underscores the importance of true data integration for fully exploiting genomic information.
format Text
id pubmed-2048517
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2048517 2007-11-01 Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties. Boekhorst, Jos; Snel, Berend. BMC Bioinformatics. Research Article. BioMed Central 2007-09-21 /pmc/articles/PMC2048517/ /pubmed/17888146 http://dx.doi.org/10.1186/1471-2105-8-356 Text en Copyright © 2007 Boekhorst and Snel; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties
topic Research Article