Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties
BACKGROUND: Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon specific genes. Basic homology searches are still frequently performed by pairwise search me...
Main authors: | Boekhorst, Jos; Snel, Berend |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2048517/ https://www.ncbi.nlm.nih.gov/pubmed/17888146 http://dx.doi.org/10.1186/1471-2105-8-356 |
_version_ | 1782137156627595264 |
---|---|
author | Boekhorst, Jos Snel, Berend |
collection | PubMed |
description | BACKGROUND: Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon-specific genes. Basic homology searches are still frequently performed by pairwise search methods such as BLAST. Vast improvements have been made in the identification of homologous proteins by using more advanced methods that use sequence profiles. However, additional improvement could be made by exploiting sources of genomic information other than the primary sequence or tertiary structure. RESULTS: We test the hypothesis that the extrinsic gene properties, namely gene length and gene order, can help differentiate spurious sequence similarity from homology in the gray zone. Sharing gene order and similarity in size dramatically increase the chance of a query-hit pair being homologous: gray zone query-hit pairs of similar size and with conserved gene order are homologous in 99% of all cases, while for query-hit pairs without gene order conservation and with different sizes this is only 55%. CONCLUSION: We have shown that using gene length and gene order drastically improves the detection of homologs within the BLAST gray zone. Our findings suggest that the use of such extrinsic gene properties can also improve the performance of homology detection by more advanced methods, and our study thereby underscores the importance of true data integration for fully exploiting genomic information. |
format | Text |
id | pubmed-2048517 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2048517 2007-11-01 Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties Boekhorst, Jos Snel, Berend BMC Bioinformatics Research Article BioMed Central 2007-09-21 /pmc/articles/PMC2048517/ /pubmed/17888146 http://dx.doi.org/10.1186/1471-2105-8-356 Text en Copyright © 2007 Boekhorst and Snel; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2048517/ https://www.ncbi.nlm.nih.gov/pubmed/17888146 http://dx.doi.org/10.1186/1471-2105-8-356 |