
Homologous over-extension: a challenge for iterative similarity searches

We have characterized a novel type of PSI-BLAST error, homologous over-extension (HOE), using embedded Pfam domain queries on searches against a reference library containing Pfam-annotated UniProt sequences and random synthetic sequences. PSI-BLAST makes two types of errors: alignments to non-homolo...


Bibliographic Details
Main Authors: Gonzalez, Mileidy W., Pearson, William R.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2853128/
https://www.ncbi.nlm.nih.gov/pubmed/20064877
http://dx.doi.org/10.1093/nar/gkp1219
_version_ 1782180014931836928
author Gonzalez, Mileidy W.
Pearson, William R.
author_facet Gonzalez, Mileidy W.
Pearson, William R.
author_sort Gonzalez, Mileidy W.
collection PubMed
description We have characterized a novel type of PSI-BLAST error, homologous over-extension (HOE), using embedded Pfam domain queries on searches against a reference library containing Pfam-annotated UniProt sequences and random synthetic sequences. PSI-BLAST makes two types of errors: alignments to non-homologous regions, and HOE alignments that begin in a homologous region but extend beyond the homology into neighboring sequence regions. When the neighboring sequence region contains a non-homologous domain, PSI-BLAST can incorporate the unrelated sequence into its position-specific scoring matrix, which then finds non-homologous proteins with significant expectation values. HOE accounts for the largest fraction of the initial false positive (FP) errors, and the largest fraction of FPs at iteration 5. In searches against complete protein sequences, 5–9% of alignments at iteration 5 are non-homologous. HOE frequently begins in a partial protein domain; when partial domains are removed from the library, HOE errors decrease from 16 to 3% of weighted coverage (hard queries; from 35 to 5% for sampled queries) and no-error searches increase from 2 to 58% of weighted coverage (hard queries; from 16 to 78% for sampled queries). When HOE is reduced by not extending previously found sequences, PSI-BLAST specificity improves 4–8-fold, with little loss in sensitivity.
format Text
id pubmed-2853128
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-28531282010-04-12 Homologous over-extension: a challenge for iterative similarity searches Gonzalez, Mileidy W. Pearson, William R. Nucleic Acids Res Computational Biology We have characterized a novel type of PSI-BLAST error, homologous over-extension (HOE), using embedded Pfam domain queries on searches against a reference library containing Pfam-annotated UniProt sequences and random synthetic sequences. PSI-BLAST makes two types of errors: alignments to non-homologous regions, and HOE alignments that begin in a homologous region but extend beyond the homology into neighboring sequence regions. When the neighboring sequence region contains a non-homologous domain, PSI-BLAST can incorporate the unrelated sequence into its position-specific scoring matrix, which then finds non-homologous proteins with significant expectation values. HOE accounts for the largest fraction of the initial false positive (FP) errors, and the largest fraction of FPs at iteration 5. In searches against complete protein sequences, 5–9% of alignments at iteration 5 are non-homologous. HOE frequently begins in a partial protein domain; when partial domains are removed from the library, HOE errors decrease from 16 to 3% of weighted coverage (hard queries; from 35 to 5% for sampled queries) and no-error searches increase from 2 to 58% of weighted coverage (hard queries; from 16 to 78% for sampled queries). When HOE is reduced by not extending previously found sequences, PSI-BLAST specificity improves 4–8-fold, with little loss in sensitivity. Oxford University Press 2010-04 2010-01-11 /pmc/articles/PMC2853128/ /pubmed/20064877 http://dx.doi.org/10.1093/nar/gkp1219 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Computational Biology
Gonzalez, Mileidy W.
Pearson, William R.
Homologous over-extension: a challenge for iterative similarity searches
title Homologous over-extension: a challenge for iterative similarity searches
title_full Homologous over-extension: a challenge for iterative similarity searches
title_fullStr Homologous over-extension: a challenge for iterative similarity searches
title_full_unstemmed Homologous over-extension: a challenge for iterative similarity searches
title_short Homologous over-extension: a challenge for iterative similarity searches
title_sort homologous over-extension: a challenge for iterative similarity searches
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2853128/
https://www.ncbi.nlm.nih.gov/pubmed/20064877
http://dx.doi.org/10.1093/nar/gkp1219
work_keys_str_mv AT gonzalezmileidyw homologousoverextensionachallengeforiterativesimilaritysearches
AT pearsonwilliamr homologousoverextensionachallengeforiterativesimilaritysearches
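
The description above distinguishes alignments to non-homologous regions from HOE alignments that start inside a homologous domain and run past its boundary. As a rough illustration only, the Python sketch below classifies one alignment on a library sequence against that sequence's annotated homologous domains. It is not the authors' procedure: the names `Region` and `classify_alignment`, the 1-based inclusive coordinates, and the `tolerance` of 10 residues are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Region:
    """A 1-based, inclusive residue range on a library sequence (hypothetical type)."""
    start: int
    end: int


def overlap(a: Region, b: Region) -> int:
    """Number of residues shared by two regions (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify_alignment(aligned: Region,
                       homologous_domains: Iterable[Region],
                       tolerance: int = 10) -> str:
    """Label an alignment 'homologous', 'over-extended', or 'non-homologous'.

    `tolerance` (an assumed value, not taken from the paper) is the total
    number of residues the alignment may extend past the annotated domain
    boundaries before it is counted as over-extended.
    """
    hits = [d for d in homologous_domains if overlap(aligned, d) > 0]
    if not hits:
        # The alignment never touches a homologous domain.
        return "non-homologous"

    # Simplification: treat the span from the first to the last overlapping
    # domain as homologous, ignoring any gaps between those domains.
    left = min(d.start for d in hits)
    right = max(d.end for d in hits)

    # Residues aligned outside the homologous span, summed over both ends.
    excess = max(0, left - aligned.start) + max(0, aligned.end - right)
    return "over-extended" if excess > tolerance else "homologous"


if __name__ == "__main__":
    domain = [Region(40, 160)]
    print(classify_alignment(Region(50, 150), domain))
    print(classify_alignment(Region(50, 250), domain))
    print(classify_alignment(Region(300, 400), domain))
```

A real evaluation would also need the domain annotation of the flanking region, since the description notes that the damage arises when the neighboring region contains a non-homologous domain; this sketch only measures how far the alignment runs past the boundary.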