Adjusting scoring matrices to correct overextended alignments
Motivation: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local ali...
Main authors: | Mills, Lauren J.; Pearson, William R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2013 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3834790/ https://www.ncbi.nlm.nih.gov/pubmed/23995390 http://dx.doi.org/10.1093/bioinformatics/btt517 |
_version_ | 1782292044235931648 |
---|---|
author | Mills, Lauren J. Pearson, William R. |
author_facet | Mills, Lauren J. Pearson, William R. |
author_sort | Mills, Lauren J. |
collection | PubMed |
description | Motivation: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions. Results: We measured local alignment start/stop boundary accuracy using a set of queries where the correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequences; they occur most frequently between sequences that are more closely related (>33% identity). Adjusting the scoring matrix to reflect the identity of the homologous sequence can correct higher identity overextended alignment boundaries. In addition, the scoring matrix that produced a correct alignment could be reliably predicted based on the sequence identity seen in the original BLOSUM62 alignment. Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, resulting in more correct alignments than using BLOSUM62 alone. Availability: RefProtDom2 (RPD2) sequences and the FASTA software are available from http://faculty.virginia.edu/wrpearson/fasta. Contact: wrp@virginia.edu |
format | Online Article Text |
id | pubmed-3834790 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-38347902013-11-21 Adjusting scoring matrices to correct overextended alignments Mills, Lauren J. Pearson, William R. Bioinformatics Original Papers Motivation: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions. Results: We measured local alignment start/stop boundary accuracy using a set of queries where the correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequences; they occur most frequently between sequences that are more closely related (>33% identity). Adjusting the scoring matrix to reflect the identity of the homologous sequence can correct higher identity overextended alignment boundaries. In addition, the scoring matrix that produced a correct alignment could be reliably predicted based on the sequence identity seen in the original BLOSUM62 alignment. Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, resulting in more correct alignments than using BLOSUM62 alone. Availability: RefProtDom2 (RPD2) sequences and the FASTA software are available from http://faculty.virginia.edu/wrpearson/fasta. Contact: wrp@virginia.edu Oxford University Press 2013-12-01 2013-08-31 /pmc/articles/PMC3834790/ /pubmed/23995390 http://dx.doi.org/10.1093/bioinformatics/btt517 Text en © The Author 2013. Published by Oxford University Press. All rights reserved. 
http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Papers Mills, Lauren J. Pearson, William R. Adjusting scoring matrices to correct overextended alignments |
title | Adjusting scoring matrices to correct overextended alignments |
title_full | Adjusting scoring matrices to correct overextended alignments |
title_fullStr | Adjusting scoring matrices to correct overextended alignments |
title_full_unstemmed | Adjusting scoring matrices to correct overextended alignments |
title_short | Adjusting scoring matrices to correct overextended alignments |
title_sort | adjusting scoring matrices to correct overextended alignments |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3834790/ https://www.ncbi.nlm.nih.gov/pubmed/23995390 http://dx.doi.org/10.1093/bioinformatics/btt517 |
work_keys_str_mv | AT millslaurenj adjustingscoringmatricestocorrectoverextendedalignments AT pearsonwilliamr adjustingscoringmatricestocorrectoverextendedalignments |
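
The description above says that the scoring matrix giving a correct alignment could be predicted from the sequence identity seen in the original BLOSUM62 alignment. The following is a minimal sketch of that idea, not the authors' implementation: it computes percent identity from an already-aligned pair and looks up a matrix name in a table. The function names, the placeholder matrix names, and the identity cut-offs are all assumptions made for illustration; the record does not state which matrices or thresholds the paper used.

```python
from typing import List, Tuple


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over aligned (non-gap) columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [
        (q, s)
        for q, s in zip(aligned_query, aligned_subject)
        if q != "-" and s != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for q, s in columns if q.upper() == s.upper())
    return 100.0 * matches / len(columns)


# HYPOTHETICAL lookup table: (identity must exceed this value, matrix name).
# Only the BLOSUM62 / 33% pairing echoes the abstract; the other rows are
# placeholders and do not come from the paper.
HYPOTHETICAL_MATRIX_TABLE: List[Tuple[float, str]] = [
    (0.0, "BLOSUM62"),
    (33.0, "PLACEHOLDER_MID_IDENTITY_MATRIX"),
    (60.0, "PLACEHOLDER_HIGH_IDENTITY_MATRIX"),
]


def predict_matrix(
    identity: float,
    table: List[Tuple[float, str]] = HYPOTHETICAL_MATRIX_TABLE,
) -> str:
    """Return the matrix for the highest cut-off that `identity` exceeds."""
    chosen = table[0][1]  # default: the first (deepest) matrix
    for cutoff, name in sorted(table):
        if identity > cutoff:
            chosen = name
    return chosen


if __name__ == "__main__":
    # Toy alignment, not real data: one gap column and one mismatch.
    identity = percent_identity("ACDE-F", "ACDQWF")
    print(identity, predict_matrix(identity))
```

In a real workflow the aligned strings would come from the initial BLOSUM62 search output, and the predicted matrix name would be passed back to the search program for the realignment step.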