The whole alignment and nothing but the alignment: the problem of spurious alignment flanks
Main authors: | Frith, Martin C.; Park, Yonil; Sheetlin, Sergey L.; Spouge, John L. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2008 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2566872/ https://www.ncbi.nlm.nih.gov/pubmed/18796526 http://dx.doi.org/10.1093/nar/gkn579 |
author | Frith, Martin C.; Park, Yonil; Sheetlin, Sergey L.; Spouge, John L. |
collection | PubMed |
description | Pairwise sequence alignment is a ubiquitous tool for inferring the evolution and function of DNA, RNA and protein sequences. It is therefore essential to identify alignments arising by chance alone, i.e. spurious alignments. On one hand, if an entire alignment is spurious, statistical techniques for identifying and eliminating it are well known. On the other hand, if only a part of the alignment is spurious, elimination is much more problematic. In practice, even the sizes and frequencies of spurious subalignments remain unknown. This article shows that some common scoring schemes tend to overextend alignments and generate spurious alignment flanks up to hundreds of base pairs/amino acids in length. In the UCSC genome database, for example, spurious flanks probably comprise >18% of the human–fugu genome alignment. To evaluate the possibility that chance alone generated a particular flank on a particular pairwise alignment, we provide a simple ‘overalignment’ P-value. The overalignment P-value can identify spurious alignment flanks, thereby eliminating potentially misleading inferences about evolution and function. Moreover, by explicitly demonstrating the tradeoff between over- and under-alignment, our methods guide the rational choice of scoring schemes for various alignment tasks. |
format | Text |
id | pubmed-2566872 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2566872, 2008-10-17. The whole alignment and nothing but the alignment: the problem of spurious alignment flanks. Frith, Martin C.; Park, Yonil; Sheetlin, Sergey L.; Spouge, John L. Nucleic Acids Res. Computational Biology. Oxford University Press, 2008-10, 2008-09-16. /pmc/articles/PMC2566872/ /pubmed/18796526 http://dx.doi.org/10.1093/nar/gkn579. Text, en. © 2008 The Author(s). http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The whole alignment and nothing but the alignment: the problem of spurious alignment flanks |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2566872/ https://www.ncbi.nlm.nih.gov/pubmed/18796526 http://dx.doi.org/10.1093/nar/gkn579 |