
The whole alignment and nothing but the alignment: the problem of spurious alignment flanks

Bibliographic Details
Main authors: Frith, Martin C., Park, Yonil, Sheetlin, Sergey L., Spouge, John L.
Format: Text
Language: English
Published: Oxford University Press 2008
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2566872/
https://www.ncbi.nlm.nih.gov/pubmed/18796526
http://dx.doi.org/10.1093/nar/gkn579
Collection: PubMed
Description: Pairwise sequence alignment is a ubiquitous tool for inferring the evolution and function of DNA, RNA and protein sequences. It is therefore essential to identify alignments that arise by chance alone, i.e. spurious alignments. If an entire alignment is spurious, statistical techniques for identifying and eliminating it are well known. If only part of the alignment is spurious, however, elimination is much more problematic. In practice, even the sizes and frequencies of spurious subalignments remain unknown. This article shows that some common scoring schemes tend to overextend alignments and generate spurious alignment flanks up to hundreds of base pairs/amino acids in length. In the UCSC genome database, for example, spurious flanks probably make up >18% of the human–fugu genome alignment. To evaluate the possibility that chance alone generated a particular flank on a particular pairwise alignment, we provide a simple ‘overalignment’ P-value. The overalignment P-value can identify spurious alignment flanks, thereby eliminating potentially misleading inferences about evolution and function. Moreover, by explicitly demonstrating the tradeoff between over- and under-alignment, our methods guide the rational choice of scoring schemes for various alignment tasks.
ID: pubmed-2566872
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (Computational Biology), Oxford University Press; issue 2008-10, online 2008-09-16
Rights: © 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
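The abstract's central claim is that lenient scoring schemes let an alignment keep extending into unrelated sequence. A toy simulation can illustrate the mechanism. The sketch below is not the article's overalignment P-value and not its method. Its assumptions, none of them taken from the article, are: ungapped extension, uniform random DNA on both sides, illustrative match/mismatch scores, and an idealized extension that keeps the highest-scoring prefix, which amounts to an X-drop rule with unlimited drop-off capped at `n` columns. The function names `best_extension`, `random_flank_scores` and `mean_spurious_extension` are hypothetical.

```python
"""Toy simulation of spurious alignment flanks (illustrative sketch only).

Assumptions (not from the article): ungapped extension, uniform random DNA,
match/mismatch scores chosen for illustration, and an idealized extension
that keeps the highest-scoring prefix (X-drop with unlimited drop-off,
truncated at n columns). Function names are hypothetical.
"""
import random


def best_extension(scores):
    """Return (best_score, length) of the highest-scoring prefix of `scores`.

    length == 0 means no extension scores better than stopping immediately.
    """
    best, best_len, running = 0, 0, 0
    for i, s in enumerate(scores, start=1):
        running += s
        if running > best:  # strict '>' keeps the shortest best prefix on ties
            best, best_len = running, i
    return best, best_len


def random_flank_scores(n, match, mismatch, rng):
    """Column scores for n positions of two unrelated uniform random DNA strings."""
    return [match if rng.choice("ACGT") == rng.choice("ACGT") else mismatch
            for _ in range(n)]


def mean_spurious_extension(match, mismatch, trials=500, n=200, seed=0):
    """Average number of purely random columns the extension appends."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, length = best_extension(random_flank_scores(n, match, mismatch, rng))
        total += length
    return total / trials


if __name__ == "__main__":
    # Expected score per random column is 0.25*match + 0.75*mismatch;
    # the closer it is to zero, the further chance alone carries the extension.
    for mismatch in (-3, -2, -1, -0.5):
        mean_len = mean_spurious_extension(1, mismatch)
        print(f"match=+1 mismatch={mismatch}: mean spurious flank = {mean_len:.2f} columns")
```

Under these assumptions, +1/−1 (expected −0.5 per random column) should append longer chance flanks on average than +1/−3 (expected −2 per column). This only illustrates why overextension happens. Deciding whether a particular flank is spurious is what the article's overalignment P-value is for.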