An enhanced RNA alignment benchmark for sequence alignment programs
Main authors: | Wilm, Andreas; Mainz, Indra; Steger, Gerhard |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635699/ https://www.ncbi.nlm.nih.gov/pubmed/17062125 http://dx.doi.org/10.1186/1748-7188-1-19 |
author | Wilm, Andreas; Mainz, Indra; Steger, Gerhard |
---|---|
collection | PubMed |
description | BACKGROUND: The performance of alignment programs is traditionally tested on sets of protein sequences for which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone (the similarity range where alignment quality drops drastically) starts at 60 % for RNAs in comparison to 20 % for proteins. In this study we enhance the previous benchmark. RESULTS: The RNA sequence sets in the benchmark database are taken from an increased number of RNA families to avoid the unintended bias of using only a few families. Set sizes vary from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures structural conservation. The performance order of parameters (such as nucleotide substitution matrices and gap costs), as well as of programs, is rated by rank tests. CONCLUSION: Most sequence alignment programs perform equally well on RNA sequence sets with high sequence identity, that is, with an average pairwise sequence identity (APSI) above 75 %. Parameters for gap-open and gap-extension have a large influence on alignment quality at APSI ≤ 75 %; optimal parameter combinations are shown for several programs. The use of different 4 × 4 substitution matrices improved program performance only in some cases. The performance of iterative programs drastically increases with increasing sequence numbers and/or decreasing sequence identity, which makes them clearly superior to programs using a purely non-iterative, progressive approach. The best sequence alignment programs produce alignments of high quality down to APSI > 55 %; at lower APSI the use of sequence+structure alignment programs is recommended. |
format | Text |
id | pubmed-1635699 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
record | pubmed-1635699, 2006-11-14 |
journal | Algorithms Mol Biol (Research) |
published | BioMed Central, 2006-10-24 |
rights | Copyright © 2006 Wilm et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An enhanced RNA alignment benchmark for sequence alignment programs |
topic | Research |
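The abstract's central quantities, the average pairwise sequence identity (APSI) of a sequence set and a score that counts only nucleotide matches against a reference alignment, can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the record: pairwise identity is computed as identical columns divided by columns where both rows carry a nucleotide (definitions of identity vary between tools), and the nucleotide-match measure is modeled as a sum-of-pairs style score (the fraction of reference residue pairs that the test alignment reproduces). The function names `apsi` and `sum_of_pairs_score` are hypothetical, and the structural-conservation measure is not covered.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Assumed definition: identical columns / columns where both rows have a nucleotide."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    both = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not both:
        return 0.0
    same = sum(1 for x, y in both if x.upper() == y.upper())
    return same / len(both)


def apsi(alignment: list[str]) -> float:
    """Average pairwise sequence identity over all sequence pairs, in percent."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def aligned_pairs(alignment: list[str], gap: str = "-") -> set[tuple[int, int, int, int]]:
    """Residue pairs (seq i, residue index in i, seq j, residue index in j) placed in one column."""
    # For every row, map each column to its ungapped residue index (None for a gap).
    indices = []
    for row in alignment:
        idx, k = [], 0
        for ch in row:
            if ch == gap:
                idx.append(None)
            else:
                idx.append(k)
                k += 1
        indices.append(idx)
    pairs = set()
    for i, j in combinations(range(len(alignment)), 2):
        for pi, pj in zip(indices[i], indices[j]):
            if pi is not None and pj is not None:
                pairs.add((i, pi, j, pj))
    return pairs


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Assumed SPS-style score: share of reference residue pairs also aligned in `test`.

    Both alignments must contain the same sequences in the same order.
    """
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


if __name__ == "__main__":
    reference = ["ACGU-A", "ACGUUA"]
    test = ["ACGUA-", "ACGUUA"]
    print(apsi(reference))                      # 100.0
    print(apsi(test))                           # 80.0
    print(sum_of_pairs_score(test, reference))  # 0.8
```

In the toy example the test alignment pairs the final A of the first sequence with a U instead of the A chosen by the reference, so it loses one of five reference residue pairs and its identity drops to 80 %. Note that under this identity definition a misalignment can lower APSI, which is one reason benchmarks compute APSI on the reference alignment rather than on program output.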