
Parameters for accurate genome alignment


Bibliographic Details
Main authors: Frith, Martin C; Hamada, Michiaki; Horton, Paul
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2829014/
https://www.ncbi.nlm.nih.gov/pubmed/20144198
http://dx.doi.org/10.1186/1471-2105-11-80
author Frith, Martin C
Hamada, Michiaki
Horton, Paul
collection PubMed
description BACKGROUND: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed.
RESULTS: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases.
CONCLUSIONS: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
format Text
id pubmed-2829014
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2829014 2010-02-26 Parameters for accurate genome alignment. Frith, Martin C; Hamada, Michiaki; Horton, Paul. BMC Bioinformatics. Research article. BioMed Central 2010-02-09. /pmc/articles/PMC2829014/ /pubmed/20144198 http://dx.doi.org/10.1186/1471-2105-11-80 Text en. Copyright © 2010 Frith et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research article