
Comparison of methods for genomic localization of gene trap sequences

BACKGROUND: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which re...


Bibliographic Details
Main authors: Harper, Courtney A, Huang, Conrad C, Stryke, Doug, Kawamoto, Michiko, Ferrin, Thomas E, Babbitt, Patricia C
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1617135/
https://www.ncbi.nlm.nih.gov/pubmed/16982004
http://dx.doi.org/10.1186/1471-2164-7-236
author Harper, Courtney A
Huang, Conrad C
Stryke, Doug
Kawamoto, Michiko
Ferrin, Thomas E
Babbitt, Patricia C
collection PubMed
description BACKGROUND: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. RESULTS: In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. CONCLUSION: The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.
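The evaluation step described in the abstract (checking each aligned sequence tag against the known genome coordinates of its cognate full-length gene, and checking how well exon boundaries are recovered) can be illustrated with a minimal sketch. This is not the authors' code: the `Interval` type, the function names, the half-open coordinate convention, and the `flank` and `tolerance` parameters are all assumptions made for illustration only.

```python
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    """Hypothetical genomic interval; 0-based, half-open coordinates (assumed)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval, flank: int = 0) -> bool:
    """True if a and b share a chromosome and overlap, with b widened by `flank` bases."""
    return (
        a.chrom == b.chrom
        and a.start < b.end + flank
        and b.start - flank < a.end
    )


def classify_hit(hit: Optional[Interval], gene: Interval, flank: int = 0) -> str:
    """Label a tag's best alignment relative to its cognate gene's known coordinates."""
    if hit is None:
        return "unaligned"
    return "localized" if overlaps(hit, gene, flank) else "mislocalized"


def boundaries_recovered(
    block_edges: Sequence[int], exon_edges: Sequence[int], tolerance: int = 0
) -> float:
    """Fraction of reference exon edges that lie within `tolerance` bases of an alignment block edge."""
    if not exon_edges:
        return 0.0
    matched = sum(
        any(abs(edge - block) <= tolerance for block in block_edges)
        for edge in exon_edges
    )
    return matched / len(exon_edges)
```

For example, `classify_hit(Interval("chr1", 1200, 1400), Interval("chr1", 1000, 5000))` labels the tag as localized, while a hit on another chromosome is labeled mislocalized. A real comparison would also need to parse each program's own output format, which this sketch leaves out.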
format Text
id pubmed-1617135
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1617135 2006-10-20 Comparison of methods for genomic localization of gene trap sequences Harper, Courtney A Huang, Conrad C Stryke, Doug Kawamoto, Michiko Ferrin, Thomas E Babbitt, Patricia C BMC Genomics Research Article BioMed Central 2006-09-18 /pmc/articles/PMC1617135/ /pubmed/16982004 http://dx.doi.org/10.1186/1471-2164-7-236 Text en Copyright © 2006 Harper et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparison of methods for genomic localization of gene trap sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1617135/
https://www.ncbi.nlm.nih.gov/pubmed/16982004
http://dx.doi.org/10.1186/1471-2164-7-236