Universal seeds for cDNA-to-genome comparison

BACKGROUND: To meet the needs of gene annotation for newly sequenced organisms, optimized spaced seeds can be implemented into cross-species sequence alignment programs to accurately align gene sequences to the genome of a related species. So far, seed performance has been tested for comparisons bet...

Bibliographic Details
Main authors: Zhou, Leming; Stanton, Jonathan; Florea, Liliana
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375135/
https://www.ncbi.nlm.nih.gov/pubmed/18215286
http://dx.doi.org/10.1186/1471-2105-9-36
author Zhou, Leming
Stanton, Jonathan
Florea, Liliana
collection PubMed
description BACKGROUND: To meet the needs of gene annotation for newly sequenced organisms, optimized spaced seeds can be implemented into cross-species sequence alignment programs to accurately align gene sequences to the genome of a related species. So far, seed performance has been tested for comparisons between closely related species, such as human and mouse, or on simulated data. As the number and variety of genomes increases, it becomes desirable to identify a small set of universal seeds that perform optimally or near-optimally on a large range of comparisons. RESULTS: Using statistical regression methods, we investigate the sensitivity of seeds, in particular good seeds, between four cDNA-to-genome comparisons at different evolutionary distances (human-dog, human-mouse, human-chicken and human-zebrafish), and identify classes of comparisons that show similar seed behavior and therefore can employ the same seed. In addition, we find that with high confidence good seeds for more distant comparisons perform well on closer comparisons, within 98–99% of the optimal seeds, and thus represent universal good seeds. CONCLUSION: We show for the first time that optimal and near-optimal seeds for distant species-to-species comparisons are more generally applicable to a wide range of comparisons. This finding will be instrumental in developing practical and user-friendly cDNA-to-genome alignment applications, to aid in the annotation of new model organisms.
format Text
id pubmed-2375135
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2375135 2008-05-09 Universal seeds for cDNA-to-genome comparison Zhou, Leming Stanton, Jonathan Florea, Liliana BMC Bioinformatics Research Article BioMed Central 2008-01-23 /pmc/articles/PMC2375135/ /pubmed/18215286 http://dx.doi.org/10.1186/1471-2105-9-36 Text en Copyright © 2008 Zhou et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article