
Scipio: Using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species

BACKGROUND: For many types of analyses, data about gene structure and locations of non-coding regions of genes are required. Although a vast amount of genomic sequence data is available, precise annotation of genes is lagging behind. Finding the corresponding gene of a given protein sequence by mean...


Bibliographic Details
Main authors: Keller, Oliver; Odronitz, Florian; Stanke, Mario; Kollmar, Martin; Waack, Stephan
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2442105/
https://www.ncbi.nlm.nih.gov/pubmed/18554390
http://dx.doi.org/10.1186/1471-2105-9-278
_version_ 1782156674701721600
author Keller, Oliver
Odronitz, Florian
Stanke, Mario
Kollmar, Martin
Waack, Stephan
collection PubMed
description BACKGROUND: For many types of analyses, data about gene structure and locations of non-coding regions of genes are required. Although a vast amount of genomic sequence data is available, precise annotation of genes is lagging behind. Finding the corresponding gene of a given protein sequence by means of conventional tools is error-prone and cannot be completed without manual inspection, which is time-consuming and requires considerable experience. RESULTS: Scipio is a tool based on the alignment program BLAT to determine the precise gene structure given a protein sequence and a genome sequence. It identifies intron-exon borders and splice sites and is able to cope with sequencing errors and genes spanning several contigs in genomes that have not yet been assembled to supercontigs or chromosomes. Instead of producing a set of hits with varying confidence, Scipio gives the user a coherent summary of locations on the genome that code for the query protein. The output contains information about discrepancies that may result from sequencing errors. Scipio has also successfully been used to find homologous genes in closely related species. Scipio was tested with 979 protein queries against 16 arthropod genomes (intra-species search). For cross-species annotation, Scipio was used to annotate 40 genes from Homo sapiens in the primates Pongo pygmaeus abelii and Callithrix jacchus. The prediction quality of Scipio was tested in a comparative study against that of BLAT and the well-established program Exonerate. CONCLUSION: Scipio is able to precisely map a protein query onto a genome. Even in cases when there are many sequencing errors, or when incomplete genome assemblies lead to hits that stretch across multiple target sequences, it very often provides the user with the correct determination of intron-exon borders and splice sites, showing an improved prediction accuracy compared to BLAT and Exonerate. Apart from being able to find genes in the genome that encode the query protein, Scipio can also be used to annotate genes in closely related species.
format Text
id pubmed-2442105
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2442105 2008-07-01 BMC Bioinformatics Software BioMed Central 2008-06-13 /pmc/articles/PMC2442105/ /pubmed/18554390 http://dx.doi.org/10.1186/1471-2105-9-278 Text en Copyright © 2008 Keller et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Scipio: Using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species
topic Software