
Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio

BACKGROUND: Obtaining transcripts of homologs of closely related organisms and retrieving the reconstructed exon-intron patterns of the genes is a very important process during the analysis of the evolution of a protein family and the comparative analysis of the exon-intron structure of a certain ge...


Bibliographic Details
Main authors: Hatje, Klas, Keller, Oliver, Hammesfahr, Björn, Pillmann, Holger, Waack, Stephan, Kollmar, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3162530/
https://www.ncbi.nlm.nih.gov/pubmed/21798037
http://dx.doi.org/10.1186/1756-0500-4-265
author Hatje, Klas
Keller, Oliver
Hammesfahr, Björn
Pillmann, Holger
Waack, Stephan
Kollmar, Martin
collection PubMed
description BACKGROUND: Obtaining transcripts of homologs of closely related organisms and retrieving the reconstructed exon-intron patterns of the genes is a very important process during the analysis of the evolution of a protein family and the comparative analysis of the exon-intron structure of a certain gene from different species. Due to the ever-increasing speed of genome sequencing, the gap to genome annotation is growing. Thus, tools for the correct prediction and reconstruction of genes in related organisms become more and more important. The tool Scipio, which can also be used via the graphical interface WebScipio, performs significant hit processing of the output of the Blat program to account for sequencing errors, missing sequence, and fragmented genome assemblies. However, Scipio has so far been limited to high sequence similarity and unable to reconstruct short exons.
RESULTS: Scipio and WebScipio have fundamentally been extended to better reconstruct very short exons and intron splice sites and to be better suited for cross-species gene structure predictions. The Needleman-Wunsch algorithm has been implemented for the search for short parts of the query sequence that were not recognized by Blat. Those regions might either be short exons, divergent sequence at intron splice sites, or very divergent exons. We have shown the benefit and use of new parameters with several protein examples from completely different protein families in searches against species from several kingdoms of the eukaryotes. The performance of the new Scipio version has been tested in comparison with several similar tools.
CONCLUSIONS: With the new version of Scipio very short exons, terminal and internal, of even just one amino acid can correctly be reconstructed. Scipio is also able to correctly predict almost all genes in cross-species searches even if the ancestors of the species separated more than 100 Myr ago and if the protein sequence identity is below 80%. For our test cases Scipio outperforms all other software tested. WebScipio has been restructured and provides easy access to the genome assemblies of about 640 eukaryotic species. Scipio and WebScipio are freely accessible at http://www.webscipio.org.
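The abstract names the Needleman-Wunsch algorithm as the method used to place short query fragments that Blat did not recognize. As a minimal illustration of that general technique only, the sketch below computes a global alignment of two strings by dynamic programming. It is not Scipio's implementation: the function name `needleman_wunsch`, the scoring values (match +1, mismatch -1, gap -1), and the plain character comparison are all assumptions chosen for brevity, whereas a real protein-to-genome search would use a substitution matrix and translated sequence.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b (illustrative scoring, not Scipio's).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1] (hypothetical scheme).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Because the alignment is global, every residue of the short fragment must be accounted for, which is what makes this family of methods suitable for fragments too short to yield a significant local hit.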
format Online
Article
Text
id pubmed-3162530
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3162530 2011-08-27 BMC Res Notes Research Article BioMed Central 2011-07-28 /pmc/articles/PMC3162530/ /pubmed/21798037 http://dx.doi.org/10.1186/1756-0500-4-265 Text en
Copyright ©2011 Kollmar et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio
topic Research Article