AlignWise: a tool for identifying protein-coding sequence and correcting frame-shifts

BACKGROUND: Identifying protein-coding genes from species without a reference genome sequence can be complicated by the presence of sequencing errors, particularly insertions and deletions. A number of tools capable of correcting erroneous frame-shifts within assembled transcripts are available but...

Bibliographic Details
Main authors: Evans, Teri; Loose, Matthew
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4640205/
https://www.ncbi.nlm.nih.gov/pubmed/26553107
http://dx.doi.org/10.1186/s12859-015-0813-8
author Evans, Teri
Loose, Matthew
collection PubMed
description BACKGROUND: Identifying protein-coding genes from species without a reference genome sequence can be complicated by the presence of sequencing errors, particularly insertions and deletions. A number of tools capable of correcting erroneous frame-shifts within assembled transcripts are available but often do not report back DNA sequences required for subsequent phylogenetic analysis. Amongst those that do, the Genewise algorithm is the most effective. However, it requires a homology wrapper to be used in this way, and here we demonstrate it perfectly corrects frame-shifts only 60% of the time.
RESULTS: We therefore created AlignWise, a tool that combines Genewise with our own homology-based method, AlignFS, to identify protein-coding regions and correct erroneous frame-shifts, suitable for subsequent phylogenetic analysis. We compared AlignWise against other open reading frame finding software and demonstrate that the AlignFS algorithm is more accurate than Genewise at correcting frame-shifts within an order. We show that AlignWise provides the greatest accuracy at higher evolutionary distances, out-performing both AlignFS and Genewise individually.
CONCLUSIONS: AlignWise produces a single ORF per transcript and identifies and corrects frame-shifts with high accuracy. It is therefore well suited for analysing novel transcriptome assemblies and EST sequences in the absence of a reference genome.
format Online
Article
Text
id pubmed-4640205
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4640205 2015-11-11 AlignWise: a tool for identifying protein-coding sequence and correcting frame-shifts. Evans, Teri; Loose, Matthew. BMC Bioinformatics. Software. BioMed Central 2015-11-09. /pmc/articles/PMC4640205/ /pubmed/26553107 http://dx.doi.org/10.1186/s12859-015-0813-8 Text en
© Evans and Loose. 2015. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title AlignWise: a tool for identifying protein-coding sequence and correcting frame-shifts
topic Software