AlignWise: a tool for identifying protein-coding sequence and correcting frame-shifts
BACKGROUND: Identifying protein-coding genes from species without a reference genome sequence can be complicated by the presence of sequencing errors, particularly insertions and deletions. A number of tools capable of correcting erroneous frame-shifts within assembled transcripts are available but...
Main Authors: | Evans, Teri; Loose, Matthew |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4640205/ https://www.ncbi.nlm.nih.gov/pubmed/26553107 http://dx.doi.org/10.1186/s12859-015-0813-8 |
_version_ | 1782400047434956800 |
---|---|
author | Evans, Teri; Loose, Matthew |
author_facet | Evans, Teri; Loose, Matthew |
author_sort | Evans, Teri |
collection | PubMed |
description | BACKGROUND: Identifying protein-coding genes from species without a reference genome sequence can be complicated by the presence of sequencing errors, particularly insertions and deletions. A number of tools capable of correcting erroneous frame-shifts within assembled transcripts are available but often do not report back DNA sequences required for subsequent phylogenetic analysis. Amongst those that do, the Genewise algorithm is the most effective. However, it requires a homology wrapper to be used in this way, and here we demonstrate it perfectly corrects frame-shifts only 60 % of the time. RESULTS: We therefore created AlignWise, a tool that combines Genewise with our own homology-based method, AlignFS, to identify protein-coding regions and correct erroneous frame-shifts, suitable for subsequent phylogenetic analysis. We compared AlignWise against other open reading frame finding software and demonstrate that the AlignFS algorithm is more accurate than Genewise at correcting frame-shifts within an order. We show that AlignWise provides the greatest accuracy at higher evolutionary distances, out-performing both AlignFS and Genewise individually. CONCLUSIONS: AlignWise produces a single ORF per transcript and identifies and corrects frame-shifts with high accuracy. It is therefore well suited for analysing novel transcriptome assemblies and EST sequences in the absence of a reference genome. |
format | Online Article Text |
id | pubmed-4640205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4640205 2015-11-11 AlignWise: a tool for identifying protein-coding sequence and correcting frame-shifts Evans, Teri; Loose, Matthew BMC Bioinformatics Software BioMed Central 2015-11-09 /pmc/articles/PMC4640205/ /pubmed/26553107 http://dx.doi.org/10.1186/s12859-015-0813-8 Text en © Evans and Loose. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | AlignWise: a tool for identifying protein-coding sequence and correcting frame-shifts |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4640205/ https://www.ncbi.nlm.nih.gov/pubmed/26553107 http://dx.doi.org/10.1186/s12859-015-0813-8 |