
Aligning coding sequences with frameshift extension penalties

BACKGROUND: Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS,...

Bibliographic Details
Main authors: Jammali, Safa; Kuitche, Esaie; Rachati, Ayoub; Bélanger, François; Scott, Michelle; Ouangraoua, Aïda
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374649/
https://www.ncbi.nlm.nih.gov/pubmed/28373895
http://dx.doi.org/10.1186/s13015-017-0101-4
_version_ 1782518934071672832
author Jammali, Safa
Kuitche, Esaie
Rachati, Ayoub
Bélanger, François
Scott, Michelle
Ouangraoua, Aïda
collection PubMed
description BACKGROUND: Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from the same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS: back-translation of an amino acid sequence alignment, or simultaneous accounting for the nucleotide and amino acid levels. The former cannot account for frameshift translations and, until now, the latter has accounted only for frameshift translation initiation, without considering the length of the translation disruption caused by a frameshift. RESULTS: We introduce a new scoring scheme with an algorithm for the pairwise alignment of CDS that accounts for frameshift translation initiation and length, while simultaneously considering nucleotide and amino acid sequences. The main distinctive feature of the scoring scheme is a penalty cost for frameshift extension length, used to compute an adequate similarity score for a CDS alignment. The second distinctive feature of the model is that its search space is the set of all feasible alignments between two CDS. Previous approaches have considered a restricted search space or additional constraints on the decomposition of an alignment into length-3 sub-alignments. The algorithm described in this paper has the same asymptotic time complexity as the classical Needleman–Wunsch algorithm. CONCLUSIONS: We compare the method to other CDS alignment methods by applying it to pairs of CDS from homologous human, mouse and cow genes of ten mammalian gene families from the Ensembl-Compara database. The results show that our method is particularly robust to parameter changes compared with existing methods. It also appears to be a good compromise, performing well both in the presence and absence of frameshift translations. An implementation of the method is available at https://github.com/UdeS-CoBIUS/FsePSA. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-017-0101-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5374649
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5374649 2017-04-03 Aligning coding sequences with frameshift extension penalties Jammali, Safa Kuitche, Esaie Rachati, Ayoub Bélanger, François Scott, Michelle Ouangraoua, Aïda Algorithms Mol Biol Research BioMed Central 2017-03-31 /pmc/articles/PMC5374649/ /pubmed/28373895 http://dx.doi.org/10.1186/s13015-017-0101-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Aligning coding sequences with frameshift extension penalties
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374649/
https://www.ncbi.nlm.nih.gov/pubmed/28373895
http://dx.doi.org/10.1186/s13015-017-0101-4