
MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons


Bibliographic Details
Main authors: Ranwez, Vincent; Harispe, Sébastien; Delsuc, Frédéric; Douzery, Emmanuel J. P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3174933/
https://www.ncbi.nlm.nih.gov/pubmed/21949676
http://dx.doi.org/10.1371/journal.pone.0022594
author Ranwez, Vincent
Harispe, Sébastien
Delsuc, Frédéric
Douzery, Emmanuel J. P.
collection PubMed
description Until now, the most efficient way to align nucleotide sequences containing open reading frames was to use indirect procedures that align the amino acid translations before reporting the inferred gap positions at the codon level. This approach has two important pitfalls. First, any premature stop codon prevents the use of such a strategy. Second, each sequence is translated in the same reading frame from beginning to end, so a single additional nucleotide leads to both an aberrant translation and an aberrant alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful for detecting undocumented frameshifts in public database sequences and for aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
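The description's central claim, a pairwise aligner with Needleman-Wunsch time and space complexity that still tolerates frameshifts, can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not MACSE's algorithm: it aligns a nucleotide sequence against a reference protein rather than two nucleotide sequences, the scores (MATCH, MISMATCH, STOP, GAP_AA, GAP_CODON, FRAMESHIFT) are hypothetical values, and the function name `align_to_protein` is illustrative. Besides the usual codon-to-residue, codon-to-gap and residue-to-gap moves, each cell may also leave 1 or 2 nucleotides outside any codon at a frameshift penalty, so a single extra base costs one penalty instead of corrupting the translation of everything downstream. The table still has (n+1) × (m+1) cells, each filled in constant time.

```python
import itertools

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product("TCAG", repeat=3), AMINO)}

# Illustrative scores (hypothetical, not MACSE's defaults).
MATCH, MISMATCH, STOP = 2, -1, -10
GAP_AA, GAP_CODON, FRAMESHIFT = -4, -4, -8


def align_to_protein(dna, protein):
    """Global alignment of an uppercase nucleotide sequence to a protein that
    tolerates frameshifts. Returns (score, frameshifts), where each frameshift is
    (nucleotide_start, length) of the 1 or 2 bases left outside any codon."""
    n, m = len(dna), len(protein)
    neg = float("-inf")
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # Each candidate: (predecessor score + cost, nucleotides used, residues used).
            cands = []
            if i >= 3 and j >= 1:  # codon translated and aligned to residue j-1
                aa = CODON_TABLE.get(dna[i - 3:i], "X")  # unknown codons become 'X'
                s = STOP if aa == "*" else MATCH if aa == protein[j - 1] else MISMATCH
                cands.append((score[i - 3][j - 1] + s, 3, 1))
            if j >= 1:  # residue aligned to a gap
                cands.append((score[i][j - 1] + GAP_AA, 0, 1))
            if i >= 3:  # whole codon aligned to a gap
                cands.append((score[i - 3][j] + GAP_CODON, 3, 0))
            for k in (1, 2):  # frameshift: 1 or 2 nucleotides outside any codon
                if i >= k:
                    cands.append((score[i - k][j] + FRAMESHIFT, k, 0))
            best = max(cands, key=lambda c: c[0])
            score[i][j], move[i][j] = best[0], best[1:]
    # Trace back from the bottom-right cell, recording frameshift positions.
    frameshifts, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = move[i][j]
        if dj == 0 and di in (1, 2):
            frameshifts.append((i - di, di))
        i, j = i - di, j - dj
    return score[n][m], frameshifts[::-1]
```

For example, inserting one C into the clean coding sequence ATG GCT AAA TGG (MAKW), `align_to_protein("ATGGCTCAAATGG", "MAKW")` returns `(0, [(6, 1)])`: the four codons still match and the extra base at nucleotide position 6 is reported as a frameshift. Keeping the full move table makes the traceback simple, at the same quadratic memory cost as a textbook Needleman-Wunsch implementation.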
format Online
Article
Text
id pubmed-3174933
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3174933 2011-09-26; PLoS One; Research Article; Public Library of Science, 2011-09-16; /pmc/articles/PMC3174933/ /pubmed/21949676 http://dx.doi.org/10.1371/journal.pone.0022594; Text (English); Ranwez et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article