
From pairwise to multiple spliced alignment

MOTIVATION: Alternative splicing is a ubiquitous process in eukaryotes that allows distinct transcripts to be produced from the same gene. Yet, the study of transcript evolution within a gene family is still in its infancy. One prerequisite for this study is the availability of methods to compare se...


Bibliographic Details
Main authors: Jammali, Safa, Djossou, Abigaïl, Ouédraogo, Wend-Yam D D, Nevers, Yannis, Chegrane, Ibrahim, Ouangraoua, Aïda
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710695/
https://www.ncbi.nlm.nih.gov/pubmed/36699392
http://dx.doi.org/10.1093/bioadv/vbab044
author Jammali, Safa
Djossou, Abigaïl
Ouédraogo, Wend-Yam D D
Nevers, Yannis
Chegrane, Ibrahim
Ouangraoua, Aïda
collection PubMed
description MOTIVATION: Alternative splicing is a ubiquitous process in eukaryotes that allows distinct transcripts to be produced from the same gene. Yet the study of transcript evolution within a gene family is still in its infancy. One prerequisite for this study is the availability of methods to compare sets of transcripts while accounting for their splicing structure. In this context, we generalize the concept of pairwise spliced alignments (PSpAs) to multiple spliced alignments (MSpAs). MSpAs serve several important purposes beyond enabling the study of transcript evolution. For instance, they are key to improving the prediction of gene models, which is important for addressing the growing problem of genome annotation. Despite their importance, a formal definition of the concept and methods to compute MSpAs are still lacking. RESULTS: We introduce the MSpA problem and the SplicedFamAlignMulti (SFAM) method to compute the MSpA of a gene family. Like most multiple sequence alignment (MSA) methods, which are generally greedy heuristics that assemble pairwise alignments, SFAM combines all PSpAs of the coding DNA sequences and gene sequences of a gene family into an MSpA. It produces a single structure that represents the superstructure and models of the gene family. Using real vertebrate and simulated gene family data, we illustrate the utility of SFAM for computing accurate gene family superstructures and MSAs, inferring splicing orthologous groups, and improving gene-model annotations. AVAILABILITY AND IMPLEMENTATION: The supporting data and implementation of SFAM are freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlignMulti. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9710695
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710695 2023-01-24 From pairwise to multiple spliced alignment Jammali, Safa Djossou, Abigaïl Ouédraogo, Wend-Yam D D Nevers, Yannis Chegrane, Ibrahim Ouangraoua, Aïda Bioinform Adv Original Article
Oxford University Press 2022-01-05 /pmc/articles/PMC9710695/ /pubmed/36699392 http://dx.doi.org/10.1093/bioadv/vbab044 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title From pairwise to multiple spliced alignment
topic Original Article