
SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

BACKGROUND: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron extremi...


Bibliographic Details
Main authors: Jammali, Safa, Aguilar, Jean-David, Kuitche, Esaie, Ouangraoua, Aïda
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6439985/
https://www.ncbi.nlm.nih.gov/pubmed/30925859
http://dx.doi.org/10.1186/s12859-019-2647-2
_version_ 1783407304702427136
author Jammali, Safa
Aguilar, Jean-David
Kuitche, Esaie
Ouangraoua, Aïda
author_facet Jammali, Safa
Aguilar, Jean-David
Kuitche, Esaie
Ouangraoua, Aïda
author_sort Jammali, Safa
collection PubMed
description BACKGROUND: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS with similar sequences and conserved splicing structures from orthologous genes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising yet unexplored approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then inferring transcript splicing orthology groups in a gene family based on spliced alignments. RESULTS: The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high for various levels of sequence similarity between input sequences, because it accounts for the splicing structure of the input sequences. It is important to note that, unlike current spliced alignment methods, which are meant for cDNA-to-genome alignments but can also be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments. CONCLUSION: We show the usefulness of SFA for the comparison of genes and transcripts within a gene family for the purpose of analyzing splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2647-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6439985
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6439985 2019-04-11 SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups Jammali, Safa Aguilar, Jean-David Kuitche, Esaie Ouangraoua, Aïda BMC Bioinformatics Research BACKGROUND: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS with similar sequences and conserved splicing structures from orthologous genes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising yet unexplored approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then inferring transcript splicing orthology groups in a gene family based on spliced alignments. RESULTS: The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high for various levels of sequence similarity between input sequences, because it accounts for the splicing structure of the input sequences. It is important to note that, unlike current spliced alignment methods, which are meant for cDNA-to-genome alignments but can also be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments. CONCLUSION: We show the usefulness of SFA for the comparison of genes and transcripts within a gene family for the purpose of analyzing splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2647-2) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-29 /pmc/articles/PMC6439985/ /pubmed/30925859 http://dx.doi.org/10.1186/s12859-019-2647-2 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Jammali, Safa
Aguilar, Jean-David
Kuitche, Esaie
Ouangraoua, Aïda
SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
title SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
title_full SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
title_fullStr SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
title_full_unstemmed SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
title_short SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
title_sort splicedfamalign: cds-to-gene spliced alignment and identification of transcript orthology groups
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6439985/
https://www.ncbi.nlm.nih.gov/pubmed/30925859
http://dx.doi.org/10.1186/s12859-019-2647-2
work_keys_str_mv AT jammalisafa splicedfamaligncdstogenesplicedalignmentandidentificationoftranscriptorthologygroups
AT aguilarjeandavid splicedfamaligncdstogenesplicedalignmentandidentificationoftranscriptorthologygroups
AT kuitcheesaie splicedfamaligncdstogenesplicedalignmentandidentificationoftranscriptorthologygroups
AT ouangraouaaida splicedfamaligncdstogenesplicedalignmentandidentificationoftranscriptorthologygroups