SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
Main authors: | Jammali, Safa; Aguilar, Jean-David; Kuitche, Esaie; Ouangraoua, Aïda |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6439985/ https://www.ncbi.nlm.nih.gov/pubmed/30925859 http://dx.doi.org/10.1186/s12859-019-2647-2 |
author | Jammali, Safa; Aguilar, Jean-David; Kuitche, Esaie; Ouangraoua, Aïda |
collection | PubMed |
description | BACKGROUND: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a coding DNA sequence (CDS) or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS with similar sequences and conserved splicing structures from orthologous genes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising yet unexplored approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequence. Yet this information is often available for CDS and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and for then inferring transcript splicing orthology groups in a gene family from these spliced alignments. RESULTS: The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high across various levels of sequence similarity between the input sequences, thanks to accounting for their splicing structure. Notably, whereas all current spliced alignment methods are designed for cDNA-to-genome alignment (and can also be used for CDS-to-gene alignment), SFA is the first method specifically designed for CDS-to-gene alignment. CONCLUSION: We show the usefulness of SFA for comparing genes and transcripts within a gene family in order to analyze splicing orthologies. SFA can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2647-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6439985 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6439985 (2019-04-11). SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups. Jammali, Safa; Aguilar, Jean-David; Kuitche, Esaie; Ouangraoua, Aïda. BMC Bioinformatics, Research. BioMed Central, 2019-03-29. /pmc/articles/PMC6439985/ /pubmed/30925859 http://dx.doi.org/10.1186/s12859-019-2647-2. Text, en. © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups |
topic | Research |
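The description field above defines the splicing structure of a CDS by its exon extremities and that of a gene by its exon-intron extremities, and describes grouping transcripts of a gene family into splicing orthology groups. The Python sketch below is a toy illustration of those notions only, not SFA's algorithm. Two assumptions are loudly simplifying: each CDS exon is placed on the gene by exact, in-order string matching rather than by a true spliced alignment that tolerates substitutions and indels, and orthology groups are formed as connected components of a given pairwise "splicing orthologous" relation. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates on the gene


def split_cds(cds: str, exon_lengths: List[int]) -> List[str]:
    """Cut a CDS into its exon segments (its splicing structure)."""
    if sum(exon_lengths) != len(cds):
        raise ValueError("exon lengths must sum to the CDS length")
    segments, pos = [], 0
    for length in exon_lengths:
        segments.append(cds[pos:pos + length])
        pos += length
    return segments


def project_cds_exons(cds: str, exon_lengths: List[int],
                      gene: str) -> Optional[List[Interval]]:
    """Toy CDS-to-gene placement: find each CDS exon by exact, collinear matching.

    Assumption: no substitutions or indels; a real spliced aligner scores these.
    """
    hits: List[Interval] = []
    search_from = 0
    for segment in split_cds(cds, exon_lengths):
        start = gene.find(segment, search_from)
        if start == -1:
            return None  # no exact placement after the previous exon
        hits.append((start, start + len(segment)))
        search_from = start + len(segment)
    return hits


def structure_conserved(hits: Optional[List[Interval]],
                        gene_exons: List[Interval]) -> bool:
    """Hypothetical criterion: every placed CDS exon is an annotated gene exon."""
    return hits is not None and set(hits) <= set(gene_exons)


def orthology_groups(cds_ids: Iterable[str],
                     pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Assumed grouping: connected components of a pairwise relation (union-find)."""
    parent: Dict[str, str] = {c: c for c in cds_ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Exons in upper case, introns and flanks in lower case (readability only).
    gene = "tt" + "ATGAAA" + "gtaagcag" + "CCCGGG" + "gtcag" + "TAA" + "tt"
    gene_exons = [(2, 8), (16, 22), (27, 30)]
    hits = project_cds_exons("ATGAAACCCGGGTAA", [6, 6, 3], gene)
    print(hits, structure_conserved(hits, gene_exons))
    print(orthology_groups(["h1", "h2", "m1", "m2", "d1"],
                           [("h1", "m1"), ("m1", "d1"), ("h2", "m2")]))
```

Running the script prints `[(2, 8), (16, 22), (27, 30)] True` and then `[['d1', 'h1', 'm1'], ['h2', 'm2']]`. The exact `str.find` placement keeps the sketch short; it is precisely the step that a CDS-to-gene spliced aligner replaces with a scored alignment able to handle diverged orthologous sequences.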