
New tools to analyze overlapping coding regions

BACKGROUND: Retroviruses transcribe messenger RNA for the overlapping Gag and Gag-Pol polyproteins, by using a programmed -1 ribosomal frameshift which requires a slippery sequence and an immediate downstream stem-loop secondary structure, together called frameshift stimulating signal (FSS). It foll...


Bibliographic Details
Main authors: Bayegan, Amir H., Garcia-Martin, Juan Antonio, Clote, Peter
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5155393/
https://www.ncbi.nlm.nih.gov/pubmed/27964762
http://dx.doi.org/10.1186/s12859-016-1389-7
author Bayegan, Amir H.
Garcia-Martin, Juan Antonio
Clote, Peter
collection PubMed
description BACKGROUND: Retroviruses transcribe messenger RNA for the overlapping Gag and Gag-Pol polyproteins by using a programmed -1 ribosomal frameshift, which requires a slippery sequence and an immediately downstream stem-loop secondary structure, together called the frameshift stimulating signal (FSS). It follows that the molecular evolution of this genomic region of HIV-1 is highly constrained, since the retroviral genome must contain a slippery sequence (sequence constraint), code appropriate peptides in reading frames 0 and 1 (coding requirements), and form a thermodynamically stable stem-loop secondary structure (structure requirement). RESULTS: We describe a unique computational tool, RNAsampleCDS, designed to compute the number of RNA sequences that code, in overlapping reading frames, two (or more) peptides that are identical to the input peptides p,q (or whose BLOSUM/PAM similarity to them exceeds a user-specified value). RNAsampleCDS then samples a user-specified number of messenger RNAs that code such peptides; alternatively, RNAsampleCDS can exactly compute the position-specific scoring matrix and codon usage bias for all such RNA sequences. Our software allows the user to stipulate overlapping coding requirements for all 6 possible reading frames simultaneously, even allowing IUPAC constraints on RNA sequences and fixing GC-content. We generalize the notion of codon preference index (CPI) to overlapping reading frames, and use RNAsampleCDS to generate the control sequences required to compute CPI for the Gag-Pol overlapping coding region of HIV-1. Moreover, by applying RNAsampleCDS, we are able to quantify the extent to which the overlapping coding requirement in HIV-1 [resp. HCV] contributes to the formation of the stem-loop [resp. double stem-loop] secondary structure known as the frameshift stimulating signal.
Using our software, we confirm that certain experimentally determined deleterious HCV mutations occur in positions for which RNAsampleCDS and RNAiFold both indicate a single possible nucleotide. These applications show that RNAsampleCDS constitutes a unique tool in the software arsenal now available to evolutionary biologists. CONCLUSION: Source code for the programs and additional data are available at http://bioinformatics.bc.edu/clotelab/RNAsampleCDS/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1389-7) contains supplementary material, which is available to authorized users.
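The counting task described under RESULTS can be illustrated with a small brute-force sketch. This is not the RNAsampleCDS algorithm, which the abstract does not describe; it only shows what "RNA sequences that code two peptides in overlapping reading frames" means for short inputs. The function names, the requirement that both peptides have the same length, and the convention that frame +1 starts one nucleotide after frame 0 are all assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons ordered by base U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna):
    """Translate complete codons of rna, ignoring any trailing 1-2 bases."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def codons_for(amino_acid):
    """All codons of the standard code that encode the given residue."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def overlapping_coding_sequences(p, q):
    """Hypothetical helper: list every RNA of length 3*len(p)+1 that codes
    peptide p in frame 0 and peptide q in frame +1 (exact identity only).

    Brute force: exponential in len(p), so suitable for toy inputs only.
    """
    if len(p) != len(q):
        raise ValueError("this sketch assumes peptides of equal length")
    hits = []
    for codons in product(*(codons_for(aa) for aa in p)):
        body = "".join(codons)
        for last_base in BASES:  # one extra base completes the frame +1 codon
            rna = body + last_base
            if translate(rna[1:]) == q:
                hits.append(rna)
    return hits


if __name__ == "__main__":
    for rna in overlapping_coding_sequences("MA", "WL"):
        print(rna)
```

For peptides "MA" in frame 0 and "WL" in frame +1, the overlap forces the alanine codon to be GCU, so only the final nucleotide is free. A pair such as "MM" and "MM" has no solution, because AUGAUG read from the second base begins with the stop codon UGA.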
format Online
Article
Text
id pubmed-5155393
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5155393 2016-12-20 New tools to analyze overlapping coding regions Bayegan, Amir H. Garcia-Martin, Juan Antonio Clote, Peter BMC Bioinformatics Research Article BioMed Central 2016-12-13 /pmc/articles/PMC5155393/ /pubmed/27964762 http://dx.doi.org/10.1186/s12859-016-1389-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title New tools to analyze overlapping coding regions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5155393/
https://www.ncbi.nlm.nih.gov/pubmed/27964762
http://dx.doi.org/10.1186/s12859-016-1389-7