Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs

BACKGROUND: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by c...

Bibliographic Details
Main Authors: Crappé, Jeroen, Van Criekinge, Wim, Trooskens, Geert, Hayakawa, Eisuke, Luyten, Walter, Baggerman, Geert, Menschaert, Gerben
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852105/
https://www.ncbi.nlm.nih.gov/pubmed/24059539
http://dx.doi.org/10.1186/1471-2164-14-648
_version_ 1782478610114805760
author Crappé, Jeroen
Van Criekinge, Wim
Trooskens, Geert
Hayakawa, Eisuke
Luyten, Walter
Baggerman, Geert
Menschaert, Gerben
author_facet Crappé, Jeroen
Van Criekinge, Wim
Trooskens, Geert
Hayakawa, Eisuke
Luyten, Walter
Baggerman, Geert
Menschaert, Gerben
author_sort Crappé, Jeroen
collection PubMed
description BACKGROUND: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms like Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome in unprecedented scrutiny, for example in a search of this type of small ORFs. RESULTS: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level, and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs are finally overlapped with ribosome profiling data, hinting to sORF translation. All candidates are visually inspected using an in-house developed genome browser. In this way dozens of highly conserved sORFs, targeted by ribosomes were identified in the mouse genome, putatively encoding micropeptides. CONCLUSION: Our combined genome-wide approach leads to the prediction of a comprehensive but manageable set of putatively coding sORFs, a very important first step towards the identification of a new class of bioactive peptides, called micropeptides.
format Online
Article
Text
id pubmed-3852105
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-38521052013-12-06 Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs Crappé, Jeroen Van Criekinge, Wim Trooskens, Geert Hayakawa, Eisuke Luyten, Walter Baggerman, Geert Menschaert, Gerben BMC Genomics Research Article BACKGROUND: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms like Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome in unprecedented scrutiny, for example in a search of this type of small ORFs. RESULTS: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level, and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs are finally overlapped with ribosome profiling data, hinting to sORF translation. All candidates are visually inspected using an in-house developed genome browser. In this way dozens of highly conserved sORFs, targeted by ribosomes were identified in the mouse genome, putatively encoding micropeptides. CONCLUSION: Our combined genome-wide approach leads to the prediction of a comprehensive but manageable set of putatively coding sORFs, a very important first step towards the identification of a new class of bioactive peptides, called micropeptides. 
BioMed Central 2013-09-23 /pmc/articles/PMC3852105/ /pubmed/24059539 http://dx.doi.org/10.1186/1471-2164-14-648 Text en Copyright © 2013 Crappé et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Crappé, Jeroen
Van Criekinge, Wim
Trooskens, Geert
Hayakawa, Eisuke
Luyten, Walter
Baggerman, Geert
Menschaert, Gerben
Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
title Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
title_full Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
title_fullStr Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
title_full_unstemmed Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
title_short Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
title_sort combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sorfs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852105/
https://www.ncbi.nlm.nih.gov/pubmed/24059539
http://dx.doi.org/10.1186/1471-2164-14-648
work_keys_str_mv AT crappejeroen combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs
AT vancriekingewim combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs
AT trooskensgeert combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs
AT hayakawaeisuke combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs
AT luytenwalter combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs
AT baggermangeert combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs
AT menschaertgerben combininginsilicopredictionandribosomeprofilinginagenomewidesearchfornovelputativelycodingsorfs