
OpenCustomDB: Integration of Unannotated Open Reading Frames and Genetic Variants to Generate More Comprehensive Customized Protein Databases

Proteomic diversity in biological samples can be characterized by mass spectrometry (MS)-based proteomics using customized protein databases generated from sets of transcripts previously detected by RNA-seq. This diversity has only been increased by the recent discovery that many t...


Bibliographic Details
Main Authors: Guilloy, Noé; Brunet, Marie A.; Leblanc, Sébastien; Jacques, Jean-François; Hardy, Marie-Pierre; Ehx, Grégory; Lanoix, Joël; Thibault, Pierre; Perreault, Claude; Roucou, Xavier
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10167680/
https://www.ncbi.nlm.nih.gov/pubmed/36961377
http://dx.doi.org/10.1021/acs.jproteome.3c00054
author Guilloy, Noé
Brunet, Marie A.
Leblanc, Sébastien
Jacques, Jean-François
Hardy, Marie-Pierre
Ehx, Grégory
Lanoix, Joël
Thibault, Pierre
Perreault, Claude
Roucou, Xavier
collection PubMed
description Proteomic diversity in biological samples can be characterized by mass spectrometry (MS)-based proteomics using customized protein databases generated from sets of transcripts previously detected by RNA-seq. This diversity has only been increased by the recent discovery that many translated alternative open reading frames remain unannotated at unsuspected locations of mRNAs and ncRNAs. These novel protein products, termed alternative proteins, have been left out of all previous custom database generation tools. Consequently, genetic variations that impact alternative open reading frames and variant peptides from their translated proteins are not detectable with current computational workflows. To fill this gap, we present OpenCustomDB, a bioinformatics tool that uses sample-specific RNA-seq data to identify genomic variants in canonical and alternative open reading frames, allowing for more than one coding region per transcript. In a test reanalysis of a cohort of 16 patients with acute myeloid leukemia, 5666 peptides from alternative proteins were detected, including 201 variant peptides. We also observed that a significant fraction of peptide-spectrum matches previously assigned to peptides from canonical proteins received better scores when reassigned to peptides from alternative proteins. Custom protein libraries that include sample-specific sequence variations of all possible open reading frames are promising contributions to the development of proteomics and precision medicine. The raw and processed proteomics data presented in this study can be found in the PRIDE repository with accession number PXD029240.
format Online
Article
Text
id pubmed-10167680
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10167680 2023-05-10 J Proteome Res American Chemical Society 2023-03-24 /pmc/articles/PMC10167680/ /pubmed/36961377 http://dx.doi.org/10.1021/acs.jproteome.3c00054 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works.
title OpenCustomDB: Integration of Unannotated Open Reading Frames and Genetic Variants to Generate More Comprehensive Customized Protein Databases