High throughput discovery of protein variants using proteomics informed by transcriptomics

Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of...

Descripción completa

Bibliographic Details
Main authors: Saha, Shyamasree; Matthews, David A; Bessant, Conrad
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2018
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007231/
https://www.ncbi.nlm.nih.gov/pubmed/29718325
http://dx.doi.org/10.1093/nar/gky295

Abstract: Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of known proteins, and which are simply artefacts of poor sequence assembly, is challenging. We have designed and implemented an automated solution that classifies putative TGEs by comparing to reference proteome sequences. This allows large-scale identification of sequence polymorphisms, splice isoforms and novel TGEs supported by presence or absence of variant-specific peptide evidence. Unlike previously reported methods, ours does not require a catalogue of known variants, making it more applicable to non-model organisms. The method was validated on human PIT data, then applied to Mus musculus, Pteropus alecto and Aedes aegypti. Novel discoveries included 60 human protein isoforms, 32 392 polymorphisms in P. alecto, and TGEs with non-methionine start sites including tyrosine.
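
The abstract does not spell out the classification rules, so the following is only an illustrative sketch of the general idea (comparing a putative TGE against a reference protein, then asking whether a variant-specific peptide supports each difference), not the authors' pipeline. Assumptions: each ORF has already been paired with one best-matching reference protein; Python's standard-library `difflib.SequenceMatcher` stands in for a proper protein alignment; the names `classify_orf`, `TGEClass`, the labels and the 0.5 similarity threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Iterator


@dataclass
class TGEClass:
    label: str  # hypothetical labels: "known", "polymorphism", "isoform", "novel"
    # Half-open (start, end) intervals in ORF coordinates that differ from the
    # reference; start == end marks a junction where reference residues are absent.
    sites: list[tuple[int, int]] = field(default_factory=list)
    supported_sites: list[tuple[int, int]] = field(default_factory=list)


def _occurrences(seq: str, pep: str) -> Iterator[int]:
    """Yield every start index of pep in seq (overlapping matches included)."""
    pos = seq.find(pep)
    while pos != -1:
        yield pos
        pos = seq.find(pep, pos + 1)


def classify_orf(orf: str, reference: str, peptides: list[str],
                 min_similarity: float = 0.5) -> TGEClass:
    """Hypothetical classifier: compare one ORF to its best-matching reference protein."""
    if orf == reference:
        return TGEClass("known")
    matcher = SequenceMatcher(None, reference, orf, autojunk=False)
    # ratio() = 2 * matched residues / (len(reference) + len(orf)); a crude
    # stand-in for alignment identity, not a real alignment score.
    if matcher.ratio() < min_similarity:
        return TGEClass("novel")

    sites: list[tuple[int, int]] = []
    substitutions_only = True
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if not (tag == "replace" and i2 - i1 == j2 - j1):
            substitutions_only = False  # insertion, deletion or length change
        sites.append((j1, j2))
    label = "polymorphism" if substitutions_only else "isoform"

    # Variant-specific peptides map to the ORF but not to the reference.
    specific = [p for p in peptides if p in orf and p not in reference]
    # A site is supported if some specific peptide overlaps it; for a junction
    # (start == end) this requires the peptide to span residues on both sides.
    supported = [
        (start, end) for start, end in sites
        if any(pos < end and pos + len(pep) > start
               for pep in specific for pos in _occurrences(orf, pep))
    ]
    return TGEClass(label, sites, supported)
```

For example, an ORF differing from its reference by a single residue is labelled `"polymorphism"`, and that site counts as supported only if a detected peptide covering it does not also occur in the reference. The sketch makes no attempt to separate genuine variants from assembly artefacts.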

Journal: Nucleic Acids Res (section: Computational Biology)
Dates: 2018-04-30 (online), 2018-06-01 (issue)
© The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.