High throughput discovery of protein variants using proteomics informed by transcriptomics
Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of...
Main Authors: | Saha, Shyamasree, Matthews, David A, Bessant, Conrad |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007231/ https://www.ncbi.nlm.nih.gov/pubmed/29718325 http://dx.doi.org/10.1093/nar/gky295 |
_version_ | 1783332997236260864 |
---|---|
author | Saha, Shyamasree Matthews, David A Bessant, Conrad |
author_facet | Saha, Shyamasree Matthews, David A Bessant, Conrad |
author_sort | Saha, Shyamasree |
collection | PubMed |
description | Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of known proteins, and which are simply artefacts of poor sequence assembly, is challenging. We have designed and implemented an automated solution that classifies putative TGEs by comparing to reference proteome sequences. This allows large-scale identification of sequence polymorphisms, splice isoforms and novel TGEs supported by presence or absence of variant-specific peptide evidence. Unlike previously reported methods, ours does not require a catalogue of known variants, making it more applicable to non-model organisms. The method was validated on human PIT data, then applied to Mus musculus, Pteropus alecto and Aedes aegypti. Novel discoveries included 60 human protein isoforms, 32 392 polymorphisms in P. alecto, and TGEs with non-methionine start sites including tyrosine. |
format | Online Article Text |
id | pubmed-6007231 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-60072312018-06-25 High throughput discovery of protein variants using proteomics informed by transcriptomics Saha, Shyamasree Matthews, David A Bessant, Conrad Nucleic Acids Res Computational Biology Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of known proteins, and which are simply artefacts of poor sequence assembly, is challenging. We have designed and implemented an automated solution that classifies putative TGEs by comparing to reference proteome sequences. This allows large-scale identification of sequence polymorphisms, splice isoforms and novel TGEs supported by presence or absence of variant-specific peptide evidence. Unlike previously reported methods, ours does not require a catalogue of known variants, making it more applicable to non-model organisms. The method was validated on human PIT data, then applied to Mus musculus, Pteropus alecto and Aedes aegypti. Novel discoveries included 60 human protein isoforms, 32 392 polymorphisms in P. alecto, and TGEs with non-methionine start sites including tyrosine. Oxford University Press 2018-06-01 2018-04-30 /pmc/articles/PMC6007231/ /pubmed/29718325 http://dx.doi.org/10.1093/nar/gky295 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Computational Biology Saha, Shyamasree Matthews, David A Bessant, Conrad High throughput discovery of protein variants using proteomics informed by transcriptomics |
title | High throughput discovery of protein variants using proteomics informed by transcriptomics |
title_full | High throughput discovery of protein variants using proteomics informed by transcriptomics |
title_fullStr | High throughput discovery of protein variants using proteomics informed by transcriptomics |
title_full_unstemmed | High throughput discovery of protein variants using proteomics informed by transcriptomics |
title_short | High throughput discovery of protein variants using proteomics informed by transcriptomics |
title_sort | high throughput discovery of protein variants using proteomics informed by transcriptomics |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007231/ https://www.ncbi.nlm.nih.gov/pubmed/29718325 http://dx.doi.org/10.1093/nar/gky295 |
work_keys_str_mv | AT sahashyamasree highthroughputdiscoveryofproteinvariantsusingproteomicsinformedbytranscriptomics AT matthewsdavida highthroughputdiscoveryofproteinvariantsusingproteomicsinformedbytranscriptomics AT bessantconrad highthroughputdiscoveryofproteinvariantsusingproteomicsinformedbytranscriptomics |