SMAP is a pipeline for sample matching in proteogenomics

Bibliographic details
Main authors: Li, Ling; Niu, Mingming; Erickson, Alyssa; Luo, Jie; Rowbotham, Kincaid; Guo, Kai; Huang, He; Li, Yuxin; Jiang, Yi; Hur, Junguk; Liu, Chunyu; Peng, Junmin; Wang, Xusheng
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8825821/
https://www.ncbi.nlm.nih.gov/pubmed/35136070
http://dx.doi.org/10.1038/s41467-022-28411-8
Collection: PubMed

Abstract: The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS) and aligns the MS-based proteomic samples with genomic samples using two discriminant scores. Theoretical analysis with simulated data indicates that SMAP can uniquely match proteomic and genomic samples when at least 20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin accessibility sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in large-scale MS-based proteogenomics studies. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.

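The abstract describes sample matching only at a high level. As a rough illustration of the general idea, the sketch below scores each proteomic sample against each genomic sample by genotype concordance at shared variant sites, reports the margin over the runner-up, and flags samples whose best match disagrees with their recorded label. This is a simplified assumption, not SMAP's method: the concordance score, the margin, the 0/1/2 genotype encoding, and all function and sample names (`concordance`, `best_matches`, `flag_mislabeled`, `G1`, `P1`, `rs1`, ...) are hypothetical, and the paper's two discriminant scores are not reproduced here.

```python
# Hypothetical sketch: generic genotype-concordance matching between proteomic and
# genomic samples. This is NOT SMAP's algorithm and does not implement its two
# discriminant scores; it only illustrates matching samples by shared variants.
from typing import Dict, List, Tuple

# Assumed encoding of the genotype at each variant site: 0 = homozygous reference,
# 1 = heterozygous, 2 = homozygous alternate. Site IDs such as "rs1" are illustrative.
Genotypes = Dict[str, int]


def concordance(inferred: Genotypes, reference: Genotypes) -> float:
    """Fraction of shared sites where the two genotype calls agree (0.0 if none shared)."""
    shared = inferred.keys() & reference.keys()
    if not shared:
        return 0.0
    agree = sum(1 for site in shared if inferred[site] == reference[site])
    return agree / len(shared)


def best_matches(
    proteomic: Dict[str, Genotypes], genomic: Dict[str, Genotypes]
) -> Dict[str, Tuple[str, float, float]]:
    """Map each proteomic sample to (best genomic sample, score, margin over runner-up).

    Assumes `genomic` is non-empty.
    """
    result = {}
    for p_id, p_geno in proteomic.items():
        # Score this proteomic sample against every genomic sample, highest first.
        scores = sorted(
            ((concordance(p_geno, g_geno), g_id) for g_id, g_geno in genomic.items()),
            reverse=True,
        )
        best_score, best_id = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        result[p_id] = (best_id, best_score, best_score - runner_up)
    return result


def flag_mislabeled(
    matches: Dict[str, Tuple[str, float, float]], labels: Dict[str, str]
) -> List[str]:
    """Proteomic samples whose best genomic match differs from their recorded label."""
    return [p_id for p_id, (g_id, _, _) in matches.items() if labels.get(p_id) != g_id]


if __name__ == "__main__":
    genomic = {
        "G1": {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1},
        "G2": {"rs1": 2, "rs2": 0, "rs3": 1, "rs4": 0},
    }
    # P1 and P2 were recorded as G1 and G2, but their genotypes suggest a swap.
    proteomic = {
        "P1": {"rs1": 2, "rs2": 0, "rs3": 1},
        "P2": {"rs2": 1, "rs3": 2, "rs4": 1},
    }
    matches = best_matches(proteomic, genomic)
    print(matches)
    print(flag_mislabeled(matches, {"P1": "G1", "P2": "G2"}))
```

In the toy data, P1 and P2 each match the other's recorded genomic sample perfectly, so both are flagged as a likely swap. Reporting the margin over the runner-up, and not just the top score, is one simple way to judge whether a match is unique; the abstract's finding about needing at least 20% of genotypes is not modeled here.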
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nat Commun (Nature Communications)
Publication date: 2022-02-08
Copyright: © The Author(s) 2022
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.