SMAP is a pipeline for sample matching in proteogenomics

Bibliographic Details
Main authors: Li, Ling; Niu, Mingming; Erickson, Alyssa; Luo, Jie; Rowbotham, Kincaid; Guo, Kai; Huang, He; Li, Yuxin; Jiang, Yi; Hur, Junguk; Liu, Chunyu; Peng, Junmin; Wang, Xusheng
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8825821/
https://www.ncbi.nlm.nih.gov/pubmed/35136070
http://dx.doi.org/10.1038/s41467-022-28411-8
Description
Summary: The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS) and aligns the MS-based proteomic samples with genomic samples using two discriminant scores. Theoretical analysis with simulated data indicates that SMAP can uniquely match proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in large-scale MS-based proteogenomics studies. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.
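
The summary does not define the two discriminant scores, so the following Python sketch only illustrates the general idea of genotype-based sample matching and should not be read as SMAP's actual implementation. It assumes two illustrative scores: a concordance fraction between the genotypes inferred from a proteomic sample and each genomic sample, and the gap between the best and second-best concordance. All function names, sample IDs, and genotype values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Variant ID -> genotype call (0, 1, or 2 alternate alleles), or None if not callable.
Genotypes = Dict[str, Optional[int]]


def concordance(inferred: Genotypes, reference: Genotypes) -> float:
    """Fraction of jointly called variants whose genotypes agree (assumed score)."""
    shared = [v for v in inferred
              if inferred[v] is not None and reference.get(v) is not None]
    if not shared:
        return 0.0
    agree = sum(1 for v in shared if inferred[v] == reference[v])
    return agree / len(shared)


def best_match(inferred: Genotypes,
               genomic: Dict[str, Genotypes]) -> Tuple[str, float, float]:
    """Return (best genomic sample ID, its concordance, gap to the runner-up)."""
    if not genomic:
        raise ValueError("at least one genomic sample is required")
    ranked = sorted(((concordance(inferred, g), sid) for sid, g in genomic.items()),
                    reverse=True)
    top_score, top_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    return top_id, top_score, top_score - runner_up


# Hypothetical genomic genotypes for three individuals.
genomic = {
    "G1": {"v1": 0, "v2": 1, "v3": 2, "v4": 1},
    "G2": {"v1": 1, "v2": 1, "v3": 0, "v4": 2},
    "G3": {"v1": 0, "v2": 0, "v3": 2, "v4": 0},
}
# Genotypes inferred from a proteomic sample that was labeled "G1";
# variant v4 could not be called from the MS data.
inferred = {"v1": 1, "v2": 1, "v3": 0, "v4": None}

match_id, score, gap = best_match(inferred, genomic)
print(match_id, score, gap)
```

In this toy case the proteomic sample labeled "G1" agrees best with genomic sample "G2", which is the kind of discrepancy a sample-matching step is meant to flag. A large gap to the runner-up suggests the match is unambiguous, while a small gap suggests too few genotypes were available to decide.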