SMAP is a pipeline for sample matching in proteogenomics
The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matchin...
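Main Authors: | Li, Ling; Niu, Mingming; Erickson, Alyssa; Luo, Jie; Rowbotham, Kincaid; Guo, Kai; Huang, He; Li, Yuxin; Jiang, Yi; Hur, Junguk; Liu, Chunyu; Peng, Junmin; Wang, Xusheng |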
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Nature Publishing Group UK
2022
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8825821/ https://www.ncbi.nlm.nih.gov/pubmed/35136070 http://dx.doi.org/10.1038/s41467-022-28411-8 |
_version_ | 1784647314592235520 |
---|---|
author | Li, Ling Niu, Mingming Erickson, Alyssa Luo, Jie Rowbotham, Kincaid Guo, Kai Huang, He Li, Yuxin Jiang, Yi Hur, Junguk Liu, Chunyu Peng, Junmin Wang, Xusheng |
author_sort | Li, Ling |
collection | PubMed |
description | The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/. |
format | Online Article Text |
id | pubmed-8825821 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8825821 2022-02-18 SMAP is a pipeline for sample matching in proteogenomics Li, Ling Niu, Mingming Erickson, Alyssa Luo, Jie Rowbotham, Kincaid Guo, Kai Huang, He Li, Yuxin Jiang, Yi Hur, Junguk Liu, Chunyu Peng, Junmin Wang, Xusheng Nat Commun Article The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.
Nature Publishing Group UK 2022-02-08 /pmc/articles/PMC8825821/ /pubmed/35136070 http://dx.doi.org/10.1038/s41467-022-28411-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | SMAP is a pipeline for sample matching in proteogenomics |
topic | Article |
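
The matching idea summarized in the description (align each proteomic sample with genomic samples using two scores) can be illustrated with a minimal sketch. This is not SMAP's implementation: the concordance measure, the names `concordance` and `match_sample`, and the toy genotypes are assumptions made purely for illustration. The two scores here are the concordance of the best-matching genomic sample and its margin over the runner-up, standing in for the two discriminant scores the record mentions but does not define.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: variant id -> alternate-allele count (0, 1, 2),
# or None when the genotype could not be inferred for that sample.
Genotypes = Dict[str, Optional[int]]


def concordance(inferred: Genotypes, reference: Genotypes) -> float:
    """Fraction of variants called in both samples whose genotypes agree."""
    shared = [v for v in inferred
              if inferred[v] is not None and reference.get(v) is not None]
    if not shared:
        return 0.0
    agree = sum(1 for v in shared if inferred[v] == reference[v])
    return agree / len(shared)


def match_sample(inferred: Genotypes,
                 genomic: Dict[str, Genotypes]) -> Tuple[str, float, float]:
    """Return (best genomic sample, its concordance, margin over the runner-up)."""
    scores = sorted(((concordance(inferred, g), name)
                     for name, g in genomic.items()), reverse=True)
    best_score, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_name, best_score, best_score - runner_up


# Toy data (invented): a proteomic sample labelled "G1" whose inferred
# genotypes actually agree with genomic sample "G2".
genomic = {
    "G1": {"v1": 0, "v2": 1, "v3": 2, "v4": 1},
    "G2": {"v1": 1, "v2": 1, "v3": 0, "v4": 2},
    "G3": {"v1": 2, "v2": 0, "v3": 0, "v4": 0},
}
proteomic_labelled_g1 = {"v1": 1, "v2": 1, "v3": 0, "v4": None}

best_name, best_score, margin = match_sample(proteomic_labelled_g1, genomic)
print(best_name, best_score, round(margin, 3))
```

In this sketch a high best-match concordance together with a large margin would suggest a confident reassignment, whereas a small margin would leave the sample's identity ambiguous; any thresholds would have to come from the SMAP publication or software rather than from this example.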