A community effort to identify and correct mislabeled samples in proteogenomic studies
Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct...
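Main authors: | Yoo, Seungyeul; Shi, Zhiao; Wen, Bo; Kho, SoonJye; Pan, Renke; Feng, Hanying; Chen, Hong; Carlsson, Anders; Edén, Patrik; Ma, Weiping; Raymer, Michael; Maier, Ezekiel J.; Tezak, Zivana; Johanson, Elaine; Hinton, Denise; Rodriguez, Henry; Zhu, Jun; Boja, Emily; Wang, Pei; Zhang, Bing |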
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8134945/ https://www.ncbi.nlm.nih.gov/pubmed/34036290 http://dx.doi.org/10.1016/j.patter.2021.100245 |
_version_ | 1783695272897937408 |
---|---|
author | Yoo, Seungyeul Shi, Zhiao Wen, Bo Kho, SoonJye Pan, Renke Feng, Hanying Chen, Hong Carlsson, Anders Edén, Patrik Ma, Weiping Raymer, Michael Maier, Ezekiel J. Tezak, Zivana Johanson, Elaine Hinton, Denise Rodriguez, Henry Zhu, Jun Boja, Emily Wang, Pei Zhang, Bing |
author_facet | Yoo, Seungyeul Shi, Zhiao Wen, Bo Kho, SoonJye Pan, Renke Feng, Hanying Chen, Hong Carlsson, Anders Edén, Patrik Ma, Weiping Raymer, Michael Maier, Ezekiel J. Tezak, Zivana Johanson, Elaine Hinton, Denise Rodriguez, Henry Zhu, Jun Boja, Emily Wang, Pei Zhang, Bing |
author_sort | Yoo, Seungyeul |
collection | PubMed |
description | Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets. |
format | Online Article Text |
id | pubmed-8134945 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8134945 2021-05-24 A community effort to identify and correct mislabeled samples in proteogenomic studies Yoo, Seungyeul Shi, Zhiao Wen, Bo Kho, SoonJye Pan, Renke Feng, Hanying Chen, Hong Carlsson, Anders Edén, Patrik Ma, Weiping Raymer, Michael Maier, Ezekiel J. Tezak, Zivana Johanson, Elaine Hinton, Denise Rodriguez, Henry Zhu, Jun Boja, Emily Wang, Pei Zhang, Bing Patterns (N Y) Article Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets. Elsevier 2021-05-07 /pmc/articles/PMC8134945/ /pubmed/34036290 http://dx.doi.org/10.1016/j.patter.2021.100245 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | A community effort to identify and correct mislabeled samples in proteogenomic studies |
title_sort | community effort to identify and correct mislabeled samples in proteogenomic studies |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8134945/ https://www.ncbi.nlm.nih.gov/pubmed/34036290 http://dx.doi.org/10.1016/j.patter.2021.100245 |