A community effort to identify and correct mislabeled samples in proteogenomic studies

Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct...

Bibliographic Details
Main authors: Yoo, Seungyeul, Shi, Zhiao, Wen, Bo, Kho, SoonJye, Pan, Renke, Feng, Hanying, Chen, Hong, Carlsson, Anders, Edén, Patrik, Ma, Weiping, Raymer, Michael, Maier, Ezekiel J., Tezak, Zivana, Johanson, Elaine, Hinton, Denise, Rodriguez, Henry, Zhu, Jun, Boja, Emily, Wang, Pei, Zhang, Bing
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8134945/
https://www.ncbi.nlm.nih.gov/pubmed/34036290
http://dx.doi.org/10.1016/j.patter.2021.100245
_version_ 1783695272897937408
author Yoo, Seungyeul
Shi, Zhiao
Wen, Bo
Kho, SoonJye
Pan, Renke
Feng, Hanying
Chen, Hong
Carlsson, Anders
Edén, Patrik
Ma, Weiping
Raymer, Michael
Maier, Ezekiel J.
Tezak, Zivana
Johanson, Elaine
Hinton, Denise
Rodriguez, Henry
Zhu, Jun
Boja, Emily
Wang, Pei
Zhang, Bing
author_sort Yoo, Seungyeul
collection PubMed
description Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.
format Online
Article
Text
id pubmed-8134945
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8134945 2021-05-24 A community effort to identify and correct mislabeled samples in proteogenomic studies Patterns (N Y) Article Elsevier 2021-05-07 /pmc/articles/PMC8134945/ /pubmed/34036290 http://dx.doi.org/10.1016/j.patter.2021.100245 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title A community effort to identify and correct mislabeled samples in proteogenomic studies
title_sort community effort to identify and correct mislabeled samples in proteogenomic studies
topic Article