iProX in 2021: connecting proteomics data sharing with big data

The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platform provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been gre...

Bibliographic Details
Main authors: Chen, Tao; Ma, Jie; Liu, Yi; Chen, Zhiguang; Xiao, Nong; Lu, Yutong; Fu, Yinjin; Yang, Chunyuan; Li, Mansheng; Wu, Songfeng; Wang, Xue; Li, Dongsheng; He, Fuchu; Hermjakob, Henning; Zhu, Yunping
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Database Issue
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728291/
https://www.ncbi.nlm.nih.gov/pubmed/34871441
http://dx.doi.org/10.1093/nar/gkab1081
author Chen, Tao
Ma, Jie
Liu, Yi
Chen, Zhiguang
Xiao, Nong
Lu, Yutong
Fu, Yinjin
Yang, Chunyuan
Li, Mansheng
Wu, Songfeng
Wang, Xue
Li, Dongsheng
He, Fuchu
Hermjakob, Henning
Zhu, Yunping
author_sort Chen, Tao
collection PubMed
description The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platform provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful-styled Elastic Search engine can query millions of records within one second. Also, several new features, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, RESTful Web Service API, and a high-efficiency reanalysis pipeline, have been added to iProX for better open data sharing. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast growing field of proteomics.
format Online
Article
Text
id pubmed-8728291
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8728291 2022-01-05 iProX in 2021: connecting proteomics data sharing with big data Chen, Tao Ma, Jie Liu, Yi Chen, Zhiguang Xiao, Nong Lu, Yutong Fu, Yinjin Yang, Chunyuan Li, Mansheng Wu, Songfeng Wang, Xue Li, Dongsheng He, Fuchu Hermjakob, Henning Zhu, Yunping Nucleic Acids Res Database Issue The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platform provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful-styled Elastic Search engine can query millions of records within one second. Also, several new features, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, RESTful Web Service API, and a high-efficiency reanalysis pipeline, have been added to iProX for better open data sharing. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast growing field of proteomics. Oxford University Press 2021-12-06 /pmc/articles/PMC8728291/ /pubmed/34871441 http://dx.doi.org/10.1093/nar/gkab1081 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title iProX in 2021: connecting proteomics data sharing with big data
topic Database Issue