iProX in 2021: connecting proteomics data sharing with big data
The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been gre...
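Main Authors: | Chen, Tao; Ma, Jie; Liu, Yi; Chen, Zhiguang; Xiao, Nong; Lu, Yutong; Fu, Yinjin; Yang, Chunyuan; Li, Mansheng; Wu, Songfeng; Wang, Xue; Li, Dongsheng; He, Fuchu; Hermjakob, Henning; Zhu, Yunping |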
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Database Issue |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728291/ https://www.ncbi.nlm.nih.gov/pubmed/34871441 http://dx.doi.org/10.1093/nar/gkab1081 |
_version_ | 1784626706320982016 |
---|---|
author | Chen, Tao; Ma, Jie; Liu, Yi; Chen, Zhiguang; Xiao, Nong; Lu, Yutong; Fu, Yinjin; Yang, Chunyuan; Li, Mansheng; Wu, Songfeng; Wang, Xue; Li, Dongsheng; He, Fuchu; Hermjakob, Henning; Zhu, Yunping |
author_sort | Chen, Tao |
collection | PubMed |
description | The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful-styled Elasticsearch engine can query millions of records within one second. Also, several new features, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline, have been added to iProX for better open data sharing. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast-growing field of proteomics. |
format | Online Article Text |
id | pubmed-8728291 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8728291 2022-01-05 iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res, Database Issue. Oxford University Press 2021-12-06 /pmc/articles/PMC8728291/ /pubmed/34871441 http://dx.doi.org/10.1093/nar/gkab1081 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | iProX in 2021: connecting proteomics data sharing with big data |
title_sort | iprox in 2021: connecting proteomics data sharing with big data |
topic | Database Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728291/ https://www.ncbi.nlm.nih.gov/pubmed/34871441 http://dx.doi.org/10.1093/nar/gkab1081 |