
mvp – an open‐source preprocessor for cleaning duplicate records and missing values in mass spectrometry data


Bibliographic Details
Main Authors: Lee, Geunho, Lee, Hyun Beom, Jung, Byung Hwa, Nam, Hojung
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2017
Subjects: Method
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5494294/
https://www.ncbi.nlm.nih.gov/pubmed/28680817
http://dx.doi.org/10.1002/2211-5463.12247
collection PubMed
description Mass spectrometry (MS) data are used to analyze biological phenomena in terms of chemical species. However, these data often contain unexpected duplicate records and missing values caused by technical or biological factors. These ‘dirty data’ problems make MS analyses harder because they degrade the performance of statistical or machine‐learning tests applied to the data. We therefore developed the missing values preprocessor (mvp), open‐source software for preprocessing data that may include duplicate records and missing values. mvp exploits a property of MS data, namely that identical chemical species show the same or similar values for key identifiers such as the mass‐to‐charge ratio and intensity signal, and uses graph theory to form cliques with which it processes dirty data. We evaluated the validity of the mvp process through quantitative and qualitative analyses and compared the results of a statistical test applied to the original and the mvp‐applied data. This analysis showed that using mvp reduces the problems associated with duplicate records and missing values. We also examined how unprocessed data affect statistical tests and how the test results improve when the data are preprocessed with mvp.
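The clique-based idea in the description can be illustrated with a small sketch. This is not the mvp implementation: the record layout, the m/z-only matching rule, the `mz_tol` threshold, the averaging merge, and every function name below are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical records: one m/z value plus one intensity per sample
# (None marks a missing value). This layout is an assumption.
records = [
    {"mz": 100.000, "intensities": [10.0, None, 30.0]},
    {"mz": 100.004, "intensities": [None, 20.0, 30.0]},
    {"mz": 250.500, "intensities": [5.0, 6.0, None]},
]

def build_graph(records, mz_tol=0.01):
    """Connect two records when their m/z values differ by at most mz_tol."""
    adj = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if abs(records[i]["mz"] - records[j]["mz"]) <= mz_tol:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def merge_clique(records, clique):
    """Average m/z; per sample, average the non-missing intensities."""
    n_samples = len(records[clique[0]]["intensities"])
    merged = []
    for s in range(n_samples):
        vals = [records[i]["intensities"][s] for i in clique
                if records[i]["intensities"][s] is not None]
        merged.append(sum(vals) / len(vals) if vals else None)
    mz = sum(records[i]["mz"] for i in clique) / len(clique)
    return {"mz": mz, "intensities": merged}

def deduplicate(records, mz_tol=0.01):
    """Merge candidate duplicates, taking the largest cliques first."""
    adj = build_graph(records, mz_tol)
    used, result = set(), []
    for clique in sorted(maximal_cliques(adj), key=len, reverse=True):
        # Skip cliques that share a record with one already merged.
        if used.isdisjoint(clique):
            result.append(merge_clique(records, clique))
            used.update(clique)
    # Records left over from skipped cliques are kept unchanged.
    for i in range(len(records)):
        if i not in used:
            result.append(dict(records[i]))
    return sorted(result, key=lambda r: r["mz"])
```

Calling `deduplicate(records)` on the sample data merges the first two records, whose m/z values lie within the tolerance, so the missing value in each is filled from the other; the third record is returned as is. A real preprocessor would presumably also compare intensity signals, as the description indicates, rather than rely on m/z alone.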
format Online
Article
Text
id pubmed-5494294
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5494294 2017-07-05 FEBS Open Bio (Method). John Wiley and Sons Inc., 2017-06-19. /pmc/articles/PMC5494294/ /pubmed/28680817 http://dx.doi.org/10.1002/2211-5463.12247 Text, en. © 2017 The Authors. Published by FEBS Press and John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic Method