
Data processing and classification analysis of proteomic changes: a case study of oil pollution in the mussel, Mytilus edulis

BACKGROUND: Proteomics may help to detect subtle pollution-related changes, such as responses to mixture pollution at low concentrations, where clear signs of toxicity are absent. The challenges associated with the analysis of large-scale multivariate proteomic datasets have been widely discussed in...


Bibliographic Details
Main Authors: Monsinjon, Tiphaine, Andersen, Odd Ketil, Leboulenger, François, Knigge, Thomas
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1592071/
https://www.ncbi.nlm.nih.gov/pubmed/16970821
http://dx.doi.org/10.1186/1477-5956-4-17
_version_ 1782130368851214336
author Monsinjon, Tiphaine
Andersen, Odd Ketil
Leboulenger, François
Knigge, Thomas
author_facet Monsinjon, Tiphaine
Andersen, Odd Ketil
Leboulenger, François
Knigge, Thomas
author_sort Monsinjon, Tiphaine
collection PubMed
description BACKGROUND: Proteomics may help to detect subtle pollution-related changes, such as responses to mixture pollution at low concentrations, where clear signs of toxicity are absent. The challenges associated with the analysis of large-scale multivariate proteomic datasets have been widely discussed in medical research and biomarker discovery. This concept has been introduced to ecotoxicology only recently, so data processing and classification analysis need to be refined before they can be readily applied in biomarker discovery and monitoring studies. RESULTS: Data sets obtained from a case study of oil pollution in the blue mussel were investigated for differential protein expression by retentate chromatography-mass spectrometry and decision tree classification. Different tissues and different settings were used to evaluate classifiers towards their discriminatory power. It was found that, due to the intrinsic variability of the data sets, reliable classification of unknown samples could only be achieved on a broad statistical basis (n > 60) with the observed expression changes comprising high statistical significance and sufficient amplitude. The application of stringent criteria to guard against overfitting of the models eventually allowed satisfactory classification for only one of the investigated data sets and settings. CONCLUSION: Machine learning techniques provide a promising approach to process and extract informative expression signatures from high-dimensional mass-spectrometry data. Even though characterisation of the proteins forming the expression signatures would be ideal, knowledge of the specific proteins is not mandatory for effective class discrimination. This may constitute a new biomarker approach in ecotoxicology, where working with organisms that do not have sequenced genomes renders protein identification by database searching problematic. However, data processing has to be critically evaluated and statistical constraints have to be considered before supervised classification algorithms are employed.
format Text
id pubmed-1592071
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1592071 2006-10-05 Data processing and classification analysis of proteomic changes: a case study of oil pollution in the mussel, Mytilus edulis Monsinjon, Tiphaine Andersen, Odd Ketil Leboulenger, François Knigge, Thomas Proteome Sci Methodology BACKGROUND: Proteomics may help to detect subtle pollution-related changes, such as responses to mixture pollution at low concentrations, where clear signs of toxicity are absent. The challenges associated with the analysis of large-scale multivariate proteomic datasets have been widely discussed in medical research and biomarker discovery. This concept has been introduced to ecotoxicology only recently, so data processing and classification analysis need to be refined before they can be readily applied in biomarker discovery and monitoring studies. RESULTS: Data sets obtained from a case study of oil pollution in the blue mussel were investigated for differential protein expression by retentate chromatography-mass spectrometry and decision tree classification. Different tissues and different settings were used to evaluate classifiers towards their discriminatory power. It was found that, due to the intrinsic variability of the data sets, reliable classification of unknown samples could only be achieved on a broad statistical basis (n > 60) with the observed expression changes comprising high statistical significance and sufficient amplitude. The application of stringent criteria to guard against overfitting of the models eventually allowed satisfactory classification for only one of the investigated data sets and settings. CONCLUSION: Machine learning techniques provide a promising approach to process and extract informative expression signatures from high-dimensional mass-spectrometry data. Even though characterisation of the proteins forming the expression signatures would be ideal, knowledge of the specific proteins is not mandatory for effective class discrimination. This may constitute a new biomarker approach in ecotoxicology, where working with organisms that do not have sequenced genomes renders protein identification by database searching problematic. However, data processing has to be critically evaluated and statistical constraints have to be considered before supervised classification algorithms are employed. BioMed Central 2006-09-13 /pmc/articles/PMC1592071/ /pubmed/16970821 http://dx.doi.org/10.1186/1477-5956-4-17 Text en Copyright © 2006 Monsinjon et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology
Monsinjon, Tiphaine
Andersen, Odd Ketil
Leboulenger, François
Knigge, Thomas
Data processing and classification analysis of proteomic changes: a case study of oil pollution in the mussel, Mytilus edulis
title Data processing and classification analysis of proteomic changes: a case study of oil pollution in the mussel, Mytilus edulis
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1592071/
https://www.ncbi.nlm.nih.gov/pubmed/16970821
http://dx.doi.org/10.1186/1477-5956-4-17