Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data
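Main authors: | Sidorczuk, Katarzyna; Gagat, Przemysław; Pietluch, Filip; Kała, Jakub; Rafacz, Dominik; Bąkała, Laura; Słowik, Jadwiga; Kolenda, Rafał; Rödiger, Stefan; Fingerhut, Legana C H W; Cooke, Ira R; Mackiewicz, Paweł; Burdukiewicz, Michał |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487607/ ; https://www.ncbi.nlm.nih.gov/pubmed/35988923 ; http://dx.doi.org/10.1093/bib/bbac343 |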
Field | Value |
---|---|
author | Sidorczuk, Katarzyna; Gagat, Przemysław; Pietluch, Filip; Kała, Jakub; Rafacz, Dominik; Bąkała, Laura; Słowik, Jadwiga; Kolenda, Rafał; Rödiger, Stefan; Fingerhut, Legana C H W; Cooke, Ira R; Mackiewicz, Paweł; Burdukiewicz, Michał |
collection | PubMed |
description | Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Because they exert weaker selection for resistance than traditional antibiotics, AMPs have attracted ever-growing attention from researchers, including bioinformaticians. Machine learning represents the most cost-effective method for novel AMP discovery, and consequently many computational tools for AMP prediction have recently been developed. In this article, we investigate the impact of negative data sampling on model performance and benchmarking. We generated 660 predictive models using 12 machine learning architectures, a single positive data set and 11 negative data sampling methods; the architectures and methods were defined on the basis of published AMP prediction software. Our results clearly indicate that similar training and benchmark data sets, i.e. those produced by the same or a similar negative data sampling method, increase measured model performance. Consequently, all benchmark analyses performed for AMP prediction models are significantly biased, and it remains unknown which model is the most accurate. To provide researchers with reliable information about the performance of AMP predictors, we also created AMPBenchmark, a web server for fair model benchmarking. AMPBenchmark is available at http://BioGenies.info/AMPBenchmark. |
id | pubmed-9487607 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-9487607 2022-09-21. Brief Bioinform. Problem Solving Protocol. Oxford University Press, 2022-08-21. Text, en. © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data |
topic | Problem Solving Protocol |
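The experimental design summarized in the description can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code. The counts (12 architectures, 11 negative data sampling methods, 660 models) come from the description, and they imply five models per architecture and sampling method pair; reading that factor as repeated training runs is an assumption, since the record does not explain it. The helper names `experiment_grid` and `matched_vs_mismatched` are hypothetical. The score matrix stands for any performance metric of models trained with negatives from method i and benchmarked on negatives from method j. Comparing the diagonal (same sampling method for training and benchmark) with the off-diagonal entries is one simple way to expose the bias the description reports. It covers only identical methods, not merely similar ones.

```python
from itertools import product
from statistics import mean

# Counts stated in the description.
N_ARCHITECTURES = 12
N_SAMPLING_METHODS = 11
N_MODELS = 660

# 660 / (12 * 11) = 5 models per (architecture, sampling method) pair.
# ASSUMPTION: the record does not say what this factor is; here it is
# treated as the number of repeated training runs.
REPLICATES = N_MODELS // (N_ARCHITECTURES * N_SAMPLING_METHODS)


def experiment_grid(architectures, sampling_methods, replicates):
    """Hypothetical helper: enumerate every (architecture, negative sampling
    method, replicate index) combination; all share one positive data set."""
    return list(product(architectures, sampling_methods, range(replicates)))


def matched_vs_mismatched(scores):
    """Hypothetical helper: scores[i][j] is the performance of models trained
    with negative sampling method i and benchmarked on negatives from method j.
    Expects a square matrix with at least two methods.
    Returns (mean over i == j, mean over i != j)."""
    n = len(scores)
    matched = [scores[i][i] for i in range(n)]
    mismatched = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(matched), mean(mismatched)


if __name__ == "__main__":
    grid = experiment_grid(range(N_ARCHITECTURES), range(N_SAMPLING_METHODS), REPLICATES)
    print(len(grid))  # 660
    # Toy 2 x 2 matrix with illustrative values only (not study results).
    print(matched_vs_mismatched([[0.9, 0.7], [0.6, 0.8]]))
```

On the toy matrix above, the matched mean is about 0.85 and the mismatched mean about 0.65. A gap of that kind, measured on real benchmark results, is what would indicate that agreement between training and benchmark negative sampling inflates reported performance.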