Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data

Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Because they select for resistance less strongly than traditional antibiotics, AMPs have attracted ever-growing attention from researcher...

Bibliographic Details
Main Authors: Sidorczuk, Katarzyna; Gagat, Przemysław; Pietluch, Filip; Kała, Jakub; Rafacz, Dominik; Bąkała, Laura; Słowik, Jadwiga; Kolenda, Rafał; Rödiger, Stefan; Fingerhut, Legana C H W; Cooke, Ira R; Mackiewicz, Paweł; Burdukiewicz, Michał
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Problem Solving Protocol
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487607/
https://www.ncbi.nlm.nih.gov/pubmed/35988923
http://dx.doi.org/10.1093/bib/bbac343
author Sidorczuk, Katarzyna
Gagat, Przemysław
Pietluch, Filip
Kała, Jakub
Rafacz, Dominik
Bąkała, Laura
Słowik, Jadwiga
Kolenda, Rafał
Rödiger, Stefan
Fingerhut, Legana C H W
Cooke, Ira R
Mackiewicz, Paweł
Burdukiewicz, Michał
collection PubMed
description Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Because they select for resistance less strongly than traditional antibiotics, AMPs have attracted ever-growing attention from researchers, including bioinformaticians. Machine learning represents the most cost-effective method for novel AMP discovery, and consequently many computational tools for AMP prediction have recently been developed. In this article, we investigate the impact of negative data sampling on model performance and benchmarking. We generated 660 predictive models using 12 machine learning architectures, a single positive data set and 11 negative data sampling methods; the architectures and methods were defined on the basis of published AMP prediction software. Our results clearly indicate that similar training and benchmark data sets, i.e. those produced by the same or a similar negative data sampling method, positively affect model performance. Consequently, all the benchmark analyses that have been performed for AMP prediction models are significantly biased and, moreover, we do not know which model is the most accurate. To provide researchers with reliable information about the performance of AMP predictors, we also created a web server, AMPBenchmark, for fair model benchmarking. AMPBenchmark is available at http://BioGenies.info/AMPBenchmark.
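The effect described in the abstract can be illustrated with a toy sketch. It is not the authors' pipeline: the two negative sampling methods (`long_negatives`, `length_matched`), the length ranges, and the one-feature length-cutoff classifier are all hypothetical stand-ins chosen only to show how a benchmark built with the same negative sampling method as the training set can flatter a model.

```python
import random

# Hypothetical AMP length range; not taken from the article.
POSITIVE_LENGTHS = (10, 30)

# Two hypothetical negative sampling methods, each reduced to a length range.
NEGATIVE_SAMPLERS = {
    "long_negatives": (50, 100),  # negatives are much longer than positives
    "length_matched": (10, 30),   # negatives share the positives' length range
}


def make_dataset(rng, sampler, n=200):
    """Return a balanced list of (peptide_length, label) pairs."""
    low, high = NEGATIVE_SAMPLERS[sampler]
    positives = [(rng.randint(*POSITIVE_LENGTHS), 1) for _ in range(n)]
    negatives = [(rng.randint(low, high), 0) for _ in range(n)]
    return positives + negatives


def fit_cutoff(train):
    """Toy classifier: midpoint between the longest positive and shortest negative."""
    longest_positive = max(length for length, label in train if label == 1)
    shortest_negative = min(length for length, label in train if label == 0)
    return (longest_positive + shortest_negative) / 2


def accuracy(cutoff, data):
    """Fraction of peptides classified correctly by 'AMP if length <= cutoff'."""
    hits = sum((length <= cutoff) == (label == 1) for length, label in data)
    return hits / len(data)


rng = random.Random(0)

# Train once, on negatives drawn with the "long_negatives" method.
cutoff = fit_cutoff(make_dataset(rng, "long_negatives"))

# Benchmark the same model against negatives from each sampling method.
results = {
    name: accuracy(cutoff, make_dataset(rng, name)) for name in NEGATIVE_SAMPLERS
}
for name, acc in results.items():
    print(f"benchmark negatives = {name}: accuracy = {acc:.2f}")
```

In this sketch the model looks excellent on the benchmark whose negatives were sampled like its training negatives, and falls to chance level on the length-matched benchmark, because the only thing it learned was the length gap introduced by the sampling method.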
format Online
Article
Text
id pubmed-9487607
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9487607 2022-09-21 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-08-21 /pmc/articles/PMC9487607/ /pubmed/35988923 http://dx.doi.org/10.1093/bib/bbac343 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487607/
https://www.ncbi.nlm.nih.gov/pubmed/35988923
http://dx.doi.org/10.1093/bib/bbac343