
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms

Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area o...


Bibliographic Details
Main authors: McDonnell, Kevin, Howley, Enda, Abram, Florence
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8956878/
https://www.ncbi.nlm.nih.gov/pubmed/35386104
http://dx.doi.org/10.1016/j.csbj.2022.03.008
_version_ 1784676649348890624
author McDonnell, Kevin
Howley, Enda
Abram, Florence
author_facet McDonnell, Kevin
Howley, Enda
Abram, Florence
author_sort McDonnell, Kevin
collection PubMed
description Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically, its performance lagged behind database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm’s correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations for further algorithm improvements and offer potential avenues to overcome current inherent data limitations.
format Online
Article
Text
id pubmed-8956878
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8956878 2022-04-05 The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms McDonnell, Kevin Howley, Enda Abram, Florence Comput Struct Biotechnol J Research Article Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically, its performance lagged behind database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm’s correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations for further algorithm improvements and offer potential avenues to overcome current inherent data limitations.
Research Network of Computational and Structural Biotechnology 2022-03-19 /pmc/articles/PMC8956878/ /pubmed/35386104 http://dx.doi.org/10.1016/j.csbj.2022.03.008 Text en © 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
McDonnell, Kevin
Howley, Enda
Abram, Florence
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
title The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
title_full The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
title_fullStr The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
title_full_unstemmed The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
title_short The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
title_sort impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8956878/
https://www.ncbi.nlm.nih.gov/pubmed/35386104
http://dx.doi.org/10.1016/j.csbj.2022.03.008