SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models

BACKGROUND: Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a potentially destabilizing effect leading to an increased risk of misfolding and aggregati...

Bibliographic Details
Main authors: Janssen, Kobe; Duran-Romaña, Ramon; Bottu, Guy; Guharoy, Mainak; Botzki, Alexander; Rousseau, Frederic; Schymkowitz, Joost
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355034/
https://www.ncbi.nlm.nih.gov/pubmed/37464277
http://dx.doi.org/10.1186/s12859-023-05407-9
author Janssen, Kobe
Duran-Romaña, Ramon
Bottu, Guy
Guharoy, Mainak
Botzki, Alexander
Rousseau, Frederic
Schymkowitz, Joost
author_sort Janssen, Kobe
collection PubMed
description BACKGROUND: Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a potentially destabilizing effect leading to an increased risk of misfolding and aggregation. Multiple software tools exist to predict the effect of single-nucleotide variants on proteins; however, a pipeline integrating these tools while starting from an NGS data output list of variants is lacking. RESULTS: The previous version SNPeffect 4.0 (De Baets in Nucleic Acids Res 40(D1):D935–D939, 2011) provided an online database containing pre-calculated variant effects and low-throughput custom variant analysis. Here, we built an automated and parallelized pipeline that analyzes the impact of missense variants on the aggregation propensity and structural stability of proteins starting from the Variant Call Format as input. The pipeline incorporates the AlphaFold Protein Structure Database to achieve high coverage for structural stability analyses using the FoldX force field. The effect on aggregation propensity is analyzed using the established predictors TANGO and WALTZ. The pipeline focuses solely on the human proteome and can be used to analyze proteome stability/damage in a given sample based on sequencing results. CONCLUSION: We provide a bioinformatics pipeline that allows structural phenotyping from sequencing data using established stability and aggregation predictors, including FoldX, TANGO, and WALTZ, and structural proteome coverage provided by the AlphaFold database. The pipeline and installation guide are freely available for academic users at https://github.com/vibbits/snpeffect; the pipeline requires a computer cluster. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05407-9.
format Online
Article
Text
id pubmed-10355034
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10355034 2023-07-20 SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models Janssen, Kobe Duran-Romaña, Ramon Bottu, Guy Guharoy, Mainak Botzki, Alexander Rousseau, Frederic Schymkowitz, Joost BMC Bioinformatics Software BACKGROUND: Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a potentially destabilizing effect leading to an increased risk of misfolding and aggregation. Multiple software tools exist to predict the effect of single-nucleotide variants on proteins; however, a pipeline integrating these tools while starting from an NGS data output list of variants is lacking. RESULTS: The previous version SNPeffect 4.0 (De Baets in Nucleic Acids Res 40(D1):D935–D939, 2011) provided an online database containing pre-calculated variant effects and low-throughput custom variant analysis. Here, we built an automated and parallelized pipeline that analyzes the impact of missense variants on the aggregation propensity and structural stability of proteins starting from the Variant Call Format as input. The pipeline incorporates the AlphaFold Protein Structure Database to achieve high coverage for structural stability analyses using the FoldX force field. The effect on aggregation propensity is analyzed using the established predictors TANGO and WALTZ. The pipeline focuses solely on the human proteome and can be used to analyze proteome stability/damage in a given sample based on sequencing results. CONCLUSION: We provide a bioinformatics pipeline that allows structural phenotyping from sequencing data using established stability and aggregation predictors, including FoldX, TANGO, and WALTZ, and structural proteome coverage provided by the AlphaFold database. The pipeline and installation guide are freely available for academic users at https://github.com/vibbits/snpeffect; the pipeline requires a computer cluster. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05407-9. BioMed Central 2023-07-18 /pmc/articles/PMC10355034/ /pubmed/37464277 http://dx.doi.org/10.1186/s12859-023-05407-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Janssen, Kobe
Duran-Romaña, Ramon
Bottu, Guy
Guharoy, Mainak
Botzki, Alexander
Rousseau, Frederic
Schymkowitz, Joost
SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models
title SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355034/
https://www.ncbi.nlm.nih.gov/pubmed/37464277
http://dx.doi.org/10.1186/s12859-023-05407-9