SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models
BACKGROUND: Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a potentially destabilizing effect leading to an increased risk of misfolding and aggregati...
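Main authors: | Janssen, Kobe; Duran-Romaña, Ramon; Bottu, Guy; Guharoy, Mainak; Botzki, Alexander; Rousseau, Frederic; Schymkowitz, Joost |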
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355034/ https://www.ncbi.nlm.nih.gov/pubmed/37464277 http://dx.doi.org/10.1186/s12859-023-05407-9 |
_version_ | 1785075053972422656 |
---|---|
author | Janssen, Kobe Duran-Romaña, Ramon Bottu, Guy Guharoy, Mainak Botzki, Alexander Rousseau, Frederic Schymkowitz, Joost |
author_facet | Janssen, Kobe Duran-Romaña, Ramon Bottu, Guy Guharoy, Mainak Botzki, Alexander Rousseau, Frederic Schymkowitz, Joost |
author_sort | Janssen, Kobe |
collection | PubMed |
description | BACKGROUND: Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a destabilizing effect, leading to an increased risk of misfolding and aggregation. Multiple software tools exist to predict the effect of single-nucleotide variants on proteins; however, a pipeline that integrates these tools and starts from an NGS output list of variants is lacking. RESULTS: The previous version, SNPeffect 4.0 (De Baets in Nucleic Acids Res 40(D1):D935–D939, 2011), provided an online database containing pre-calculated variant effects and low-throughput custom variant analysis. Here, we built an automated and parallelized pipeline that analyzes the impact of missense variants on the aggregation propensity and structural stability of proteins, starting from the Variant Call Format as input. The pipeline incorporates the AlphaFold Protein Structure Database to achieve high coverage for structural stability analyses using the FoldX force field. The effect on aggregation propensity is analyzed using the established predictors TANGO and WALTZ. The pipeline focuses solely on the human proteome and can be used to analyze proteome stability/damage in a given sample based on sequencing results. CONCLUSION: We provide a bioinformatics pipeline that allows structural phenotyping from sequencing data using established stability and aggregation predictors, including FoldX, TANGO, and WALTZ, with structural proteome coverage provided by the AlphaFold database. The pipeline and installation guide are freely available to academic users at https://github.com/vibbits/snpeffect; running the pipeline requires a computer cluster. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05407-9. |
format | Online Article Text |
id | pubmed-10355034 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-103550342023-07-20 SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models Janssen, Kobe Duran-Romaña, Ramon Bottu, Guy Guharoy, Mainak Botzki, Alexander Rousseau, Frederic Schymkowitz, Joost BMC Bioinformatics Software BACKGROUND: Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a potentially destabilizing effect leading to an increased risk of misfolding and aggregation. Multiple software tools exist to predict the effect of single-nucleotide variants on proteins, however, a pipeline integrating these tools while starting from an NGS data output list of variants is lacking. RESULTS: The previous version SNPeffect 4.0 (De Baets in Nucleic Acids Res 40(D1):D935–D939, 2011) provided an online database containing pre-calculated variant effects and low-throughput custom variant analysis. Here, we built an automated and parallelized pipeline that analyzes the impact of missense variants on the aggregation propensity and structural stability of proteins starting from the Variant Call Format as input. The pipeline incorporates the AlphaFold Protein Structure Database to achieve high coverage for structural stability analyses using the FoldX force field. The effect on aggregation-propensity is analyzed using the established predictors TANGO and WALTZ. The pipeline focuses solely on the human proteome and can be used to analyze proteome stability/damage in a given sample based on sequencing results. CONCLUSION: We provide a bioinformatics pipeline that allows structural phenotyping from sequencing data using established stability and aggregation predictors including FoldX, TANGO, and WALTZ; and structural proteome coverage provided by the AlphaFold database. 
The pipeline and installation guide are freely available for academic users on https://github.com/vibbits/snpeffect and requires a computer cluster. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05407-9. BioMed Central 2023-07-18 /pmc/articles/PMC10355034/ /pubmed/37464277 http://dx.doi.org/10.1186/s12859-023-05407-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Janssen, Kobe Duran-Romaña, Ramon Bottu, Guy Guharoy, Mainak Botzki, Alexander Rousseau, Frederic Schymkowitz, Joost SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models |
title | SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models |
title_full | SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models |
title_fullStr | SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models |
title_full_unstemmed | SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models |
title_short | SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models |
title_sort | snpeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using alphafold models |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355034/ https://www.ncbi.nlm.nih.gov/pubmed/37464277 http://dx.doi.org/10.1186/s12859-023-05407-9 |