
Petascale Homology Search for Structure Prediction

The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Se...


Bibliographic Details
Main authors: Lee, Sewon; Kim, Gyuri; Karin, Eli Levy; Mirdita, Milot; Park, Sukhwan; Chikhi, Rayan; Babaian, Artem; Kryshtafovych, Andriy; Steinegger, Martin
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369885/
https://www.ncbi.nlm.nih.gov/pubmed/37503235
http://dx.doi.org/10.1101/2023.07.10.548308
author Lee, Sewon
Kim, Gyuri
Karin, Eli Levy
Mirdita, Milot
Park, Sukhwan
Chikhi, Rayan
Babaian, Artem
Kryshtafovych, Andriy
Steinegger, Martin
collection PubMed
description The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold’s advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold’s CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
format Online
Article
Text
id pubmed-10369885
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10369885 2023-07-27 bioRxiv Article Cold Spring Harbor Laboratory 2023-07-11 /pmc/articles/PMC10369885/ /pubmed/37503235 http://dx.doi.org/10.1101/2023.07.10.548308 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Petascale Homology Search for Structure Prediction
topic Article