Caretta – A multiple protein structure alignment and feature extraction suite

The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input f...

Bibliographic Details
Main authors: Akdel, Mehmet, Durairaj, Janani, de Ridder, Dick, van Dijk, Aalt D.J.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7186369/
https://www.ncbi.nlm.nih.gov/pubmed/32368333
http://dx.doi.org/10.1016/j.csbj.2020.03.011
_version_ 1783526932817641472
author Akdel, Mehmet
Durairaj, Janani
de Ridder, Dick
van Dijk, Aalt D.J.
collection PubMed
description The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta’s performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.
format Online
Article
Text
id pubmed-7186369
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7186369 2020-05-04 Caretta – A multiple protein structure alignment and feature extraction suite Akdel, Mehmet Durairaj, Janani de Ridder, Dick van Dijk, Aalt D.J. Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2020-04-06 /pmc/articles/PMC7186369/ /pubmed/32368333 http://dx.doi.org/10.1016/j.csbj.2020.03.011 Text en © 2020 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Caretta – A multiple protein structure alignment and feature extraction suite
topic Research Article