Caretta – A multiple protein structure alignment and feature extraction suite
The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input f...
Main Authors: | Akdel, Mehmet; Durairaj, Janani; de Ridder, Dick; van Dijk, Aalt D.J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7186369/ https://www.ncbi.nlm.nih.gov/pubmed/32368333 http://dx.doi.org/10.1016/j.csbj.2020.03.011 |
_version_ | 1783526932817641472 |
---|---|
author | Akdel, Mehmet Durairaj, Janani de Ridder, Dick van Dijk, Aalt D.J. |
author_sort | Akdel, Mehmet |
collection | PubMed |
description | The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta’s performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases. |
format | Online Article Text |
id | pubmed-7186369 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-71863692020-05-04 Caretta – A multiple protein structure alignment and feature extraction suite Akdel, Mehmet Durairaj, Janani de Ridder, Dick van Dijk, Aalt D.J. Comput Struct Biotechnol J Research Article The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta’s performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases. Research Network of Computational and Structural Biotechnology 2020-04-06 /pmc/articles/PMC7186369/ /pubmed/32368333 http://dx.doi.org/10.1016/j.csbj.2020.03.011 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Caretta – A multiple protein structure alignment and feature extraction suite |
topic | Research Article |