Robust classification of protein variation using structural modelling and large-scale data integration

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates...

Bibliographic Details
Main authors: Baugh, Evan H., Simmons-Edler, Riley, Müller, Christian L., Alford, Rebecca F., Volfovsky, Natalia, Lash, Alex E., Bonneau, Richard
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4824117/
https://www.ncbi.nlm.nih.gov/pubmed/26926108
http://dx.doi.org/10.1093/nar/gkw120
_version_ 1782426044491366400
author Baugh, Evan H.
Simmons-Edler, Riley
Müller, Christian L.
Alford, Rebecca F.
Volfovsky, Natalia
Lash, Alex E.
Bonneau, Richard
collection PubMed
description Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
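The description above reports two ranking metrics, AUROC and AUPR. As a rough illustration of what those numbers measure, the following minimal sketch computes both on a tiny made-up example. The labels and scores are invented for illustration and are not taken from the VIPUR data; the function names `auroc` and `average_precision` are hypothetical; and AUPR is approximated here by average precision, which may differ from how the authors computed it.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Tied scores are not treated specially in this sketch; they are
    ordered arbitrarily by the sort.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Invented toy data: 1 = deleterious, 0 = neutral (assumption, not VIPUR output).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print("AUROC:", round(auroc(toy_labels, toy_scores), 3))
print("AUPR (average precision):", round(average_precision(toy_labels, toy_scores), 3))
```

Both functions take only a list of 0/1 labels and a list of scores, so the same sketch could be pointed at any classifier's output for a quick sanity check.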
format Online
Article
Text
id pubmed-4824117
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4824117 2016-04-08 Robust classification of protein variation using structural modelling and large-scale data integration Baugh, Evan H. Simmons-Edler, Riley Müller, Christian L. Alford, Rebecca F. Volfovsky, Natalia Lash, Alex E. Bonneau, Richard Nucleic Acids Res Computational Biology Oxford University Press 2016-04-07 2016-02-28 /pmc/articles/PMC4824117/ /pubmed/26926108 http://dx.doi.org/10.1093/nar/gkw120 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Robust classification of protein variation using structural modelling and large-scale data integration
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4824117/
https://www.ncbi.nlm.nih.gov/pubmed/26926108
http://dx.doi.org/10.1093/nar/gkw120