SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity
BACKGROUND: The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV’s advantage is that it...
Main authors: | Liu, Tong; Wang, Zheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909207/ https://www.ncbi.nlm.nih.gov/pubmed/29713370 http://dx.doi.org/10.1186/s13029-018-0068-7 |
_version_ | 1783315852301434880 |
---|---|
author | Liu, Tong Wang, Zheng |
collection | PubMed |
description | BACKGROUND: The segment overlap score (SOV) has been used to evaluate predicted protein secondary structures, each a sequence composed of helix (H), strand (E), and coil (C), by comparing them with the native or reference secondary structures, another sequence of H, E, and C. SOV’s advantage is that it considers the size of continuous overlapping segments and assigns extra allowance to longer continuous overlapping segments, instead of judging only from the percentage of overlapping individual positions as the Q3 score does. However, we found a drawback in its previous definition: it cannot ensure that the assigned allowance increases when more residues in a segment are predicted accurately. RESULTS: A new way of assigning allowance has been designed, which keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating that it is better able to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found the threshold values for distinguishing two protein structures (SOV_refine > 0.19) and for indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures, respectively). We provided two further example applications: using the score as a machine learning feature for protein model quality assessment, and comparing different definitions of topologically associating domains. We showed that our newly defined SOV score resulted in better performance. CONCLUSIONS: The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that they work for sequences composed of more than three states (e.g., the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/. |
format | Online Article Text |
id | pubmed-5909207 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-59092072018-04-30 SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity Liu, Tong Wang, Zheng Source Code Biol Med Methodology BACKGROUND: The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV’s advantage is that it can consider the size of continuous overlapping segments and assign extra allowance to longer continuous overlapping segments instead of only judging from the percentage of overlapping individual positions as Q3 score does. However, we have found a drawback from its previous definition, that is, it cannot ensure increasing allowance assignment when more residues in a segment are further predicted accurately. RESULTS: A new way of assigning allowance has been designed, which keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV has achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating its better abilities to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found the threshold values for distinguishing two protein structures (SOV_refine > 0.19) and indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures respectively). We provided another two example applications, which are when used as a machine learning feature for protein model quality assessment and comparing different definitions of topologically associating domains. 
We proved that our newly defined SOV score resulted in better performance. CONCLUSIONS: The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that it can work for sequences composed of more than three states (e.g., it can work for the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/. BioMed Central 2018-04-20 /pmc/articles/PMC5909207/ /pubmed/29713370 http://dx.doi.org/10.1186/s13029-018-0068-7 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Liu, Tong Wang, Zheng SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity |
title | SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909207/ https://www.ncbi.nlm.nih.gov/pubmed/29713370 http://dx.doi.org/10.1186/s13029-018-0068-7 |
work_keys_str_mv | AT liutong sovrefineafurtherrefineddefinitionofsegmentoverlapscoreanditssignificanceforproteinstructuresimilarity AT wangzheng sovrefineafurtherrefineddefinitionofsegmentoverlapscoreanditssignificanceforproteinstructuresimilarity |
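
The description in this record contrasts the per-position Q3 score with segment-based scoring. The sketch below is only an illustration of that contrast, written in Python rather than the Perl of the released package: it computes Q3 as the fraction of matching positions and lists the overlapping same-state segment pairs that a segment-based score would start from. It does not implement SOV_refine or its allowance rule, and the function names `q3`, `segments`, and `overlapping_pairs` are hypothetical, not taken from the authors' software.

```python
from itertools import groupby


def q3(reference: str, predicted: str) -> float:
    """Fraction of positions where the predicted state equals the reference state."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def segments(states: str) -> list[tuple[str, int, int]]:
    """Split a state string into runs of (state, start, end), with end exclusive."""
    runs = []
    position = 0
    for state, group in groupby(states):
        length = len(list(group))
        runs.append((state, position, position + length))
        position += length
    return runs


def overlapping_pairs(reference: str, predicted: str):
    """List same-state segment pairs that share at least one position.

    Each entry is (state, reference_span, predicted_span, overlap_length).
    This is only the pairing step; no allowance or normalization is applied.
    """
    pairs = []
    for state_r, start_r, end_r in segments(reference):
        for state_p, start_p, end_p in segments(predicted):
            if state_r != state_p:
                continue
            overlap = min(end_r, end_p) - max(start_r, start_p)
            if overlap > 0:
                pairs.append((state_r, (start_r, end_r), (start_p, end_p), overlap))
    return pairs
```

For example, calling `q3("CCHHHHCC", "CHHHHCCC")` and `overlapping_pairs("CCHHHHCC", "CHHHHCCC")` shows the same prediction from both viewpoints: a single fraction of matching positions versus a list of overlapping segments with their lengths, which is the information a segment-based score can reward.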