
SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity

BACKGROUND: The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV’s advantage is that it...


Bibliographic Details
Main authors: Liu, Tong; Wang, Zheng
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909207/
https://www.ncbi.nlm.nih.gov/pubmed/29713370
http://dx.doi.org/10.1186/s13029-018-0068-7
author Liu, Tong
Wang, Zheng
collection PubMed
description BACKGROUND: The segment overlap score (SOV) has been used to evaluate predicted protein secondary structures, each a sequence of helix (H), strand (E), and coil (C) states, by comparing them with the native or reference secondary structure, another sequence of H, E, and C. SOV’s advantage is that it considers the size of continuous overlapping segments and assigns extra allowance to longer continuous overlapping segments, instead of judging only by the percentage of individually matching positions as the Q3 score does. However, we found a drawback in the previous definition: it cannot ensure that the allowance increases when more residues in a segment are predicted accurately. RESULTS: A new way of assigning allowance has been designed that keeps all the advantages of the previous SOV score definitions and ensures that the allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating a better ability to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found threshold values for distinguishing two protein structures (SOV_refine > 0.19) and for indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures, respectively). We provided two further example applications: using the score as a machine learning feature for protein model quality assessment, and comparing different definitions of topologically associating domains. We proved that our newly defined SOV score resulted in better performance. CONCLUSIONS: The SOV score can be widely used in bioinformatics research and in other fields that need to compare two sequences of letters in which continuous segments carry important meaning. We also generalized the previous SOV definitions so that they work for sequences composed of more than three states (e.g., the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl, with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.
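The contrast the abstract draws between per-position scoring (Q3) and segment-aware scoring can be illustrated with a minimal Python sketch. This is not the SOV_refine algorithm and not the authors' Perl package: it only computes Q3 and splits a state string into continuous segments. The function names `q3` and `segments` and the example strings are hypothetical, chosen purely for illustration.

```python
from itertools import groupby


def q3(reference: str, predicted: str) -> float:
    """Fraction of positions whose state letter matches the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def segments(states: str) -> list:
    """Split a state string into (state, start, end) runs, 0-based inclusive."""
    result = []
    pos = 0
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, pos, pos + length - 1))
        pos += length
    return result


# Hypothetical example strings (assumption: not taken from the article).
reference = "CCHHHHHHCC"
contiguous = "CCHHHHCCCC"   # helix shortened at one end, still one segment
fragmented = "CCHCHHCHCC"   # same number of errors, helix broken into pieces

for name, pred in [("contiguous", contiguous), ("fragmented", fragmented)]:
    print(name, q3(reference, pred), segments(pred))
```

Both predictions match the reference at 8 of 10 positions, so Q3 cannot tell them apart, yet the first keeps the helix as a single segment while the second breaks it into three. Differences of this kind are what segment-based measures such as SOV are designed to capture.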
format Online
Article
Text
id pubmed-5909207
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5909207 2018-04-30 SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity Liu, Tong Wang, Zheng Source Code Biol Med Methodology BioMed Central 2018-04-20 /pmc/articles/PMC5909207/ /pubmed/29713370 http://dx.doi.org/10.1186/s13029-018-0068-7 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909207/
https://www.ncbi.nlm.nih.gov/pubmed/29713370
http://dx.doi.org/10.1186/s13029-018-0068-7