Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D

Iterative homology search has been widely used in identification of remotely related proteins. Our previous study has found that the query-seeded sequence iterative search can reduce homologous over-extension errors and greatly improve selectivity. However, iterative homology search remains challeng...

Bibliographic Details
Main Authors: Yang, Minglei, Zhang, Wenliang, Yao, Guocai, Zhang, Haiyue, Li, Weizhong
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637259/
https://www.ncbi.nlm.nih.gov/pubmed/31317184
http://dx.doi.org/10.1093/database/baz092
author Yang, Minglei
Zhang, Wenliang
Yao, Guocai
Zhang, Haiyue
Li, Weizhong
collection PubMed
description Iterative homology search is widely used to identify remotely related proteins. Our previous study found that query-seeded iterative sequence search can reduce homologous over-extension errors and greatly improve selectivity. However, iterative homology search remains challenging in protein functional prediction. More sensitive scoring models are needed to improve the predictive performance of alignment methods, and alignment annotation with better visualization has become imperative for result interpretation. Here we report PSISearch2D, an open-source application that runs query-seeded iterative sequence searches to detect remotely related proteins. PSISearch2D retrieves domain annotation from Pfam, UniProtKB, CDD and PROSITE for the resulting hits and displays combined domain and sequence alignments in novel visualizations. A newly defined scoring model, the C-value, re-orders hits by considering the combination of sequence and domain alignments. Benchmarking with the C-value indicates that PSISearch2D outperforms the original PSISearch2 tool in both accuracy and specificity. PSISearch2D improves the characterization of unknown proteins in remote protein detection. Our evaluation tests show that PSISearch2D provided annotation for 77 695 of 139 503 unknown bacterial proteins and 140 751 of 352 757 unknown viral proteins in UniProtKB, about 2.3-fold and 1.8-fold more characterization than the original PSISearch2, respectively. Together with an auto-iteration mode for handling large-scale data and optional programs for global and local sequence alignments, PSISearch2D enhances remotely related protein search.
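The annotation counts quoted in the description can be turned into coverage percentages with a few lines of arithmetic. The sketch below is only an illustration: the names `REPORTED`, `coverage` and `coverage_report` are hypothetical and not part of PSISearch2D, and the 2.3-fold and 1.8-fold comparisons are not recomputed because this record does not give the baseline PSISearch2 counts.

```python
# Counts quoted in the description: (annotated, total) per protein set.
# The dictionary and function names are illustrative assumptions.
REPORTED = {
    "bacteria": (77_695, 139_503),
    "virus": (140_751, 352_757),
}


def coverage(annotated: int, total: int) -> float:
    """Return the fraction of proteins that received an annotation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return annotated / total


def coverage_report(counts):
    """Map each set name to its coverage as a percentage rounded to 0.1."""
    return {
        name: round(100 * coverage(annotated, total), 1)
        for name, (annotated, total) in counts.items()
    }


if __name__ == "__main__":
    for name, pct in coverage_report(REPORTED).items():
        print(f"{name}: {pct}% annotated")
```

Under these figures, roughly 55.7% of the unknown bacterial proteins and 39.9% of the unknown viral proteins received an annotation.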
format Online
Article
Text
id pubmed-6637259
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6637259 2019-07-22 Database (Oxford) Original Article
Oxford University Press 2019-07-17 /pmc/articles/PMC6637259/ /pubmed/31317184 http://dx.doi.org/10.1093/database/baz092 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D
topic Original Article