IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions

MOTIVATION: Despite the lack of folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs...

Bibliographic Details
Main Authors: Zhou, Jing-Bo, Xiong, Yao, An, Ke, Ye, Zhi-Qiang, Wu, Yun-Dong
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7755418/
https://www.ncbi.nlm.nih.gov/pubmed/32756939
http://dx.doi.org/10.1093/bioinformatics/btaa618
_version_ 1783626351570321408
author Zhou, Jing-Bo
Xiong, Yao
An, Ke
Ye, Zhi-Qiang
Wu, Yun-Dong
author_facet Zhou, Jing-Bo
Xiong, Yao
An, Ke
Ye, Zhi-Qiang
Wu, Yun-Dong
author_sort Zhou, Jing-Bo
collection PubMed
description MOTIVATION: Despite the lack of folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of next-generation sequencing (NGS) has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly because nsSNVs from structured regions dominate the training data. Therefore, a better-performing disease-association predictor built specifically for nsSNVs in IDRs is in high demand. RESULTS: We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. Evaluation on the two testing datasets shows that all three prediction models significantly outperform 17 other popular general predictors, achieving accuracy (ACC) between 0.856 and 0.868 and Matthews correlation coefficient (MCC) between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. AVAILABILITY AND IMPLEMENTATION: The software is freely available at http://www.wdspdb.com/IDRMutPred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7755418
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7755418 2020-12-29 IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions Zhou, Jing-Bo Xiong, Yao An, Ke Ye, Zhi-Qiang Wu, Yun-Dong Bioinformatics Original Papers MOTIVATION: Despite the lack of folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of next-generation sequencing (NGS) has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly because nsSNVs from structured regions dominate the training data. Therefore, a better-performing disease-association predictor built specifically for nsSNVs in IDRs is in high demand. RESULTS: We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. Evaluation on the two testing datasets shows that all three prediction models significantly outperform 17 other popular general predictors, achieving accuracy (ACC) between 0.856 and 0.868 and Matthews correlation coefficient (MCC) between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. AVAILABILITY AND IMPLEMENTATION: The software is freely available at http://www.wdspdb.com/IDRMutPred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Oxford University Press 2020-08-05 /pmc/articles/PMC7755418/ /pubmed/32756939 http://dx.doi.org/10.1093/bioinformatics/btaa618 Text en © The Author(s) 2020. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Zhou, Jing-Bo
Xiong, Yao
An, Ke
Ye, Zhi-Qiang
Wu, Yun-Dong
IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions
title IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions
title_full IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions
title_fullStr IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions
title_full_unstemmed IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions
title_short IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions
title_sort idrmutpred: predicting disease-associated germline nonsynonymous single nucleotide variants (nssnvs) in intrinsically disordered regions
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7755418/
https://www.ncbi.nlm.nih.gov/pubmed/32756939
http://dx.doi.org/10.1093/bioinformatics/btaa618