
StrVCTVRE: A supervised learning method to predict the pathogenicity of human genome structural variants

Whole-genome sequencing resolves many clinical cases where standard diagnostic methods have failed. However, at least half of these cases remain unresolved after whole-genome sequencing. Structural variants (SVs; genomic variants larger than 50 base pairs) of uncertain significance are the genetic c...


Bibliographic Details
Main Authors: Sharo, Andrew G., Hu, Zhiqiang, Sunyaev, Shamil R., Brenner, Steven E.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8874149/
https://www.ncbi.nlm.nih.gov/pubmed/35032432
http://dx.doi.org/10.1016/j.ajhg.2021.12.007
author Sharo, Andrew G.
Hu, Zhiqiang
Sunyaev, Shamil R.
Brenner, Steven E.
collection PubMed
description Whole-genome sequencing resolves many clinical cases where standard diagnostic methods have failed. However, at least half of these cases remain unresolved after whole-genome sequencing. Structural variants (SVs; genomic variants larger than 50 base pairs) of uncertain significance are the genetic cause of a portion of these unresolved cases. As sequencing methods using long or linked reads become more accessible and SV detection algorithms improve, clinicians and researchers are gaining access to thousands of reliable SVs of unknown disease relevance. Methods to predict the pathogenicity of these SVs are required to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE to distinguish pathogenic SVs from benign SVs that overlap exons. In a random forest classifier, we integrated features that capture gene importance, coding region, conservation, expression, and exon structure. We found that features such as expression and conservation are important but are absent from SV classification guidelines. We leveraged multiple resources to construct a size-matched training set of rare, putatively benign and pathogenic SVs. StrVCTVRE performs accurately across a wide SV size range on independent test sets, which will allow clinicians and researchers to eliminate about half of SVs from consideration while retaining a 90% sensitivity. We anticipate clinicians and researchers will use StrVCTVRE to prioritize SVs in probands where no SV is immediately compelling, empowering deeper investigation into novel SVs to resolve cases and understand new mechanisms of disease. StrVCTVRE runs rapidly and is publicly available.
format Online
Article
Text
id pubmed-8874149
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8874149 2022-03-02 StrVCTVRE: A supervised learning method to predict the pathogenicity of human genome structural variants Sharo, Andrew G. Hu, Zhiqiang Sunyaev, Shamil R. Brenner, Steven E. Am J Hum Genet Article
Elsevier 2022-02-03 2022-01-14 /pmc/articles/PMC8874149/ /pubmed/35032432 http://dx.doi.org/10.1016/j.ajhg.2021.12.007 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title StrVCTVRE: A supervised learning method to predict the pathogenicity of human genome structural variants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8874149/
https://www.ncbi.nlm.nih.gov/pubmed/35032432
http://dx.doi.org/10.1016/j.ajhg.2021.12.007