A highly sensitive and specific workflow for detecting rare copy-number variants from exome sequencing data
BACKGROUND: Exome sequencing (ES) is a first-tier diagnostic test for many suspected Mendelian disorders. While it is routine to detect small sequence variants, it is not a standard practice in clinical settings to detect germline copy-number variants (CNVs) from ES data due to several reasons relat...
Main authors: | Rajagopalan, Ramakrishnan; Murrell, Jill R.; Luo, Minjie; Conlin, Laura K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6993336/ https://www.ncbi.nlm.nih.gov/pubmed/32000839 http://dx.doi.org/10.1186/s13073-020-0712-0 |
_version_ | 1783493010012504064 |
---|---|
author | Rajagopalan, Ramakrishnan; Murrell, Jill R.; Luo, Minjie; Conlin, Laura K. |
author_sort | Rajagopalan, Ramakrishnan |
collection | PubMed |
description | BACKGROUND: Exome sequencing (ES) is a first-tier diagnostic test for many suspected Mendelian disorders. While detecting small sequence variants from ES data is routine, detecting germline copy-number variants (CNVs) from ES data is not standard practice in clinical settings, for several reasons relating to performance. In this work, we comprehensively characterized one of the most sensitive ES-based CNV tools, ExomeDepth, against SNP array, a standard-of-care test used in clinical settings to detect genome-wide CNVs. METHODS: We propose a modified ExomeDepth workflow that excludes exons with low mappability prior to variant calling, to drastically reduce the false positives originating from the repetitive regions of the genome, together with an iterative variant calling framework to assess reproducibility. We used a cohort of 307 individuals with clinical ES data and clinical SNP array data to estimate the sensitivity and false discovery rate of CNV detection using exome sequencing. Further, we performed targeted testing of the STRC gene in 1972 individuals. To reduce the number of variants for downstream analysis, we performed a large-scale iterative variant calling process with random control cohorts to assess the reproducibility of the CNVs. RESULTS: The modified workflow presented in this paper reduced the total number of variants identified by one third while retaining a high sensitivity of 97%, and improved the false discovery rate to 11.4% compared to the default ExomeDepth pipeline. The exclusion of exons with low mappability removes 4.5% of the exons, including a subset of exons (0.6%) in disease-associated genes that are intractable by short-read next-generation sequencing (NGS). Results from the reproducibility analysis showed that the clinically reported variants were reproducible 100% of the time and that the modified workflow can be used to rank variants from high to low confidence. Targeted testing of 30 CNVs identified in STRC, a challenging gene to ascertain by NGS, showed a 100% validation rate. CONCLUSIONS: In summary, we introduced a modification to the default ExomeDepth workflow to reduce the false positives originating from the repetitive regions of the genome, created a large-scale iterative variant calling framework for reproducibility, and provided recommendations for implementation in clinical settings. |
format | Online Article Text |
id | pubmed-6993336 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6993336 2020-02-04 A highly sensitive and specific workflow for detecting rare copy-number variants from exome sequencing data Rajagopalan, Ramakrishnan; Murrell, Jill R.; Luo, Minjie; Conlin, Laura K. Genome Med Research BioMed Central 2020-01-30 /pmc/articles/PMC6993336/ /pubmed/32000839 http://dx.doi.org/10.1186/s13073-020-0712-0 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A highly sensitive and specific workflow for detecting rare copy-number variants from exome sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6993336/ https://www.ncbi.nlm.nih.gov/pubmed/32000839 http://dx.doi.org/10.1186/s13073-020-0712-0 |
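
The abstract reports a sensitivity and a false discovery rate for exome-based CNV calls measured against SNP array. As a rough illustration of how those two quantities are defined, the sketch below treats array calls as truth and counts two calls as matching when they share a chromosome, a CNV type, and at least one base. The matching rule, the `CNV` type, the function names, and the toy coordinates are all assumptions made for illustration; the record does not describe the authors' actual comparison criteria, and none of the values come from the study.

```python
from typing import NamedTuple, Sequence, Tuple


class CNV(NamedTuple):
    """A hypothetical, minimal CNV call (not the ExomeDepth output format)."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def overlaps(a: CNV, b: CNV) -> bool:
    """True when two calls share a chromosome, a type, and at least one base."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start <= b.end and b.start <= a.end)


def sensitivity_and_fdr(exome_calls: Sequence[CNV],
                        array_calls: Sequence[CNV]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate), treating array calls as truth.

    sensitivity = array calls recovered by exome / all array calls
    FDR         = exome calls with no array support / all exome calls
    """
    detected = sum(any(overlaps(t, c) for c in exome_calls) for t in array_calls)
    supported = sum(any(overlaps(c, t) for t in array_calls) for c in exome_calls)
    sensitivity = detected / len(array_calls) if array_calls else float("nan")
    fdr = ((len(exome_calls) - supported) / len(exome_calls)
           if exome_calls else float("nan"))
    return sensitivity, fdr


# Toy calls invented for this example only.
array_calls = [CNV("chr1", 100, 500, "DEL"), CNV("chr2", 1000, 3000, "DUP")]
exome_calls = [CNV("chr1", 150, 450, "DEL"), CNV("chr2", 900, 2500, "DUP"),
               CNV("chr3", 10, 90, "DEL"), CNV("chr4", 10, 90, "DUP")]

sens, fdr = sensitivity_and_fdr(exome_calls, array_calls)
print(f"sensitivity={sens:.2f} fdr={fdr:.2f}")
```

Counting every exome-only call as a false discovery is itself a simplification: an array can miss small CNVs that exome data detects, so a real evaluation would likely need an orthogonal confirmation step before labeling such calls false.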