
Assessing the reproducibility of exome copy number variations predictions

BACKGROUND: Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes and a major...


Bibliographic Details
Main authors: Hong, Celine S., Singh, Larry N., Mullikin, James C., Biesecker, Leslie G.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976506/
https://www.ncbi.nlm.nih.gov/pubmed/27503473
http://dx.doi.org/10.1186/s13073-016-0336-6
author Hong, Celine S.
Singh, Larry N.
Mullikin, James C.
Biesecker, Leslie G.
collection PubMed
description BACKGROUND: Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes, and a major challenge is their reproducibility in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers; varying sample sizes; and varying capture methodology. METHODS: Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine reproducibility, we ran the callers on four datasets, on varying sample sizes (N = 10, 30, 75, 100, and 300), and on data with different capture methodology. We examined the false negative (FN) and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking CNV call concordance against single nucleotide polymorphism array data. RESULTS: Using independently generated datasets, we examined the PPV for each dataset and observed a wide range of PPVs. The PPV values were highly data dependent (p < 0.001). For the sample size and capture method analyses, we tested the callers in triplicate. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, a negative correlation between the PPV and the sample size was observed for CoNIFER (ρ = –0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
CONCLUSIONS: Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to include a wider spectrum of CNVs (including small CNVs). These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. These approaches to the evaluation of exome CNV callers are essential to support wide utility and applicability of CNV discovery in exome studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-016-0336-6) contains supplementary material, which is available to authorized users.
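The abstract measures PPV as the concordance of exome CNV calls with array calls but does not state the matching rule. The following is a minimal sketch of one way such a PPV could be computed, not the authors' method. It assumes (hypothetically) that an exome call counts as a true positive when it shares sample, chromosome, and CNV type with an array call and overlaps it by at least one base. The `Cnv` record, the `overlaps` and `ppv` helpers, and the toy calls are all illustrative and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    """Hypothetical CNV record; field names are illustrative only."""
    sample: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    kind: str   # "DEL" or "DUP"


def overlaps(a: Cnv, b: Cnv) -> bool:
    """True if two calls share sample, chromosome, type, and at least one base.

    The one-base overlap threshold is an assumption made for this sketch.
    """
    return (
        a.sample == b.sample
        and a.chrom == b.chrom
        and a.kind == b.kind
        and a.start <= b.end
        and b.start <= a.end
    )


def ppv(exome_calls, array_calls):
    """PPV = TP / (TP + FP); a TP is an exome call matched by any array call."""
    if not exome_calls:
        return float("nan")  # PPV is undefined when no calls were made
    tp = sum(any(overlaps(c, ref) for ref in array_calls) for c in exome_calls)
    return tp / len(exome_calls)


# Toy, made-up calls purely for illustration (not data from the study).
array_calls = [Cnv("S1", "chr1", 1000, 5000, "DEL")]
exome_calls = [
    Cnv("S1", "chr1", 1200, 1800, "DEL"),  # overlaps the array deletion: TP
    Cnv("S1", "chr2", 300, 900, "DUP"),    # no array support: FP
    Cnv("S1", "chr1", 4000, 6000, "DUP"),  # right place, wrong type: FP
    Cnv("S2", "chr1", 1200, 1800, "DEL"),  # different sample: FP
]
print(ppv(exome_calls, array_calls))  # prints 0.25
```

A stricter rule, such as a minimum reciprocal overlap, would change which calls count as confirmed and therefore the PPV, so the matching criterion needs to be fixed before comparing callers or datasets.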
format Online
Article
Text
id pubmed-4976506
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4976506 2016-08-09 Assessing the reproducibility of exome copy number variations predictions Hong, Celine S. Singh, Larry N. Mullikin, James C. Biesecker, Leslie G. Genome Med Research BioMed Central 2016-08-08 /pmc/articles/PMC4976506/ /pubmed/27503473 http://dx.doi.org/10.1186/s13073-016-0336-6 Text en © COPYRIGHT NOTICE. 2016 The article is a work of the United States Government; Title 17 U.S.C. 105 provides that copyright protection is not available for any work of the United States government in the United States. Additionally, this is an open access article distributed under the terms of the Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0), which permits worldwide unrestricted use, distribution, and reproduction in any medium for any lawful purpose.
title Assessing the reproducibility of exome copy number variations predictions
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976506/
https://www.ncbi.nlm.nih.gov/pubmed/27503473
http://dx.doi.org/10.1186/s13073-016-0336-6