A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data
BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the...
Main Authors: | Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679970/ https://www.ncbi.nlm.nih.gov/pubmed/23442253 http://dx.doi.org/10.1186/1471-2164-14-128 |
_version_ | 1782273048042274816 |
---|---|
author | Sepúlveda, Nuno Campino, Susana G Assefa, Samuel A Sutherland, Colin J Pain, Arnab Clark, Taane G |
author_facet | Sepúlveda, Nuno Campino, Susana G Assefa, Samuel A Sutherland, Colin J Pain, Arnab Clark, Taane G |
author_sort | Sepúlveda, Nuno |
collection | PubMed |
description | BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. RESULTS: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. CONCLUSIONS: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. |
format | Online Article Text |
id | pubmed-3679970 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-36799702013-06-25 A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data Sepúlveda, Nuno Campino, Susana G Assefa, Samuel A Sutherland, Colin J Pain, Arnab Clark, Taane G BMC Genomics Methodology Article BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. RESULTS: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. CONCLUSIONS: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. BioMed Central 2013-02-26 /pmc/articles/PMC3679970/ /pubmed/23442253 http://dx.doi.org/10.1186/1471-2164-14-128 Text en Copyright © 2013 Sepúlveda et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Sepúlveda, Nuno Campino, Susana G Assefa, Samuel A Sutherland, Colin J Pain, Arnab Clark, Taane G A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
title | A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
title_full | A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
title_fullStr | A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
title_full_unstemmed | A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
title_short | A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
title_sort | poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679970/ https://www.ncbi.nlm.nih.gov/pubmed/23442253 http://dx.doi.org/10.1186/1471-2164-14-128 |
work_keys_str_mv | AT sepulvedanuno apoissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT campinosusanag apoissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT assefasamuela apoissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT sutherlandcolinj apoissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT pain5arnab apoissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT clarktaaneg apoissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT sepulvedanuno poissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT campinosusanag poissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT assefasamuela poissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT sutherlandcolinj poissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT pain5arnab poissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata AT clarktaaneg poissonhierarchicalmodellingapproachtodetectingcopynumbervariationinsequencecoveragedata |