Sequana coverage: detection and characterization of genomic variations using running median and mixture models

BACKGROUND: In addition to mapping quality information, genome coverage contains valuable biological information such as the presence of repetitive regions, deleted genes, or copy number variations (CNVs). It is essential to take into consideration atypical regions, trends (e.g., origin of repli...

Bibliographic Details
Main authors: Desvillechabrol, Dimitri, Bouchier, Christiane, Kennedy, Sean, Cokelaer, Thomas
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6275460/
https://www.ncbi.nlm.nih.gov/pubmed/30192951
http://dx.doi.org/10.1093/gigascience/giy110
_version_ 1783377818314342400
author Desvillechabrol, Dimitri
Bouchier, Christiane
Kennedy, Sean
Cokelaer, Thomas
collection PubMed
description BACKGROUND: In addition to mapping quality information, genome coverage contains valuable biological information such as the presence of repetitive regions, deleted genes, or copy number variations (CNVs). It is essential to take into consideration atypical regions, trends (e.g., origin of replication), or known and unknown biases that influence coverage. It is also important that reported events have robust statistics (e.g., a z-score) associated with their detection as well as a precise location. RESULTS: We provide a stand-alone application, sequana_coverage, that reports genomic regions of interest (ROIs) that are significantly over- or underrepresented in high-throughput sequencing data. Each event is reported with its significance and with characteristics such as the length of the region. The algorithm first detrends the data using an efficient running median algorithm. It then estimates the distribution of the normalized genome coverage with a Gaussian mixture model. Finally, a z-score statistic is assigned to each base position and used to separate the central distribution from the ROIs (i.e., under- and overcovered regions). A double-threshold mechanism is used to cluster the genomic ROIs. HTML reports provide a summary with interactive visual representations of the genomic ROIs with standard plots and metrics. Genomic variations such as single-nucleotide variants or CNVs can be effectively identified at the same time.
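The steps named in the RESULTS paragraph (detrend with a running median, score each position, cluster with two thresholds) can be illustrated with a minimal Python sketch. It is not the sequana_coverage implementation, and several choices are assumptions made only for illustration: the function names running_median, zscores, and find_rois are hypothetical; the running median is a naive one rather than the efficient algorithm the abstract mentions; the central distribution is estimated with the median and the median absolute deviation (MAD) as a simple stand-in for the Gaussian mixture model; and the threshold values 4.0 and 2.0 and the window of 51 are arbitrary.

```python
from statistics import median


def running_median(values, window):
    """Centered running median; the window is truncated at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def zscores(coverage, window):
    """Detrend the coverage, then score each position.

    Assumption: median and MAD replace the Gaussian mixture model
    described in the abstract.
    """
    trend = running_median(coverage, window)
    # Where the trend itself is zero the position is treated as uncovered.
    normalized = [c / t if t > 0 else 0.0 for c, t in zip(coverage, trend)]
    center = median(normalized)
    mad = median([abs(x - center) for x in normalized])
    sigma = 1.4826 * mad  # MAD scaled to a standard deviation for normal data
    if sigma == 0:
        raise ValueError("coverage is too uniform to estimate a spread")
    return [(x - center) / sigma for x in normalized]


def find_rois(z, high=4.0, low=2.0):
    """Double-threshold clustering.

    A run of consecutive positions beyond `low` is reported only if at
    least one position in it is also beyond `high`. Over- and
    undercovered runs are handled separately. Returns a sorted list of
    (start, end, peak z-score) tuples with half-open [start, end) ranges.
    """
    if not 0 < low <= high:
        raise ValueError("thresholds must satisfy 0 < low <= high")
    rois = []
    for sign in (1, -1):
        start, peak = None, 0.0
        # The trailing 0.0 is a sentinel that closes a run at the end.
        for i, value in enumerate(list(z) + [0.0]):
            s = sign * value
            if s > low:
                if start is None:
                    start, peak = i, s
                else:
                    peak = max(peak, s)
            elif start is not None:
                if peak > high:
                    rois.append((start, i, sign * peak))
                start = None
    return sorted(rois)


# Synthetic coverage around 100x with a duplicated and a depleted region.
coverage = [100 + (0, -1, 0, 1)[i % 4] for i in range(200)]
coverage[80:90] = [300] * 10
coverage[150:155] = [5] * 5

rois = find_rois(zscores(coverage, window=51))
for start, end, peak in rois:
    print(start, end, round(peak, 1))
```

The window has to be clearly longer than the events of interest: a running median follows any event that fills more than half of the window, which would erase it from the normalized signal. The low threshold extends a region to its full width, while the high threshold keeps isolated moderate deviations from being reported.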
format Online
Article
Text
id pubmed-6275460
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6275460 2018-12-06 Sequana coverage: detection and characterization of genomic variations using running median and mixture models Desvillechabrol, Dimitri Bouchier, Christiane Kennedy, Sean Cokelaer, Thomas Gigascience Technical Note Oxford University Press 2018-09-06 /pmc/articles/PMC6275460/ /pubmed/30192951 http://dx.doi.org/10.1093/gigascience/giy110 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sequana coverage: detection and characterization of genomic variations using running median and mixture models
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6275460/
https://www.ncbi.nlm.nih.gov/pubmed/30192951
http://dx.doi.org/10.1093/gigascience/giy110