
Statistical Binning for Barcoded Reads Improves Downstream Analyses

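Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods are unable to align reads to much of the genome, recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for bar-coded read alignment, that first probabilistically maps reads to potentially multiple “read clouds” and then within clouds by newly exploiting the nonuniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region, and clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing.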


Bibliographic Details
Main authors: Shajii, Ariya, Numanagić, Ibrahim, Whelan, Christopher, Berger, Bonnie
Format: Online Article Text
Language: English
Published: 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6214366/
https://www.ncbi.nlm.nih.gov/pubmed/30138581
http://dx.doi.org/10.1016/j.cels.2018.07.005
collection PubMed
description Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods are unable to align reads to much of the genome, recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for bar-coded read alignment, that first probabilistically maps reads to potentially multiple “read clouds” and then within clouds by newly exploiting the nonuniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region, and clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing.
id pubmed-6214366
institution National Center for Biotechnology Information
language English
publishDate 2018
record_format MEDLINE/PubMed
spelling pubmed-6214366 2018-11-02 Cell Syst Article 2018-08-22 /pmc/articles/PMC6214366/ /pubmed/30138581 http://dx.doi.org/10.1016/j.cels.2018.07.005 Text en This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).