Statistical Binning for Barcoded Reads Improves Downstream Analyses
Main authors: | Shajii, Ariya; Numanagić, Ibrahim; Whelan, Christopher; Berger, Bonnie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6214366/ https://www.ncbi.nlm.nih.gov/pubmed/30138581 http://dx.doi.org/10.1016/j.cels.2018.07.005 |
author | Shajii, Ariya; Numanagić, Ibrahim; Whelan, Christopher; Berger, Bonnie |
---|---|
collection | PubMed |
description | Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods cannot align reads to large parts of the genome that are recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for barcoded read alignment that first probabilistically maps reads to potentially multiple “read clouds” and then aligns them within clouds by newly exploiting the nonuniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region and in the clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing. |
format | Online Article Text |
id | pubmed-6214366 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
record_format | MEDLINE/PubMed |
spelling | pubmed-6214366 · 2018-11-02 · Statistical Binning for Barcoded Reads Improves Downstream Analyses · Shajii, Ariya; Numanagić, Ibrahim; Whelan, Christopher; Berger, Bonnie · Cell Syst · Article · 2018-08-22 · /pmc/articles/PMC6214366/ /pubmed/30138581 http://dx.doi.org/10.1016/j.cels.2018.07.005 · Text · en · This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Statistical Binning for Barcoded Reads Improves Downstream Analyses |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6214366/ https://www.ncbi.nlm.nih.gov/pubmed/30138581 http://dx.doi.org/10.1016/j.cels.2018.07.005 |