BlackOPs: increasing confidence in variant detection through mappability filtering

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘...

Bibliographic Details
Main authors: Cabanski, Christopher R., Wilkerson, Matthew D., Soloway, Matthew, Parker, Joel S., Liu, Jinze, Prins, Jan F., Marron, J. S., Perou, Charles M., Hayes, D. Neil
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3799449/
https://www.ncbi.nlm.nih.gov/pubmed/23935067
http://dx.doi.org/10.1093/nar/gkt692
author Cabanski, Christopher R.
Wilkerson, Matthew D.
Soloway, Matthew
Parker, Joel S.
Liu, Jinze
Prins, Jan F.
Marron, J. S.
Perou, Charles M.
Hayes, D. Neil
author_sort Cabanski, Christopher R.
collection PubMed
description Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
format Online
Article
Text
id pubmed-3799449
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-37994492013-10-21 BlackOPs: increasing confidence in variant detection through mappability filtering Cabanski, Christopher R. Wilkerson, Matthew D. Soloway, Matthew Parker, Joel S. Liu, Jinze Prins, Jan F. Marron, J. S. Perou, Charles M. Hayes, D. Neil Nucleic Acids Res Methods Online Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing. Oxford University Press 2013-10 2013-08-08 /pmc/articles/PMC3799449/ /pubmed/23935067 http://dx.doi.org/10.1093/nar/gkt692 Text en © The Author(s) 2013. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title BlackOPs: increasing confidence in variant detection through mappability filtering
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3799449/
https://www.ncbi.nlm.nih.gov/pubmed/23935067
http://dx.doi.org/10.1093/nar/gkt692
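
The description above says that BlackOPs outputs a blacklist of positions and alleles caused by mismapping, and that variant calls can be filtered against it. The following is a minimal sketch of that filtering step only, not the BlackOPs implementation. The `Variant` type, the function name, and the example coordinates are all hypothetical and chosen for illustration. It assumes a call counts as blacklisted when its chromosome, position, and alternate allele all match a blacklist entry.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a BlackOPs data structure)."""
    chrom: str
    pos: int   # 1-based genomic position
    ref: str   # reference allele
    alt: str   # observed (alternate) allele


def filter_against_blacklist(
    calls: Iterable[Variant], blacklist: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Split calls into (kept, removed) by matching position and allele.

    Assumption: a call is removed only when chromosome, position and
    alternate allele all match a blacklist entry.
    """
    blocked = {(b.chrom, b.pos, b.alt) for b in blacklist}
    kept: List[Variant] = []
    removed: List[Variant] = []
    for call in calls:
        key = (call.chrom, call.pos, call.alt)
        (removed if key in blocked else kept).append(call)
    return kept, removed


# Made-up example data; these are not real blacklist positions.
example_blacklist = [Variant("chr1", 100, "A", "G")]
example_calls = [
    Variant("chr1", 100, "A", "G"),  # same position and allele: removed
    Variant("chr1", 100, "A", "T"),  # same position, different allele: kept
    Variant("chr2", 50, "C", "T"),   # not on the blacklist: kept
]
kept, removed = filter_against_blacklist(example_calls, example_blacklist)
```

Matching on the allele as well as the position follows the description's wording that the blacklist holds "positions and alleles". Because the description also states that blacklists are specific to the alignment algorithm and read length, a blacklist used this way would need to come from a matching experimental setup.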