
A thesaurus of genetic variation for interrogation of repetitive genomic regions

Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity...

Bibliographic Details
Main authors: Kerzendorfer, Claudia, Konopka, Tomasz, Nijman, Sebastian M.B.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446415/
https://www.ncbi.nlm.nih.gov/pubmed/25820428
http://dx.doi.org/10.1093/nar/gkv178
author Kerzendorfer, Claudia
Konopka, Tomasz
Nijman, Sebastian M.B.
collection PubMed
description Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hereto hidden and under-studied parts of the genome.
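The linking step described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical thesaurus format (pairs of equal-length regions with similar sequence) and uses a union-find structure to group candidate variant positions; the names `THESAURUS`, `find`, and `cluster_variants` are illustrative and are not taken from the authors' software.

```python
from collections import defaultdict

# Hypothetical thesaurus entries (assumed format): each pair links two
# equal-length regions (chrom, start, end) that have similar sequence.
THESAURUS = [
    (("chr1", 100, 200), ("chr5", 1100, 1200)),
    (("chr1", 100, 200), ("chr9", 300, 400)),
]


def find(parent, x):
    """Return the cluster representative of x, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_variants(variants, thesaurus):
    """Group candidate variants (chrom, pos) that sit at corresponding
    offsets of regions linked by a thesaurus entry."""
    parent = {v: v for v in variants}
    present = set(variants)
    for (c1, s1, e1), (c2, s2, _e2) in thesaurus:
        for chrom, pos in variants:
            if chrom == c1 and s1 <= pos <= e1:
                # Position at the same offset in the similar region.
                partner = (c2, s2 + (pos - s1))
                if partner in present:
                    parent[find(parent, (chrom, pos))] = find(parent, partner)
    clusters = defaultdict(list)
    for v in variants:
        clusters[find(parent, v)].append(v)
    return sorted(sorted(c) for c in clusters.values())


candidates = [("chr1", 150), ("chr5", 1150), ("chr9", 350), ("chr2", 42)]
clusters = cluster_variants(candidates, THESAURUS)
```

In this toy input the three candidates in linked regions end up in one cluster and the `chr2` candidate stays on its own, so downstream filtering could treat the linked group as a single ambiguous call rather than three independent variants.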
format Online
Article
Text
id pubmed-4446415
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4446415 2015-06-15
A thesaurus of genetic variation for interrogation of repetitive genomic regions
Kerzendorfer, Claudia
Konopka, Tomasz
Nijman, Sebastian M.B.
Nucleic Acids Res
Methods Online
Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hereto hidden and under-studied parts of the genome.
Oxford University Press 2015-05-26 2015-03-27
/pmc/articles/PMC4446415/ /pubmed/25820428 http://dx.doi.org/10.1093/nar/gkv178
Text en
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A thesaurus of genetic variation for interrogation of repetitive genomic regions
topic Methods Online