
Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score

Motivation: Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, howev...


Bibliographic Details
Main authors: Lee, Hayan, Schatz, Michael C.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413383/
https://www.ncbi.nlm.nih.gov/pubmed/22668792
http://dx.doi.org/10.1093/bioinformatics/bts330
_version_ 1782240052368113664
author Lee, Hayan
Schatz, Michael C.
collection PubMed
description Motivation: Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state of the art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. Results: We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5–14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found that discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the ‘dark matter’ of the genome, including known clinically relevant variations in these regions. Availability: The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net Contact: hlee@cshl.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
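The abstract describes the GMS only as "a weighted probability that any read could be unambiguously mapped to a given position". As a rough illustration, the sketch below assumes one plausible reading of that sentence: take the mapping quality (MAPQ) of every simulated read that covers a position, convert each to a probability of correct placement using the standard Phred relation P(wrong) = 10^(-Q/10), and average the results as a percentage. The function names and the averaging scheme are assumptions made for illustration; the record does not give the exact formula, and this is not the Genome Mappability Analyzer's code.

```python
from typing import Sequence


def mapq_to_prob_correct(mapq: float) -> float:
    """Convert a Phred-scaled mapping quality to P(alignment is correct).

    Uses the standard Phred relation: P(wrong) = 10 ** (-Q / 10).
    """
    return 1.0 - 10.0 ** (-mapq / 10.0)


def mappability_score(mapqs: Sequence[float]) -> float:
    """Hypothetical per-position score in [0, 100].

    `mapqs` holds the MAPQ of every (simulated) read that overlaps the
    position. Assumption: the score is the mean probability of correct
    placement, expressed as a percentage. A position with no covering
    reads is scored 0.
    """
    if not mapqs:
        return 0.0
    total = sum(mapq_to_prob_correct(q) for q in mapqs)
    return 100.0 * total / len(mapqs)


if __name__ == "__main__":
    unique_region = [60, 60, 60, 60]   # reads placed confidently
    repeat_region = [0, 0, 3, 0]       # reads with ambiguous placement
    print(round(mappability_score(unique_region), 2))
    print(round(mappability_score(repeat_region), 2))
```

Under this reading, a position inside an exact repeat scores near 0 however many reads cover it, because each read individually has a low MAPQ. This matches the abstract's point that such errors "cannot be overcome by increasing coverage".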
format Online
Article
Text
id pubmed-3413383
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3413383 2012-08-07 Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score Lee, Hayan Schatz, Michael C. Bioinformatics Original Paper
Oxford University Press 2012-08-15 2012-07-04 /pmc/articles/PMC3413383/ /pubmed/22668792 http://dx.doi.org/10.1093/bioinformatics/bts330 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413383/
https://www.ncbi.nlm.nih.gov/pubmed/22668792
http://dx.doi.org/10.1093/bioinformatics/bts330