OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis
BACKGROUND: Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologie...
Main authors: | Verzotto, Davide; M. Teo, Audrey S.; Hillmer, Axel M.; Nagarajan, Niranjan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719737/ https://www.ncbi.nlm.nih.gov/pubmed/26793302 http://dx.doi.org/10.1186/s13742-016-0110-0 |
_version_ | 1782410968504991744 |
---|---|
author | Verzotto, Davide; M. Teo, Audrey S.; Hillmer, Axel M.; Nagarajan, Niranjan |
author_sort | Verzotto, Davide |
collection | PubMed |
description | BACKGROUND: Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilobase pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. RESULTS: We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. CONCLUSIONS: We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6−2 times more sensitive) and are more efficient (170−200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-016-0110-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4719737 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4719737 2016-01-21 OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis Verzotto, Davide; M. Teo, Audrey S.; Hillmer, Axel M.; Nagarajan, Niranjan Gigascience Research BioMed Central 2016-01-19 /pmc/articles/PMC4719737/ /pubmed/26793302 http://dx.doi.org/10.1186/s13742-016-0110-0 Text en © Verzotto et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Verzotto, Davide; M. Teo, Audrey S.; Hillmer, Axel M.; Nagarajan, Niranjan OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis |
title | OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis |
title_sort | optima: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719737/ https://www.ncbi.nlm.nih.gov/pubmed/26793302 http://dx.doi.org/10.1186/s13742-016-0110-0 |