
Statistical aspects of discerning indel-type structural variation via DNA sequence alignment

BACKGROUND: Structural variations in the form of DNA insertions and deletions are an important aspect of human genetics and especially relevant to medical disorders. Investigations have shown that such events can be detected via tell-tale discrepancies in the aligned lengths of paired-end DNA sequen...


Bibliographic Details
Main Authors: Wendl, Michael C, Wilson, Richard K
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2748092/
https://www.ncbi.nlm.nih.gov/pubmed/19656394
http://dx.doi.org/10.1186/1471-2164-10-359
_version_ 1782172131996467200
author Wendl, Michael C
Wilson, Richard K
author_facet Wendl, Michael C
Wilson, Richard K
author_sort Wendl, Michael C
collection PubMed
description BACKGROUND: Structural variations in the form of DNA insertions and deletions are an important aspect of human genetics and especially relevant to medical disorders. Investigations have shown that such events can be detected via tell-tale discrepancies in the aligned lengths of paired-end DNA sequencing reads. Quantitative aspects underlying this method remain poorly understood, despite its importance and conceptual simplicity. We report the statistical theory characterizing the length-discrepancy scheme for Gaussian libraries, including coverage-related effects that preceding models are unable to account for. RESULTS: Deletion and insertion statistics both depend heavily on physical coverage, but otherwise differ dramatically, refuting a commonly held doctrine of symmetry. Specifically, coverage restrictions render insertions much more difficult to capture. Increased read length has the counterintuitive effect of worsening insertion detection characteristics of short inserts. Variance in library insert length is also a critical factor here and should be minimized to the greatest degree possible. Conversely, no significant improvement would be realized in lowering fosmid variances beyond current levels. Detection power is examined under a straightforward alternative hypothesis and found to be generally acceptable. We also consider the proposition of characterizing variation over the entire spectrum of variant sizes under constant risk of false-positive errors. At 1% risk, many designs will leave a significant gap in the 100 to 200 bp neighborhood, requiring unacceptably high redundancies to compensate. We show that a few modifications largely close this gap and we give a few examples of feasible spectrum-covering designs. CONCLUSION: The theory resolves several outstanding issues and furnishes a general methodology for designing future projects from the standpoint of a spectrum-wide constant risk.
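The length-discrepancy idea in the description can be illustrated with a minimal sketch. This is not the authors' model: it assumes only a Gaussian library with mean insert length mu and standard deviation sigma, a plain two-sided z-test on the mean aligned span of the inserts covering a locus, and the hypothetical function names min_detectable_shift and is_discordant. It ignores the coverage-related effects the article analyzes.

```python
import math
from statistics import NormalDist


def min_detectable_shift(sigma, n_spanning, alpha=0.01):
    """Smallest mean length discrepancy (bp) flagged at two-sided risk alpha.

    Assumption: aligned insert lengths are Gaussian with standard deviation
    sigma, so the mean of n_spanning inserts has standard deviation
    sigma / sqrt(n_spanning).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma / math.sqrt(n_spanning)


def is_discordant(observed_mean, mu, sigma, n_spanning, alpha=0.01):
    """Return True if the mean aligned span deviates significantly from mu."""
    threshold = min_detectable_shift(sigma, n_spanning, alpha)
    return abs(observed_mean - mu) > threshold


if __name__ == "__main__":
    # Hypothetical library: mean 3000 bp, sd 100 bp, 4 inserts spanning a locus.
    print(round(min_detectable_shift(100, 4), 2))
    print(is_discordant(3150, 3000, 100, 4))
    print(is_discordant(3100, 3000, 100, 4))
```

Under these assumptions the threshold scales with sigma and shrinks with the square root of the number of spanning inserts, which is consistent with the description's emphasis on library variance and physical coverage.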
format Text
id pubmed-2748092
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2748092 2009-09-22 Statistical aspects of discerning indel-type structural variation via DNA sequence alignment Wendl, Michael C Wilson, Richard K BMC Genomics Research Article BioMed Central 2009-08-05 /pmc/articles/PMC2748092/ /pubmed/19656394 http://dx.doi.org/10.1186/1471-2164-10-359 Text en Copyright © 2009 Wendl and Wilson; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Wendl, Michael C
Wilson, Richard K
Statistical aspects of discerning indel-type structural variation via DNA sequence alignment
title Statistical aspects of discerning indel-type structural variation via DNA sequence alignment
title_sort statistical aspects of discerning indel-type structural variation via dna sequence alignment
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2748092/
https://www.ncbi.nlm.nih.gov/pubmed/19656394
http://dx.doi.org/10.1186/1471-2164-10-359