Strobe sequence design for haplotype assembly

Bibliographic Details
Main authors: Lo, Christine; Bashir, Ali; Bansal, Vikas; Bafna, Vineet
Format: Text
Language: English
Published: BioMed Central, 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044279/
https://www.ncbi.nlm.nih.gov/pubmed/21342554
http://dx.doi.org/10.1186/1471-2105-12-S1-S24
collection PubMed
description BACKGROUND: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the two chromosomes of a pair are mostly identical, linking together the alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In haplotype assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been considerable research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to obtain long haplotypes. Here, we describe the design parameters that can be adjusted with next-generation and upcoming sequencing technologies, and study the impact of design choices on haplotype length.
RESULTS: We show that a number of parameters influence haplotype length, the most significant being the advance length (the distance between two fragments of a clone). Given technologies like strobe sequencing that allow large variations in advance length, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in an improvement of one to two orders of magnitude in median haplotype length.
CONCLUSIONS: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.
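The linking idea and the design search summarized above can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' implementation: it assumes two-fragment clones with a fixed read length, interprets the advance length as the unsequenced gap between the two fragments, places clones at given start positions, and defines a haplotype's length as the genomic span between the first and last heterozygous site it links. Sites covered by the same clone are merged with a union-find structure; a simple simulated-annealing loop (linear cooling, moving probability mass between candidate advance lengths, Metropolis acceptance) then searches for a distribution that maximizes the median haplotype length. All function names and parameters (`read_len`, `advances`, `t0`, and so on) are illustrative assumptions.

```python
import bisect
import math
import random
import statistics


class UnionFind:
    """Disjoint-set forest over heterozygous-site indices."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def sites_in(het_sites, start, end):
    """Indices of sorted heterozygous sites in the half-open interval [start, end)."""
    return range(bisect.bisect_left(het_sites, start), bisect.bisect_left(het_sites, end))


def haplotype_blocks(het_sites, clones, read_len):
    """Group sites into blocks; each clone is a hypothetical (start, advance) pair.

    Fragment 1 covers [start, start + read_len); after skipping `advance`
    unsequenced bases, fragment 2 covers the next read_len bases.
    """
    uf = UnionFind(len(het_sites))
    for start, advance in clones:
        second = start + read_len + advance
        covered = list(sites_in(het_sites, start, start + read_len))
        covered += sites_in(het_sites, second, second + read_len)
        for idx in covered[1:]:  # every site seen by one clone is phased together
            uf.union(covered[0], idx)
    blocks = {}
    for i, pos in enumerate(het_sites):
        blocks.setdefault(uf.find(i), []).append(pos)
    return list(blocks.values())


def block_lengths(blocks):
    """Genomic span of each block (0 for an isolated site)."""
    return [max(b) - min(b) for b in blocks]


def median_block_length(het_sites, starts, read_len, advances, weights, seed):
    """Median haplotype span when advances are drawn from `weights` over `advances`."""
    rng = random.Random(seed)
    clones = [(s, rng.choices(advances, weights)[0]) for s in starts]
    return statistics.median(block_lengths(haplotype_blocks(het_sites, clones, read_len)))


def anneal(het_sites, starts, read_len, advances, steps=200, t0=1000.0, seed=0):
    """Simulated annealing over a discrete distribution of advance lengths (needs >= 2 advances)."""
    rng = random.Random(seed)
    weights = [1.0] * len(advances)  # start from the uniform distribution
    score = median_block_length(het_sites, starts, read_len, advances, weights, seed)
    best_w, best_s = weights[:], score
    for step in range(steps):
        temp = t0 * (1 - step / steps)  # linear cooling, stays > 0
        cand = weights[:]
        i, j = rng.sample(range(len(advances)), 2)
        delta = rng.uniform(0, cand[i])  # move mass from i to j; total is unchanged
        cand[i] -= delta
        cand[j] += delta
        s = median_block_length(het_sites, starts, read_len, advances, cand, seed)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if s >= score or rng.random() < math.exp((s - score) / temp):
            weights, score = cand, s
            if score > best_s:
                best_w, best_s = weights[:], score
    total = sum(best_w)
    return [w / total for w in best_w], best_s
```

For example, with sites at 100, 500, 2000, 2600 and 9000, read length 100, and clones `(50, 300)` and `(1950, 460)`, the first clone links 100 with 500 and the second links 2000 with 2600, giving block spans of 400, 600 and 0. Because `median_block_length` is always called with the same clone start positions and seed, candidate distributions are compared on identical clone placements, which keeps the annealing objective deterministic; a real design study would average over many simulated genomes.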
id pubmed-3044279
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3044279 2011-02-25 Strobe sequence design for haplotype assembly Lo, Christine Bashir, Ali Bansal, Vikas Bafna, Vineet BMC Bioinformatics Research
BioMed Central 2011-02-15 /pmc/articles/PMC3044279/ /pubmed/21342554 http://dx.doi.org/10.1186/1471-2105-12-S1-S24 Text en Copyright ©2011 Lo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research