Evaluating genome architecture of a complex region via generalized bipartite matching

Bibliographic details
Main authors: Lo, Christine; Kim, Sangwoo; Zakov, Shay; Bafna, Vineet
Format: Online article, text
Language: English
Published: BioMed Central, 2013
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622632/
https://www.ncbi.nlm.nih.gov/pubmed/23734567
http://dx.doi.org/10.1186/1471-2105-14-S5-S13
Description: With the remarkable development of inexpensive sequencing technologies and supporting computational tools, medicine holds the promise of being personalized by knowledge of the individual genome. Current technologies provide high throughput but short reads. Reconstruction of the donor genome is based either on de novo assembly of the (short) reads or on mapping donor reads to a standard reference. While such techniques achieve high success rates for inferring 'simple' genomic segments, they are confounded by segments with complex duplication patterns, including regions of direct medical relevance such as the HLA and KIR regions. In this work, we address this problem with a method for assessing the quality of a predicted genome sequence for complex regions of the genome. The method combines two natural types of evidence: sequence similarity of the mapped reads to the predicted donor genome, and the distribution of reads across the predicted genome. We define a new scoring function for read-to-genome matchings, which penalizes sequence dissimilarities and deviations from the expected read location distribution, and present an efficient algorithm for finding matchings that minimize the penalty. The algorithm is based on a formal problem, first defined in this paper, called Coverage Sensitive many-to-many min-cost bipartite Matching (CSM). This new problem variant generalizes the standard (one-to-one) weighted bipartite matching problem and can be solved using network flows. The resulting Java-based tool, called SAGE (Scoring function for Assembled GEnomes), is freely available upon request. We demonstrate on simulated data that SAGE can be used to infer correct haplotypes of the highly repetitive KIR region on human chromosome 19.
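The idea of a coverage-sensitive, many-to-many matching solved by network flows can be illustrated with a small min-cost-flow sketch. The record does not give SAGE's actual scoring function, so everything here is an assumption: each read is placed at exactly one candidate position, `mismatch[r][p]` is a hypothetical dissimilarity cost, and a position receiving k reads pays a hypothetical convex penalty `alpha * (k - expected) ** 2`. This is not SAGE's (Java) implementation. It only shows one standard way to fold a convex, coverage-dependent cost into a flow network: give each position parallel unit-capacity arcs to the sink with increasing marginal costs.

```python
import math


def min_cost_flow(num_nodes, edges, source, sink, demand):
    """Successive shortest paths with Bellman-Ford, one unit per augmentation.

    edges: list of (u, v, capacity, cost); costs may be negative as long as the
    input graph has no negative cycle. Returns (total_cost, flow_per_edge).
    """
    graph = [[] for _ in range(num_nodes)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward arc gets id 2i, its residual (reverse) arc gets id 2i + 1.
        for a, b, cc, ww in ((u, v, c, w), (v, u, 0, -w)):
            graph[a].append(len(to))
            to.append(b)
            cap.append(cc)
            cost.append(ww)
    total = 0
    for _ in range(demand):
        dist = [math.inf] * num_nodes
        prev_arc = [-1] * num_nodes
        dist[source] = 0
        for _ in range(num_nodes - 1):  # Bellman-Ford on the residual graph
            updated = False
            for u in range(num_nodes):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_arc[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            raise ValueError("some read cannot be placed")
        v = sink
        while v != source:  # push one unit along the shortest path
            e = prev_arc[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
        total += dist[sink]
    # Flow on input edge i equals the residual capacity of its reverse arc.
    return total, [cap[2 * i + 1] for i in range(len(edges))]


def coverage_sensitive_matching(mismatch, expected, alpha):
    """Hypothetical CSM instance: mismatch[r][p] is the cost of placing read r
    at position p (math.inf = not allowed); a position holding k reads pays
    alpha * (k - expected) ** 2. Returns (total_cost, {read: position})."""
    n_reads, n_pos = len(mismatch), len(mismatch[0])
    source, sink = 0, 1 + n_reads + n_pos  # reads are nodes 1..n_reads
    edges = [(source, 1 + r, 1, 0) for r in range(n_reads)]
    match_arcs = {}
    for r in range(n_reads):
        for p in range(n_pos):
            if mismatch[r][p] < math.inf:
                match_arcs[len(edges)] = (r, p)
                edges.append((1 + r, 1 + n_reads + p, 1, mismatch[r][p]))
    for p in range(n_pos):
        for k in range(1, n_reads + 1):
            # Marginal cost of the k-th read at p; increasing in k for alpha > 0,
            # so the flow always uses a position's cheaper parallel arcs first.
            marginal = alpha * ((k - expected) ** 2 - (k - 1 - expected) ** 2)
            edges.append((1 + n_reads + p, sink, 1, marginal))
    flow_cost, flows = min_cost_flow(sink + 1, edges, source, sink, n_reads)
    # The arcs only price changes from zero coverage; add alpha * expected**2 per position.
    total = flow_cost + n_pos * alpha * expected ** 2
    assignment = {r: p for i, (r, p) in match_arcs.items() if flows[i] == 1}
    return total, assignment


mismatch = [[0, 3], [0, 1], [0, 2]]
print(coverage_sensitive_matching(mismatch, expected=1.5, alpha=1.0))
# → (1.5, {0: 0, 1: 1, 2: 0})
```

With these made-up costs, placing all three reads at position 0 has zero mismatch but a coverage penalty of 4.5; moving read 1 to position 1 adds a mismatch of 1 and lowers the coverage penalty to 0.5, for a total of 1.5. Bellman-Ford is used because the marginal coverage costs can be negative; it keeps the sketch short but is slow on large read sets.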
Published in: BMC Bioinformatics (Proceedings), BioMed Central, 2013-04-10
Copyright © 2013 Lo et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.