Evaluating genome architecture of a complex region via generalized bipartite matching
With the remarkable development in inexpensive sequencing technologies and supporting computational tools, we have the promise of medicine being personalized by knowledge of the individual genome. Current technologies provide high throughput, but short reads. Reconstruction of the donor genome is ba...
Main authors: | Lo, Christine; Kim, Sangwoo; Zakov, Shay; Bafna, Vineet |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622632/ https://www.ncbi.nlm.nih.gov/pubmed/23734567 http://dx.doi.org/10.1186/1471-2105-14-S5-S13 |
_version_ | 1782265857587544064 |
---|---|
author | Lo, Christine Kim, Sangwoo Zakov, Shay Bafna, Vineet |
author_facet | Lo, Christine Kim, Sangwoo Zakov, Shay Bafna, Vineet |
author_sort | Lo, Christine |
collection | PubMed |
description | With the remarkable development in inexpensive sequencing technologies and supporting computational tools, we have the promise of medicine being personalized by knowledge of the individual genome. Current technologies provide high throughput, but short reads. Reconstruction of the donor genome is based either on de novo assembly of the (short) reads, or on mapping donor reads to a standard reference. While such techniques demonstrate high success rates for inferring 'simple' genomic segments, they are confounded by segments with complex duplication patterns, including regions of direct medical relevance, like the HLA and the KIR regions. In this work, we address this problem with a method for assessing the quality of a predicted genome sequence for complex regions of the genome. This method combines two natural types of evidence: sequence similarity of the mapped reads to the predicted donor genome, and distribution of reads across the predicted genome. We define a new scoring function for read-to-genome matchings, which penalizes for sequence dissimilarities and deviations from expected read location distribution, and present an efficient algorithm for finding matchings that minimize the penalty. The algorithm is based on a formal problem, first defined in this paper, called Coverage Sensitive many-to-many min-cost bipartite Matching (CSM). This new problem variant generalizes the standard (one-to-one) weighted bipartite matching problem, and can be solved using network flows. The resulting Java-based tool, called SAGE (Scoring function for Assembled GEnomes), is freely available upon request. We demonstrate over simulated data that SAGE can be used to infer correct haplotypes of the highly repetitive KIR region on the Human chromosome 19. |
format | Online Article Text |
id | pubmed-3622632 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3622632 2013-04-15 Evaluating genome architecture of a complex region via generalized bipartite matching Lo, Christine Kim, Sangwoo Zakov, Shay Bafna, Vineet BMC Bioinformatics Proceedings With the remarkable development in inexpensive sequencing technologies and supporting computational tools, we have the promise of medicine being personalized by knowledge of the individual genome. Current technologies provide high throughput, but short reads. Reconstruction of the donor genome is based either on de novo assembly of the (short) reads, or on mapping donor reads to a standard reference. While such techniques demonstrate high success rates for inferring 'simple' genomic segments, they are confounded by segments with complex duplication patterns, including regions of direct medical relevance, like the HLA and the KIR regions. In this work, we address this problem with a method for assessing the quality of a predicted genome sequence for complex regions of the genome. This method combines two natural types of evidence: sequence similarity of the mapped reads to the predicted donor genome, and distribution of reads across the predicted genome. We define a new scoring function for read-to-genome matchings, which penalizes for sequence dissimilarities and deviations from expected read location distribution, and present an efficient algorithm for finding matchings that minimize the penalty. The algorithm is based on a formal problem, first defined in this paper, called Coverage Sensitive many-to-many min-cost bipartite Matching (CSM). This new problem variant generalizes the standard (one-to-one) weighted bipartite matching problem, and can be solved using network flows. The resulting Java-based tool, called SAGE (Scoring function for Assembled GEnomes), is freely available upon request. We demonstrate over simulated data that SAGE can be used to infer correct haplotypes of the highly repetitive KIR region on the Human chromosome 19.
BioMed Central 2013-04-10 /pmc/articles/PMC3622632/ /pubmed/23734567 http://dx.doi.org/10.1186/1471-2105-14-S5-S13 Text en Copyright © 2013 Lo et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Lo, Christine Kim, Sangwoo Zakov, Shay Bafna, Vineet Evaluating genome architecture of a complex region via generalized bipartite matching |
title | Evaluating genome architecture of a complex region via generalized bipartite matching |
title_full | Evaluating genome architecture of a complex region via generalized bipartite matching |
title_fullStr | Evaluating genome architecture of a complex region via generalized bipartite matching |
title_full_unstemmed | Evaluating genome architecture of a complex region via generalized bipartite matching |
title_short | Evaluating genome architecture of a complex region via generalized bipartite matching |
title_sort | evaluating genome architecture of a complex region via generalized bipartite matching |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622632/ https://www.ncbi.nlm.nih.gov/pubmed/23734567 http://dx.doi.org/10.1186/1471-2105-14-S5-S13 |
work_keys_str_mv | AT lochristine evaluatinggenomearchitectureofacomplexregionviageneralizedbipartitematching AT kimsangwoo evaluatinggenomearchitectureofacomplexregionviageneralizedbipartitematching AT zakovshay evaluatinggenomearchitectureofacomplexregionviageneralizedbipartitematching AT bafnavineet evaluatinggenomearchitectureofacomplexregionviageneralizedbipartitematching |
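
The description field above says the scoring function penalizes both sequence dissimilarity and deviation from the expected read distribution, and that the matching problem can be solved using network flows. The following is a minimal sketch of that idea under simplifying assumptions, not the authors' CSM formulation or the SAGE implementation: each read is placed at exactly one candidate position, the coverage penalty is assumed to be the squared deviation from an expected coverage, and the names `min_cost_flow`, `score_assignment`, `mismatch`, and `expected` are hypothetical. The convex penalty is modelled with parallel unit-capacity edges whose costs are its successive increments.

```python
from collections import deque

INF = float("inf")


def min_cost_flow(n, edges, s, t, need):
    """Route `need` units from s to t at minimum total cost.

    edges: list of (u, v, capacity, cost). Input edge i is stored at index
    2*i and its residual (reverse) edge at index 2*i + 1.
    Uses successive shortest paths with a queue-based Bellman-Ford search,
    which tolerates negative edge costs as long as there is no negative cycle.
    """
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        graph[u].append(len(to))
        to.append(v)
        cap.append(c)
        cost.append(w)
        graph[v].append(len(to))
        to.append(u)
        cap.append(0)
        cost.append(-w)
    flow, total = 0, 0
    while flow < need:
        dist, prev, queued = [INF] * n, [-1] * n, [False] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + cost[e] < dist[v]:
                    dist[v], prev[v] = dist[u] + cost[e], e
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if dist[t] == INF:
            raise ValueError("not every read can be placed")
        # Find the bottleneck along the shortest path, then augment.
        push, v = need - flow, t
        while v != s:
            push = min(push, cap[prev[v]])
            v = to[prev[v] ^ 1]
        v = t
        while v != s:
            e = prev[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total += push * dist[t]
    return total, cap


def score_assignment(mismatch, expected):
    """Return (penalty, {read: position}) for the cheapest placement.

    mismatch[r][p]: dissimilarity of read r at position p, or None if the
    read cannot map there. expected[p]: assumed expected coverage of p.
    Assumed penalty: sum of mismatches + sum over p of (coverage_p - expected_p)^2.
    """
    n_reads, n_pos = len(mismatch), len(expected)
    s, t = 0, 1 + n_reads + n_pos
    edges = []
    for r in range(n_reads):
        edges.append((s, 1 + r, 1, 0))
        for p in range(n_pos):
            if mismatch[r][p] is not None:
                edges.append((1 + r, 1 + n_reads + p, 1, mismatch[r][p]))
    for p, c in enumerate(expected):
        # The k-th parallel edge costs the increase in penalty from k-1 to k reads.
        for k in range(1, n_reads + 1):
            edges.append((1 + n_reads + p, t, 1, (k - c) ** 2 - (k - 1 - c) ** 2))
    flow_cost, cap = min_cost_flow(t + 1, edges, s, t, n_reads)
    # Add back the penalty of zero coverage, which the increments are relative to.
    penalty = flow_cost + sum(c * c for c in expected)
    assignment = {}
    for i, (u, v, _, _) in enumerate(edges):
        if 1 <= u <= n_reads and n_reads < v < t and cap[2 * i] == 0:
            assignment[u - 1] = v - 1 - n_reads
    return penalty, assignment


if __name__ == "__main__":
    # Three reads that all fit position 0 best, but each position expects one read.
    penalty, placement = score_assignment([[0, 1], [0, 1], [0, 3]], [1, 1])
    print(penalty, placement)
```

In this toy input, placing all three reads at position 0 has no mismatch cost but a coverage penalty of 5, whereas moving one of the first two reads to position 1 costs 1 in mismatch and 1 in coverage, for a total of 2. That trade-off between read similarity and read distribution is the one the abstract describes.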