Integrating dilution-based sequencing and population genotypes for single individual haplotyping
Main authors: | Matsumoto, Hirotaka; Kiryu, Hisanori |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4162929/ https://www.ncbi.nlm.nih.gov/pubmed/25167975 http://dx.doi.org/10.1186/1471-2164-15-733 |
_version_ | 1782334726016598016 |
---|---|
author | Matsumoto, Hirotaka; Kiryu, Hisanori |
collection | PubMed |
description | BACKGROUND: Haplotype information is useful for many genetic analyses, and haplotypes are usually inferred using computational approaches. Among such approaches, single individual haplotyping (SIH), which infers individual haplotypes from sequence fragments, has become increasingly important with the advent of novel sequencing techniques such as dilution-based sequencing. These techniques can produce virtual long-read fragments by separating DNA fragments into multiple low-concentration aliquots, sequencing and mapping each aliquot, and merging clustered short reads. Although these experimental techniques are sophisticated, they can produce chimeric fragments whose left and right parts match different chromosomes. In our previous research, we found that chimeric fragments significantly decrease the accuracy of SIH. Chimeric fragments can be removed by using haplotypes determined from pedigree genotypes, but pedigree genotypes are generally not available. Read-cluster length and heterozygous calls have also been used to detect chimeric fragments. Although these features remove some chimeric fragments, a considerable number remain undetected because of the dispersion of cluster lengths and the absence of SNPs in the overlapping regions. For these reasons, a general method to detect and remove chimeric fragments is needed. RESULTS: In this paper, we propose a general method to detect chimeric fragments. The basis of our method is that a chimeric fragment would correspond to an artificial recombinant haplotype and would therefore differ from biological haplotypes. To detect such differences, we integrated statistical phasing, a haplotype inference approach that uses population genotypes, into our method. We applied our method to two datasets and detected chimeric fragments with high AUC. The AUC values of our method are higher than those obtained using only cluster length and heterozygous calls. We then used multiple SIH algorithms to compare the accuracy of SIH before and after removing the chimeric fragment candidates. The accuracy of the assembled haplotypes increased significantly after the chimeric fragment candidates were removed. CONCLUSIONS: Our method is useful for detecting chimeric fragments and improving SIH accuracy. The Ruby script is available at https://sites.google.com/site/hmatsu1226/software/csp. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-733) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4162929 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4162929 2014-09-19. Integrating dilution-based sequencing and population genotypes for single individual haplotyping. Matsumoto, Hirotaka; Kiryu, Hisanori. BMC Genomics. Research Article. BioMed Central, 2014-08-28. © Matsumoto and Kiryu; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Integrating dilution-based sequencing and population genotypes for single individual haplotyping |
topic | Research Article |
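
The abstract's central idea, that a chimeric fragment looks like an artificial recombinant of the individual's two haplotypes, can be illustrated with a toy score. This is a simplified sketch, not the authors' method (which integrates statistical phasing from population genotypes and is distributed as a Ruby script). It assumes the two haplotypes have already been phased and are correct, and that each fragment is given as a 0/1 allele vector over the heterozygous SNPs it covers. The names `mismatches` and `chimera_score` are hypothetical. The score counts how many mismatches disappear when the fragment is allowed to switch once from one haplotype to the other.

```python
from typing import Sequence


def mismatches(frag: Sequence[int], hap: Sequence[int]) -> int:
    """Count sites where the fragment allele differs from the haplotype allele."""
    return sum(a != b for a, b in zip(frag, hap))


def chimera_score(frag: Sequence[int], hap1: Sequence[int], hap2: Sequence[int]) -> int:
    """Hypothetical toy score for how chimeric a fragment looks.

    frag, hap1 and hap2 are aligned 0/1 allele vectors over the heterozygous
    SNPs covered by the fragment (an assumption of this sketch). The score is
    the number of mismatches removed by allowing one switch between haplotypes.
    """
    # No switch: the whole fragment is explained by a single haplotype.
    no_switch = min(mismatches(frag, hap1), mismatches(frag, hap2))
    # One switch at breakpoint k: left part from one haplotype, right part
    # from the other, i.e. an artificial recombinant haplotype.
    one_switch = no_switch
    for k in range(1, len(frag)):
        for left, right in ((hap1, hap2), (hap2, hap1)):
            cost = mismatches(frag[:k], left[:k]) + mismatches(frag[k:], right[k:])
            one_switch = min(one_switch, cost)
    return no_switch - one_switch


if __name__ == "__main__":
    h1 = [0, 1, 0, 0, 1, 1]         # phased haplotype (assumed correct)
    h2 = [1 - a for a in h1]         # the other haplotype at heterozygous sites
    normal = list(h1)                # fragment from a single chromosome
    chimeric = h1[:3] + h2[3:]       # left half from h1, right half from h2
    with_error = [0, 1, 1, 0, 1, 1]  # h1 with one sequencing error at site 2
    for name, frag in [("normal", normal), ("chimeric", chimeric), ("error", with_error)]:
        print(name, chimera_score(frag, h1, h2))
    # Expected output, one per line: "normal 0", "chimeric 3", "error 0"
```

A single sequencing error yields a score of 0 because one switch cannot explain an isolated mismatch, whereas a clean left/right split yields a large score. Ranking fragments by a score like this is the kind of ranking an AUC evaluation summarizes. The brute-force breakpoint search is quadratic for readability; prefix sums of per-site mismatches would make it linear. Real data would also need to account for phasing uncertainty rather than trusting one phased pair, which is what integrating statistical phasing addresses.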