A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing

Bibliographic Details
Main Authors: Ting, Chuan-Kang, Lin, Choun-Sea, Chan, Ming-Tsai, Chen, Jian-Wei, Chuang, Sheng-Yu, Huang, Yao-Ting
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5115803/
https://www.ncbi.nlm.nih.gov/pubmed/27861560
http://dx.doi.org/10.1371/journal.pone.0166721
Collection: PubMed

Abstract: The genomes of many species are diploid, consisting of a paternal and a maternal haplotype. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct a single consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing the paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implements HapSVAssembler, a method based on a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA. Experimental results indicate that HapSVAssembler achieves high accuracy and contiguity across various sequencing coverages, error rates, and insert sizes. The program was tested on pilot sequencing data from a highly heterozygous genome, identifying 12,781 heterozygous SNPs and 602 hemizygous SVs. We observe that, although SVs are far fewer in number than SNPs, the genomic regions they occupy are much larger, implying that heterozygosity computed from SNPs or the k-mer spectrum may be underestimated.

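The abstract does not specify HapSVAssembler's chromosome encoding or fitness function. As a rough illustration of what "solving an optimization problem with a GA" can look like for haplotype phasing, the following is a minimal Python sketch that assumes the classic minimum-error-correction (MEC) objective over SNP fragments, with every site heterozygous so that the maternal haplotype is simply the complement of the paternal one. The names `mec_score` and `genetic_phase`, the fragment encoding, and the GA settings (binary tournament selection, uniform crossover, 1/n bit-flip mutation, elitism) are hypothetical choices, not taken from the paper, and the sketch ignores structural variants entirely.

```python
import random
from typing import Dict, List

# Hypothetical encoding: a fragment maps a heterozygous-SNP index to the
# allele (0 or 1) observed on one read pair; a haplotype is a 0/1 list and
# its partner is assumed to be the bitwise complement (all sites heterozygous).
Fragment = Dict[int, int]
Haplotype = List[int]


def mec_score(hap: Haplotype, fragments: List[Fragment]) -> int:
    """Minimum-error-correction (MEC) cost of the pair (hap, complement of hap).

    Each fragment is assigned to whichever haplotype it disagrees with least,
    and the cost is the total number of alleles that would need correcting.
    """
    total = 0
    for frag in fragments:
        mismatches = sum(1 for i, allele in frag.items() if hap[i] != allele)
        # Mismatches against hap are exactly the matches against its complement.
        total += min(mismatches, len(frag) - mismatches)
    return total


def genetic_phase(fragments: List[Fragment], n_snps: int, pop_size: int = 40,
                  generations: int = 200, seed: int = 0) -> Haplotype:
    """Search for a low-MEC haplotype with a simple generational GA (illustrative only)."""
    rng = random.Random(seed)
    mutation_rate = 1.0 / n_snps
    pop = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament selection: the lower-cost individual wins.
        (cost_a, hap_a), (cost_b, hap_b) = rng.sample(scored, 2)
        return hap_a if cost_a <= cost_b else hap_b

    for _ in range(generations):
        scored = sorted(((mec_score(h, fragments), h) for h in pop),
                        key=lambda pair: pair[0])
        if scored[0][0] == 0:
            break  # no fragment conflicts remain
        next_pop = [scored[0][1][:]]  # elitism: carry the best individual over
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - x if rng.random() < mutation_rate else x for x in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=lambda h: mec_score(h, fragments))


if __name__ == "__main__":
    true_hap = [0, 1, 1, 0, 1, 0, 0, 1]
    comp = [1 - x for x in true_hap]
    # Hypothetical error-free fragments: overlapping 3-SNP windows drawn
    # alternately from the paternal haplotype and its complement.
    fragments = [
        {i: (true_hap if s % 2 == 0 else comp)[i] for i in range(s, s + 3)}
        for s in range(len(true_hap) - 2)
    ]
    best = genetic_phase(fragments, len(true_hap))
    print(best, mec_score(best, fragments))
```

Because a haplotype and its complement describe the same pair, both `true_hap` and its complement are optimal here. The same symmetry means crossover between parents in opposite "phases" can introduce switch errors, so a practical method would likely need an encoding or operator that accounts for it.
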
Record ID: pubmed-5115803
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2016-11-18
Rights: © 2016 Ting et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.