Cargando…

GenHap: a novel computational method based on genetic algorithms for haplotype assembly

BACKGROUND: In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype asse...

Descripción completa

Detalles Bibliográficos
Autores principales: Tangherloni, Andrea, Spolaor, Simone, Rundo, Leonardo, Nobile, Marco S., Cazzaniga, Paolo, Mauri, Giancarlo, Liò, Pietro, Merelli, Ivan, Besozzi, Daniela
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6471693/
https://www.ncbi.nlm.nih.gov/pubmed/30999845
http://dx.doi.org/10.1186/s12859-019-2691-y
_version_ 1783412082632294400
author Tangherloni, Andrea
Spolaor, Simone
Rundo, Leonardo
Nobile, Marco S.
Cazzaniga, Paolo
Mauri, Giancarlo
Liò, Pietro
Merelli, Ivan
Besozzi, Daniela
author_facet Tangherloni, Andrea
Spolaor, Simone
Rundo, Leonardo
Nobile, Marco S.
Cazzaniga, Paolo
Mauri, Giancarlo
Liò, Pietro
Merelli, Ivan
Besozzi, Daniela
author_sort Tangherloni, Andrea
collection PubMed
description BACKGROUND: In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists in assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, the knowledge of complete haplotypes is generally more informative than analyzing single SNPs and plays a fundamental role in many medical applications. RESULTS: To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists in computing the two haplotypes that partition the sequencing reads into two disjoint sub-sets, with the least number of corrections to the SNP values. To this aim, we propose here GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. In order to evaluate the effectiveness of our approach, we run GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol in the case of Roche/454 instances and up to 20× faster when compared on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. CONCLUSIONS: Future-generation sequencing technologies, producing longer reads with higher coverage, can highly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem. Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap.
format Online
Article
Text
id pubmed-6471693
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-64716932019-04-24 GenHap: a novel computational method based on genetic algorithms for haplotype assembly Tangherloni, Andrea Spolaor, Simone Rundo, Leonardo Nobile, Marco S. Cazzaniga, Paolo Mauri, Giancarlo Liò, Pietro Merelli, Ivan Besozzi, Daniela BMC Bioinformatics Research BACKGROUND: In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists in assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, the knowledge of complete haplotypes is generally more informative than analyzing single SNPs and plays a fundamental role in many medical applications. RESULTS: To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists in computing the two haplotypes that partition the sequencing reads into two disjoint sub-sets, with the least number of corrections to the SNP values. To this aim, we propose here GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. In order to evaluate the effectiveness of our approach, we run GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol in the case of Roche/454 instances and up to 20× faster when compared on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. CONCLUSIONS: Future-generation sequencing technologies, producing longer reads with higher coverage, can highly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem. Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap. BioMed Central 2019-04-18 /pmc/articles/PMC6471693/ /pubmed/30999845 http://dx.doi.org/10.1186/s12859-019-2691-y Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Tangherloni, Andrea
Spolaor, Simone
Rundo, Leonardo
Nobile, Marco S.
Cazzaniga, Paolo
Mauri, Giancarlo
Liò, Pietro
Merelli, Ivan
Besozzi, Daniela
GenHap: a novel computational method based on genetic algorithms for haplotype assembly
title GenHap: a novel computational method based on genetic algorithms for haplotype assembly
title_full GenHap: a novel computational method based on genetic algorithms for haplotype assembly
title_fullStr GenHap: a novel computational method based on genetic algorithms for haplotype assembly
title_full_unstemmed GenHap: a novel computational method based on genetic algorithms for haplotype assembly
title_short GenHap: a novel computational method based on genetic algorithms for haplotype assembly
title_sort genhap: a novel computational method based on genetic algorithms for haplotype assembly
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6471693/
https://www.ncbi.nlm.nih.gov/pubmed/30999845
http://dx.doi.org/10.1186/s12859-019-2691-y
work_keys_str_mv AT tangherloniandrea genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT spolaorsimone genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT rundoleonardo genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT nobilemarcos genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT cazzanigapaolo genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT maurigiancarlo genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT liopietro genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT merelliivan genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly
AT besozzidaniela genhapanovelcomputationalmethodbasedongeneticalgorithmsforhaplotypeassembly