PanDelos: a dictionary-based method for pan-genome content discovery

Bibliographic Details
Main authors:	Bonnici, Vincenzo; Giugno, Rosalba; Manca, Vincenzo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266927/ https://www.ncbi.nlm.nih.gov/pubmed/30497358 http://dx.doi.org/10.1186/s12859-018-2417-6

Description:
BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes; in fact, all-against-all gene comparisons are required to solve the problem completely. In the presence of phylogenetically distant genomes, the variability introduced by gene duplication and transmission makes the task of recognizing homologous genes even more difficult. A challenge in this field is designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.
RESULTS: We present PanDelos, a stand-alone tool for the discovery of pan-genome contents among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Candidate homology relations are integrated into a global network, and groups of homologous genes are extracted by applying a community detection algorithm.
CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running time and quality of content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and on synthetic benchmarks that provide a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2417-6) contains supplementary material, which is available to authorized users.
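To make the idea of an alignment-free, multiplicity-aware k-mer comparison concrete, here is a minimal sketch. It assumes a generalized Jaccard similarity over k-mer counts (sum of minima over sum of maxima), which is not necessarily the exact measure PanDelos uses. The fixed `threshold` argument is a hypothetical stand-in for the context-derived thresholds described above, and connected components replace the community detection step as a deliberate simplification. All function names are illustrative, not part of the PanDelos code base.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Multiset of overlapping k-mers in seq (k-mer -> multiplicity)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def multiplicity_similarity(a: str, b: str, k: int) -> float:
    """Assumed measure: generalized Jaccard on k-mer multiplicities,
    i.e. sum of per-k-mer minima divided by sum of per-k-mer maxima."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    keys = ca.keys() | cb.keys()
    num = sum(min(ca[x], cb[x]) for x in keys)  # Counter returns 0 for absent keys
    den = sum(max(ca[x], cb[x]) for x in keys)
    return num / den if den else 0.0


def gene_families(genes: dict, k: int, threshold: float) -> list:
    """Link every gene pair whose similarity reaches the (hypothetical, fixed)
    threshold, then return connected components as candidate families.
    Union-find components are a simplification of community detection."""
    parent = {g: g for g in genes}

    def find(x):
        # Path-halving union-find lookup.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-against-all comparison of gene sequences.
    for g1, g2 in combinations(genes, 2):
        if multiplicity_similarity(genes[g1], genes[g2], k) >= threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    toy = {"g1": "MKVLAAGIVG", "g2": "MKVLAAGIVA", "g3": "WWPQRSTHHE"}
    print(multiplicity_similarity(toy["g1"], toy["g2"], 3))  # 7 shared of 9 distinct 3-mers
    print(gene_families(toy, k=3, threshold=0.5))
```

On the toy input, `g1` and `g2` share 7 of their 9 distinct 3-mers (similarity 7/9), while `g3` shares none, so the sketch groups `g1` with `g2` and leaves `g3` alone. Counting multiplicities rather than mere presence means that repeated k-mers (for example from internal duplications) affect the score, which a set-based Jaccard would ignore.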
Journal:	BMC Bioinformatics
Published online:	2018-11-30
Record:	pubmed-6266927 (2018-12-05)
Copyright:	© The Author(s) 2018
License:	Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.