
PanDelos: a dictionary-based method for pan-genome content discovery

BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational cos...


Bibliographic Details
Main authors: Bonnici, Vincenzo, Giugno, Rosalba, Manca, Vincenzo
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266927/
https://www.ncbi.nlm.nih.gov/pubmed/30497358
http://dx.doi.org/10.1186/s12859-018-2417-6
author Bonnici, Vincenzo
Giugno, Rosalba
Manca, Vincenzo
collection PubMed
description BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes; in fact, all-against-all gene comparisons are required to completely solve the problem. In the presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge in this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations. RESULTS: We present PanDelos, a standalone tool for the discovery of pan-genome contents among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Candidate homology relations are integrated into a global network, and groups of homologous genes are extracted by applying a community detection algorithm. CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality of content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and on synthetic benchmarks that represent a fully trusted golden truth. The software is available at https://github.com/GiugnoLab/PanDelos. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2417-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6266927
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6266927 2018-12-05 PanDelos: a dictionary-based method for pan-genome content discovery Bonnici, Vincenzo Giugno, Rosalba Manca, Vincenzo BMC Bioinformatics Research
BioMed Central 2018-11-30 /pmc/articles/PMC6266927/ /pubmed/30497358 http://dx.doi.org/10.1186/s12859-018-2417-6 Text en
© The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PanDelos: a dictionary-based method for pan-genome content discovery
topic Research