Improved orthologous databases to ease protozoan targets inference

Bibliographic Details
Main authors:	Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M. R.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4587786/ https://www.ncbi.nlm.nih.gov/pubmed/26416523 http://dx.doi.org/10.1186/s13071-015-1090-0

collection	PubMed
description	BACKGROUND: Homology inference helps to identify similarities, as well as differences, among organisms, which provides better insight into how closely related one organism is to another. In addition, comparative genomics pipelines are widely adopted tools built from different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid protozoan target identification, one of the many tasks that benefit from comparative genomics tools. METHODS: Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, using an HMM-based reciprocal best hits approach. Our methodology allows OrthoSearch to compare two orthologous databases against each other and to generate a new, improved one. This new database can later be used to infer potential protozoan targets through a similarity analysis against the human genome. RESULTS: The protein sequences of the Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). This allowed us to create two new orthologous databases, “KO + EggNOG KOG” and “KO + EggNOG KOG + ProtozoaDB”, with 16,938 and 27,701 orthologous groups, respectively. These new orthologous databases were then used in a regular OrthoSearch run. By comparing the protozoan species against the “KO + EggNOG KOG” and “KO + EggNOG KOG + ProtozoaDB” databases, we detected the following numbers of orthologous groups and coverage (the inferred orthologous groups as a percentage of the total orthologous groups in each database): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %).
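The coverage percentages above can be checked with a few lines of Python. This is a minimal sketch: the counts and database sizes are copied from the abstract, while the dictionary and function names are illustrative only and are not part of OrthoSearch. Dividing each count by the number of orthologous groups in the corresponding merged database (16,938 or 27,701) reproduces every reported percentage to the nearest integer.

```python
# Merged database sizes (number of orthologous groups), as reported in the abstract.
DB_SIZES = {
    "KO + EggNOG KOG": 16_938,
    "KO + EggNOG KOG + ProtozoaDB": 27_701,
}

# Orthologous groups inferred per species, in the same database order as DB_SIZES.
INFERRED = {
    "Cryptosporidium hominis": (1_821, 3_254),
    "Entamoeba histolytica": (2_245, 5_305),
    "Leishmania infantum": (2_702, 4_760),
}


def coverage_pct(groups: int, db_size: int) -> int:
    """Share of a database's groups inferred for one species, rounded to a whole percent."""
    return round(100 * groups / db_size)


def coverage_table() -> dict[str, tuple[int, ...]]:
    """Coverage of each species against each merged database."""
    sizes = list(DB_SIZES.values())
    return {
        species: tuple(coverage_pct(g, n) for g, n in zip(counts, sizes))
        for species, counts in INFERRED.items()
    }


if __name__ == "__main__":
    for species, pcts in coverage_table().items():
        print(species, pcts)
```

Running the script prints one line per species with its two coverage values, matching the (11, 12), (13, 19) and (16, 17) pairs given in the abstract.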
Using our HMM-based methodology and the largest of the new orthologous databases, we inferred 13 orthologous groups that represent potential protozoan targets; these were detected owing to our distant homology approach. We also report the numbers of species-specific, pair-to-pair and core groups from these analyses, depicted in Venn diagrams. CONCLUSIONS: The orthologous databases generated by our HMM-based methodology provide a broader dataset, with more orthologous groups than the original databases used as input. They may be used for several homology inference analyses, annotation tasks and protozoan target identification. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13071-015-1090-0) contains supplementary material, which is available to authorized users.
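The species-specific, pair-to-pair and core categories correspond to the exclusive regions of a three-set Venn diagram. The sketch below assumes that each species' OrthoSearch result can be reduced to a set of orthologous group identifiers; the function names and the group IDs in the toy input are hypothetical and do not come from the article's data.

```python
from itertools import combinations


def venn_regions(groups_by_species: dict[str, set[str]]) -> dict[tuple[str, ...], set[str]]:
    """Assign every group to the exclusive Venn region of the species that contain it."""
    regions: dict[tuple[str, ...], set[str]] = {}
    all_groups = set().union(*groups_by_species.values())
    for group in all_groups:
        # Region key: sorted names of every species whose result contains this group.
        key = tuple(sorted(s for s, gs in groups_by_species.items() if group in gs))
        regions.setdefault(key, set()).add(group)
    return regions


def venn_counts(groups_by_species: dict[str, set[str]]):
    """Count species-specific, pair-to-pair and core groups (intended for three species)."""
    regions = venn_regions(groups_by_species)
    species = sorted(groups_by_species)
    specific = {s: len(regions.get((s,), set())) for s in species}
    pairs = {p: len(regions.get(p, set())) for p in combinations(species, 2)}
    core = len(regions.get(tuple(species), set()))
    return specific, pairs, core


if __name__ == "__main__":
    # Hypothetical toy input; not data from the article.
    toy = {
        "C. hominis": {"OG1", "OG2", "OG3"},
        "E. histolytica": {"OG2", "OG3", "OG4"},
        "L. infantum": {"OG3", "OG4", "OG5"},
    }
    print(venn_counts(toy))
```

For the toy input, one group is core, two pairs each share one group exclusively, and C. hominis and L. infantum each have one species-specific group. Because every group lands in exactly one region, the region sizes always sum to the number of distinct groups across all species.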
id	pubmed-4587786
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
source	Parasit Vectors (Research); BioMed Central, 2015-09-29
rights	© Kotowski et al. 2015. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.