Ortholog-Finder: A Tool for Constructing an Ortholog Data Set

Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging, because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present amon...

Bibliographic Details
Main Authors: Horiike, Tokumasa; Minai, Ryoichi; Miyata, Daisuke; Nakamura, Yoji; Tateno, Yoshio
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779612/
https://www.ncbi.nlm.nih.gov/pubmed/26782935
http://dx.doi.org/10.1093/gbe/evw005
_version_ 1782419648231243776
author Horiike, Tokumasa
Minai, Ryoichi
Miyata, Daisuke
Nakamura, Yoji
Tateno, Yoshio
author_facet Horiike, Tokumasa
Minai, Ryoichi
Miyata, Daisuke
Nakamura, Yoji
Tateno, Yoshio
author_sort Horiike, Tokumasa
collection PubMed
description Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging, because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present among the predicted orthologs. We developed a program, “Ortholog-Finder,” to obtain ortholog data sets for performing phylogenetic analysis by using all open-reading frame data of species. The program includes five processes for minimizing the effects of HGT and out-paralogs in phylogeny construction: 1) HGT filtering: Genes derived from HGT could be detected and deleted from the initial sequence data set by examining their base compositions. 2) Out-paralog filtering: Out-paralogs are detected and deleted from the data set based on sequence similarity. 3) Classification of phylogenetic trees: Phylogenetic trees generated for ortholog candidates are classified as monophyletic or polyphyletic trees. 4) Tree splitting: Polyphyletic trees are bisected to obtain monophyletic trees and remove HGT genes and out-paralogs. 5) Threshold changing: Out-paralogs are further excluded from the data set based on the difference in the similarity scores of genuine orthologs and out-paralogs. We examined how out-paralogs and HGTs affected phylogenetic trees constructed for species based on ortholog data sets obtained by Ortholog-Finder with the use of simulation data, and we determined the effects of confounding factors. We then used Ortholog-Finder in phylogeny construction for 12 Gram-positive bacteria from two phyla and validated each node of the constructed tree by comparison with individually constructed ortholog trees.
format Online
Article
Text
id pubmed-4779612
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4779612 2016-03-07 Ortholog-Finder: A Tool for Constructing an Ortholog Data Set Horiike, Tokumasa Minai, Ryoichi Miyata, Daisuke Nakamura, Yoji Tateno, Yoshio Genome Biol Evol Research Article
Oxford University Press 2016-01-18 /pmc/articles/PMC4779612/ /pubmed/26782935 http://dx.doi.org/10.1093/gbe/evw005 Text en © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Horiike, Tokumasa
Minai, Ryoichi
Miyata, Daisuke
Nakamura, Yoji
Tateno, Yoshio
Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
title Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
title_full Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
title_fullStr Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
title_full_unstemmed Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
title_short Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
title_sort ortholog-finder: a tool for constructing an ortholog data set
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779612/
https://www.ncbi.nlm.nih.gov/pubmed/26782935
http://dx.doi.org/10.1093/gbe/evw005
work_keys_str_mv AT horiiketokumasa orthologfinderatoolforconstructinganorthologdataset
AT minairyoichi orthologfinderatoolforconstructinganorthologdataset
AT miyatadaisuke orthologfinderatoolforconstructinganorthologdataset
AT nakamurayoji orthologfinderatoolforconstructinganorthologdataset
AT tatenoyoshio orthologfinderatoolforconstructinganorthologdataset
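
The description above says only that HGT-derived genes "could be detected and deleted from the initial sequence data set by examining their base compositions"; it does not state which statistic or cutoff Ortholog-Finder uses. As a rough illustration of the general idea, the sketch below flags genes whose GC content deviates strongly from the genome-wide mean. The GC-content z-score, the cutoff of 2.0, and the names `gc_content` and `flag_atypical_genes` are all assumptions made for this example and are not taken from the program.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> set[str]:
    """Return names of genes whose GC content is an outlier for this genome.

    Assumption: a simple z-score on GC content stands in for whatever
    base-composition test the real program applies.
    """
    values = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # All genes share the same composition, so nothing stands out.
        return set()
    return {name for name, gc in values.items() if abs(gc - mu) / sigma > z_cutoff}


# Hypothetical toy genome: nine genes at 50% GC and one at 100% GC.
toy_genome = {f"gene_{i}": "ATGC" * 5 for i in range(9)}
toy_genome["gene_hgt"] = "GGCC" * 5
candidates = flag_atypical_genes(toy_genome)
```

In this toy input only `gene_hgt` exceeds the cutoff. A filter of this kind would miss transfers between genomes of similar composition, which may be one reason the description lists tree classification and tree splitting as later, independent steps.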