Ortholog-Finder: A Tool for Constructing an Ortholog Data Set
Main authors: | Horiike, Tokumasa; Minai, Ryoichi; Miyata, Daisuke; Nakamura, Yoji; Tateno, Yoshio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779612/ https://www.ncbi.nlm.nih.gov/pubmed/26782935 http://dx.doi.org/10.1093/gbe/evw005 |
_version_ | 1782419648231243776 |
---|---|
author | Horiike, Tokumasa; Minai, Ryoichi; Miyata, Daisuke; Nakamura, Yoji; Tateno, Yoshio |
collection | PubMed |
description | Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging, because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present among the predicted orthologs. We developed a program, “Ortholog-Finder,” to obtain ortholog data sets for phylogenetic analysis from the complete open reading frame (ORF) data of each species. The program includes five processes for minimizing the effects of HGT and out-paralogs in phylogeny construction: 1) HGT filtering: genes derived from HGT can be detected and deleted from the initial sequence data set by examining their base compositions. 2) Out-paralog filtering: out-paralogs are detected and deleted from the data set based on sequence similarity. 3) Classification of phylogenetic trees: phylogenetic trees generated for ortholog candidates are classified as monophyletic or polyphyletic. 4) Tree splitting: polyphyletic trees are bisected to obtain monophyletic trees and to remove HGT genes and out-paralogs. 5) Threshold changing: out-paralogs are further excluded from the data set based on the difference between the similarity scores of genuine orthologs and those of out-paralogs. Using simulation data, we examined how out-paralogs and HGT genes affected species phylogenies constructed from ortholog data sets obtained by Ortholog-Finder, and we determined the effects of confounding factors. We then used Ortholog-Finder to construct a phylogeny for 12 Gram-positive bacteria from two phyla and validated each node of the resulting tree by comparison with individually constructed ortholog trees. |
format | Online Article Text |
id | pubmed-4779612 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4779612 2016-03-07 Ortholog-Finder: A Tool for Constructing an Ortholog Data Set. Horiike, Tokumasa; Minai, Ryoichi; Miyata, Daisuke; Nakamura, Yoji; Tateno, Yoshio. Genome Biol Evol. Research Article. Oxford University Press 2016-01-18 /pmc/articles/PMC4779612/ /pubmed/26782935 http://dx.doi.org/10.1093/gbe/evw005 Text en © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Ortholog-Finder: A Tool for Constructing an Ortholog Data Set |
topic | Research Article |
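The abstract says HGT-derived genes can be detected "by examining their base compositions" but does not give the statistic or cutoff. The sketch below illustrates one common way to do this, under stated assumptions: it uses GC content as the composition measure and flags genes more than `k` population standard deviations from the genome-wide mean. The function names and the default `k = 2.0` are hypothetical and are not taken from Ortholog-Finder.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_hgt_candidates(
    orfs: dict[str, str], k: float = 2.0
) -> tuple[dict[str, str], list[str]]:
    """Split ORFs (at least one required) into retained genes and putative HGT genes.

    Hypothetical criterion: a gene is flagged when its GC content lies more
    than k population standard deviations from the genome-wide mean.
    """
    gc = {name: gc_content(seq) for name, seq in orfs.items()}
    values = list(gc.values())
    mu = mean(values)
    sigma = pstdev(values)
    # With zero spread no gene can be an outlier, so nothing is flagged.
    flagged = [n for n, v in gc.items() if sigma > 0 and abs(v - mu) > k * sigma]
    flagged_set = set(flagged)
    kept = {n: s for n, s in orfs.items() if n not in flagged_set}
    return kept, flagged
```

For example, nine genes with 50% GC and one with 0% GC yield a single flagged gene. Real HGT detectors often use richer signals (codon usage, GC3, k-mer profiles); GC content is used here only because it is the simplest base-composition measure.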
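Out-paralog filtering and threshold changing both rest, per the abstract, on similarity scores and on the gap between the scores of genuine orthologs and out-paralogs. The following is a minimal sketch of that idea, assuming pairwise similarity scores between two species are already available (e.g., from a BLAST-like search). It keeps a reciprocal best hit only when the best score exceeds the runner-up by a ratio `min_ratio`, and re-running it at several ratios mimics "threshold changing". The margin rule, function names, and ratios are illustrative assumptions, not the program's documented criteria.

```python
from typing import Optional


def best_hit_with_margin(hits: dict[str, float], min_ratio: float) -> Optional[str]:
    """Return the top-scoring subject only if it clearly beats the runner-up.

    A best hit scoring less than min_ratio times the second-best score is
    treated as ambiguous (a possible out-paralog) and rejected.
    """
    if not hits:
        return None
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if len(ranked) == 1:
        return best_name
    second_score = ranked[1][1]
    if best_score >= min_ratio * second_score:
        return best_name
    return None


def ortholog_pairs(
    a_to_b: dict[str, dict[str, float]],
    b_to_a: dict[str, dict[str, float]],
    min_ratio: float,
) -> list[tuple[str, str]]:
    """Reciprocal best hits that pass the margin test in both directions."""
    pairs = []
    for a, hits in a_to_b.items():
        b = best_hit_with_margin(hits, min_ratio)
        if b is not None and best_hit_with_margin(b_to_a.get(b, {}), min_ratio) == a:
            pairs.append((a, b))
    return pairs
```

Raising `min_ratio` discards pairs whose best and second-best scores are close, which is the situation in which an out-paralog is most likely to be mistaken for an ortholog; lowering it admits more pairs at greater risk.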
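The abstract also classifies each candidate's gene tree as monophyletic or polyphyletic without defining the test. One plausible reading, assumed here, is that every predefined species group (for instance, each of the two phyla) must be separated from the remaining leaves by a single edge. The sketch encodes trees as nested Python tuples, reads them as unrooted, and labels a tree accordingly. The tuple encoding, group labels, and function names are hypothetical, and the tree-splitting step is not modeled.

```python
from typing import Union

# A tree written as nested tuples; leaves are species labels (hypothetical encoding).
Tree = Union[str, tuple]


def clade_leaf_sets(tree: Tree) -> list[frozenset]:
    """Return the leaf set of every subtree; the last entry is the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    sets: list[frozenset] = []
    leaves: set = set()
    for child in tree:
        child_sets = clade_leaf_sets(child)
        sets.extend(child_sets)
        leaves |= child_sets[-1]  # the child's own leaf set
    sets.append(frozenset(leaves))
    return sets


def is_monophyletic(tree: Tree, group: set) -> bool:
    """True if `group` is separated from the other leaves by a single edge.

    The tree is read as unrooted, so either the group or its complement
    may appear as a subtree in the nested-tuple representation.
    """
    sets = clade_leaf_sets(tree)
    group = frozenset(group)
    complement = sets[-1] - group
    return group in sets or complement in sets


def classify_tree(tree: Tree, groups: dict[str, set]) -> str:
    """Label a gene tree 'monophyletic' if every group is monophyletic, else 'polyphyletic'."""
    if all(is_monophyletic(tree, members) for members in groups.values()):
        return "monophyletic"
    return "polyphyletic"
```

With two groups `{"F1", "F2"}` and `{"A1", "A2"}`, the tree `(("F1", "F2"), ("A1", "A2"))` is labeled monophyletic, while `(("F1", "A1"), ("F2", "A2"))` is labeled polyphyletic, making it a candidate for splitting.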