RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity

In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can...

Bibliographic Details
Main authors: Ishikawa, Sohta A., Inagaki, Yuji, Hashimoto, Tetsuo
Format: Online Article Text
Language: English
Published: Libertas Academica 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394461/
https://www.ncbi.nlm.nih.gov/pubmed/22798721
http://dx.doi.org/10.4137/EBO.S9017
_version_ 1782237871392948224
author Ishikawa, Sohta A.
Inagaki, Yuji
Hashimoto, Tetsuo
collection PubMed
description In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, ‘RY-coding’ and ‘non-homogeneous’ models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
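The RY-coding step mentioned in the description (recoding the four bases as purine or pyrimidine) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `ry_code` and `frequencies` and the two toy sequences are assumptions made for illustration only. It shows how an AT-rich and a GC-rich sequence, which differ sharply in four-state base composition, end up with the same two-state composition after recoding.

```python
from collections import Counter

# Purines (A, G) map to R; pyrimidines (C, T, and U for RNA) map to Y.
# Any other symbol (gaps, N, other ambiguity codes) is left unchanged.
RY_TABLE = str.maketrans("AGCTUagctu", "RRYYYRRYYY")


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence into purine (R) / pyrimidine (Y) states."""
    return seq.translate(RY_TABLE)


def frequencies(seq: str) -> dict:
    """Relative frequency of each symbol, ignoring gaps and unknown sites."""
    counts = Counter(c for c in seq.upper() if c not in "-?N")
    total = sum(counts.values())
    return {sym: n / total for sym, n in sorted(counts.items())}


# Hypothetical toy sequences: one AT-rich, one GC-rich.
at_rich = "ATATTAACTTAAGATT"
gc_rich = "GCGCCGGCCCGGGGCC"

if __name__ == "__main__":
    for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
        print(name, "four-state:", frequencies(seq))
        print(name, "RY-coded:  ", frequencies(ry_code(seq)))
```

The recoding discards transitions (A/G and C/T changes) and keeps only transversions, which is the usual trade-off of this approach: composition is normalized at the cost of some phylogenetic signal.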
format Online
Article
Text
id pubmed-3394461
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3394461 2012-07-13 Evol Bioinform Online Original Research
Libertas Academica 2012-06-25 /pmc/articles/PMC3394461/ /pubmed/22798721 http://dx.doi.org/10.4137/EBO.S9017 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity
topic Original Research