RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity

In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can...

Bibliographic Details
Main authors:	Ishikawa, Sohta A., Inagaki, Yuji, Hashimoto, Tetsuo
Format:	Online Article Text
Language:	English
Published:	Libertas Academica 2012
Subjects:	Original Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394461/ https://www.ncbi.nlm.nih.gov/pubmed/22798721 http://dx.doi.org/10.4137/EBO.S9017

author	Ishikawa, Sohta A. Inagaki, Yuji Hashimoto, Tetsuo
collection	PubMed
description	In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, ‘RY-coding’ and ‘non-homogeneous’ models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
format	Online Article Text
id	pubmed-3394461
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-3394461 2012-07-13 Ishikawa, Sohta A. Inagaki, Yuji Hashimoto, Tetsuo Evol Bioinform Online Original Research Libertas Academica 2012-06-25 /pmc/articles/PMC3394461/ /pubmed/22798721 http://dx.doi.org/10.4137/EBO.S9017 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title	RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity
topic	Original Research