Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified

BACKGROUND: In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical mod...

Bibliographic Details
Main authors:	Keane, Thomas M, Creevey, Christopher J, Pentony, Melissa M, Naughton, Thomas J, McInerney, James O
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1435933/ https://www.ncbi.nlm.nih.gov/pubmed/16563161 http://dx.doi.org/10.1186/1471-2148-6-29

author	Keane, Thomas M; Creevey, Christopher J; Pentony, Melissa M; Naughton, Thomas J; McInerney, James O
author_sort	Keane, Thomas M
collection	PubMed
description	BACKGROUND: In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner. RESULTS: We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily-chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins. CONCLUSION: This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.
format	Text
id	pubmed-1435933
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1435933 2006-04-14 BMC Evol Biol Research Article BioMed Central 2006-03-24 /pmc/articles/PMC1435933/ /pubmed/16563161 http://dx.doi.org/10.1186/1471-2148-6-29 Text en Copyright © 2006 Keane et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified
topic	Research Article

Similar Items