PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination

BACKGROUND: We present a novel method to encode ambiguously aligned regions in fixed multiple sequence alignments by 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord). The method works via ordination of sequence identity or cost scores matrices by means of Principal Coordinates Ana...

Bibliographic Details
Main authors: Lücking, Robert; Hodkinson, Brendan P; Stamatakis, Alexandros; Cartwright, Reed A
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3024941/
https://www.ncbi.nlm.nih.gov/pubmed/21214904
http://dx.doi.org/10.1186/1471-2105-12-10
author Lücking, Robert
Hodkinson, Brendan P
Stamatakis, Alexandros
Cartwright, Reed A
collection PubMed
description BACKGROUND: We present a novel method to encode ambiguously aligned regions in fixed multiple sequence alignments by 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord). The method works via ordination of sequence identity or cost score matrices by means of Principal Coordinates Analysis (PCoA). After identification of ambiguous regions, the method computes pairwise distances as sequence identities or cost scores, ordinates the resulting distance matrix by means of PCoA, and encodes the principal coordinates as ordered integers. Three biological and 100 simulated datasets were used to assess the performance of the new method.
RESULTS: Including ambiguous regions coded by means of PICS-Ord increased topological accuracy, resolution, and bootstrap support in real biological and simulated datasets compared to the alternative of excluding such regions from the analysis a priori. In terms of accuracy, PICS-Ord performs as well as or better than previously available methods of ambiguous region coding (e.g., INAASE), with the advantages of a practically unlimited alignment size, increased analytical speed, and the possibility of analyzing PICS-Ord scores together with DNA data in a partitioned maximum likelihood model.
CONCLUSIONS: Advantages of PICS-Ord over step matrix-based ambiguous region coding with INAASE include a practically unlimited number of OTUs and seamless integration of PICS-Ord codes into phylogenetic datasets, as well as the increased speed of phylogenetic analysis. Contrary to word- and frequency-based methods, PICS-Ord maintains the advantage of pairwise sequence alignment to derive distances, and the method is flexible with respect to the calculation of distance scores. In addition to distance and maximum parsimony, PICS-Ord codes can be analyzed in a Bayesian or maximum likelihood framework. RAxML (version 7.2.6 or higher, developed for this study) allows up to 32-state ordered or unordered characters. A GTR, MK, or ORDERED model can be applied to analyze the PICS-Ord codes partition, with GTR performing slightly better than MK and ORDERED.
AVAILABILITY: An implementation of the PICS-Ord algorithm is available from http://scit.us/projects/ngila/wiki/PICS-Ord. It requires both the statistical software R (http://www.r-project.org) and the alignment software Ngila (http://scit.us/projects/ngila).
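The pipeline summarized in the abstract (pairwise distances, PCoA ordination, integer coding of the coordinates) can be illustrated with a minimal Python sketch. This is not the authors' R/Ngila implementation. The function names `pcoa` and `encode_axis` are hypothetical, the distance matrix is a toy example rather than Ngila cost scores, and the equal-width rescaling of each axis to integer states is an assumption, since the abstract does not state how coordinates are binned. The default of 32 states only mirrors the RAxML limit mentioned above.

```python
import math

def pcoa(dist, n_axes=2, iters=200, tol=1e-9):
    """Classical PCoA of a symmetric distance matrix (list of lists).

    Eigenvectors are found by power iteration with deflation to stay within
    the standard library. This assumes the leading eigenvalues are positive,
    which holds for Euclidean distances; real data would normally go through
    a library eigen-solver instead.
    """
    n = len(dist)
    # Gower transformation: A = -1/2 * D^2, then double centering.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]
    grand = sum(row) / n
    b = [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

    axes = []
    for k in range(n_axes):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:  # nothing left to extract
                lam = 0.0
                break
            v = [x / norm for x in w]
            # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
            lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam < tol:
            break
        # Principal coordinates: eigenvector scaled by sqrt(eigenvalue).
        axes.append([x * math.sqrt(lam) for x in v])
        # Deflate so the next pass finds the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return axes

def encode_axis(coords, n_states=32):
    """Rescale one axis to ordered integer states 0..n_states-1.

    Equal-width rescaling is an assumption made for illustration only.
    """
    lo, hi = min(coords), max(coords)
    if hi == lo:
        return [0] * len(coords)
    return [round((x - lo) / (hi - lo) * (n_states - 1)) for x in coords]

if __name__ == "__main__":
    # Toy distances between four taxa (stand-in for identity or cost scores).
    pos = [0.0, 1.0, 2.0, 4.0]
    dist = [[abs(p - q) for q in pos] for p in pos]
    for axis in pcoa(dist):
        print(encode_axis(axis, n_states=5))
```

The sign of each PCoA axis is arbitrary, so the integer codes may come out reversed. Only the ordering and spacing of the states carry information, which is why the resulting characters are treated as ordered.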
format Text
id pubmed-3024941
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3024941 2011-01-22 BMC Bioinformatics Methodology Article BioMed Central 2011-01-07 /pmc/articles/PMC3024941/ /pubmed/21214904 http://dx.doi.org/10.1186/1471-2105-12-10 Text en
Copyright ©2011 Lücking et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination
topic Methodology Article