maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences

Motivation: The task of reconstructing a genomic sequence from a particular species is gaining more and more importance in the light of the rapid development of high-throughput sequencing technologies and their limitations. Applications include not only compensation for missing data in unsequenced g...

Bibliographic Details
Main Authors:	Menzel, Peter; Stadler, Peter F.; Gorodkin, Jan
Format:	Text
Language:	English
Published:	Oxford University Press 2011
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3031029/ https://www.ncbi.nlm.nih.gov/pubmed/21123221 http://dx.doi.org/10.1093/bioinformatics/btq651

author	Menzel, Peter; Stadler, Peter F.; Gorodkin, Jan
collection	PubMed
description	Motivation: The task of reconstructing a genomic sequence from a particular species is gaining more and more importance in the light of the rapid development of high-throughput sequencing technologies and their limitations. Applications include not only compensation for missing data in unsequenced genomic regions and the design of oligonucleotide primers for target genes in species with lacking sequence information but also the preparation of customized queries for homology searches. Results: We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed based on a certain confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase of the reconstruction accuracy compared to both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (up to 10%) in all 44 species. The improved sequence reconstruction also leads to an increase of the quality of PCR primer design for yet unsequenced genes: the differences between the expected T(m) and real T(m) of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees. The prediction accuracy drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset. Availability: maxAlike is available for download and web server at: http://rth.dk/resources/maxAlike. Contact: gorodkin@rth.dk Supplementary information: Supplementary data are available at Bioinformatics online.
format	Text
id	pubmed-3031029
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3031029 2011-02-02 maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences Menzel, Peter; Stadler, Peter F.; Gorodkin, Jan Bioinformatics Original Papers Oxford University Press 2011-02-01 2010-12-01 /pmc/articles/PMC3031029/ /pubmed/21123221 http://dx.doi.org/10.1093/bioinformatics/btq651 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3031029/ https://www.ncbi.nlm.nih.gov/pubmed/21123221 http://dx.doi.org/10.1093/bioinformatics/btq651
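
The abstract describes computing nucleotide probabilities for the target species at each alignment position and then calling a consensus at a chosen confidence level. The following is a minimal sketch of that idea, not the maxAlike implementation. It assumes a Jukes-Cantor (JC69) substitution model, a tree re-rooted at the node where the target species attaches, and masking of low-confidence positions with "N"; none of these choices is stated in the record. All function names, species names, branch lengths, and the 0.9 threshold are hypothetical.

```python
import math

NUCS = "ACGT"

def jc69(t):
    """JC69 transition matrix for branch length t (assumed model, not from the record)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial(node, column):
    """Felsenstein pruning: likelihood of the observed leaves below `node`
    for each of the four states at `node`.
    A node is a leaf name (str) or a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        base = column[node]
        # Gaps and unknown characters carry no information.
        return [1.0 if base not in NUCS or base == n else 0.0 for n in NUCS]
    like = [1.0] * 4
    for child, t in node:
        p, sub = jc69(t), partial(child, column)
        for a in range(4):
            like[a] *= sum(p[a][b] * sub[b] for b in range(4))
    return like

def target_posterior(rest, t_target, column):
    """Posterior over the target leaf's nucleotide for one alignment column.
    `rest` lists the other subtrees hanging off the node where the target
    attaches; `t_target` is the length of the target's own branch."""
    at_node = partial(rest, column)
    p = jc69(t_target)
    # The uniform JC69 prior cancels when normalising.
    weights = [sum(p[a][b] * at_node[b] for b in range(4)) for a in range(4)]
    total = sum(weights)
    return {n: w / total for n, w in zip(NUCS, weights)}

def call_sequence(rest, t_target, alignment, confidence=0.9):
    """Most probable nucleotide per column, or 'N' below the confidence level."""
    length = len(next(iter(alignment.values())))
    out = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        post = target_posterior(rest, t_target, column)
        best = max(post, key=post.get)
        out.append(best if post[best] >= confidence else "N")
    return "".join(out)

# Toy input: three sequenced species, target attached by a short branch.
alignment = {"sp1": "ACGT", "sp2": "ACGT", "sp3": "ACTT"}
rest = [([("sp1", 0.05), ("sp2", 0.05)], 0.05), ("sp3", 0.4)]
print(call_sequence(rest, 0.05, alignment))
```

Raising `confidence` trades coverage for accuracy: more positions are masked, but the positions that are called are better supported. This mirrors the abstract's comparison restricted to nucleotides above a confidence limit.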