Cargando…

Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alter...

Descripción completa

Detalles Bibliográficos
Autores principales:	Soares, Inês, Goios, Ana, Amorim, António
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	The Scientific World Journal 2012
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444837/ https://www.ncbi.nlm.nih.gov/pubmed/22997494 http://dx.doi.org/10.1100/2012/450124

_version_	1782243724573540352
author	Soares, Inês Goios, Ana Amorim, António
author_facet	Soares, Inês Goios, Ana Amorim, António
author_sort	Soares, Inês
collection	PubMed
description	The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
format	Online Article Text
id	pubmed-3444837
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	The Scientific World Journal
record_format	MEDLINE/PubMed
spelling	pubmed-34448372012-09-20 Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency Soares, Inês Goios, Ana Amorim, António ScientificWorldJournal Research Article The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web. The Scientific World Journal 2012-09-10 /pmc/articles/PMC3444837/ /pubmed/22997494 http://dx.doi.org/10.1100/2012/450124 Text en Copyright © 2012 Inês Soares et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Soares, Inês Goios, Ana Amorim, António Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
title	Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
title_full	Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
title_fullStr	Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
title_full_unstemmed	Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
title_short	Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
title_sort	sequence comparison alignment-free approach based on suffix tree and l-words frequency
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444837/ https://www.ncbi.nlm.nih.gov/pubmed/22997494 http://dx.doi.org/10.1100/2012/450124
work_keys_str_mv	AT soaresines sequencecomparisonalignmentfreeapproachbasedonsuffixtreeandlwordsfrequency AT goiosana sequencecomparisonalignmentfreeapproachbasedonsuffixtreeandlwordsfrequency AT amorimantonio sequencecomparisonalignmentfreeapproachbasedonsuffixtreeandlwordsfrequency

Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

Ejemplares similares