
A novel fast vector method for genetic sequence comparison


Bibliographic Details
Main Authors: Li, Yongkun; He, Lily; Lucy He, Rong; Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610321/
https://www.ncbi.nlm.nih.gov/pubmed/28939913
http://dx.doi.org/10.1038/s41598-017-12493-2
collection PubMed
description With the sharp increase in available biological sequences, traditional sequence alignment methods are becoming unsuitable and infeasible. This has motivated a surge of fast alignment-free techniques for sequence analysis. Among these, many kinds of feature vector methods have been established and applied to the reconstruction of species phylogeny. The vectors basically consist of typical numerical features for certain biological problems. The features may come from the primary sequences or from the secondary or three-dimensional structures of macromolecules. In this study, we propose a novel numerical vector based only on the primary sequences of organisms to build their phylogeny. Three chemical and physical properties of primary sequences (purine, pyrimidine and keto) are also incorporated into the vector. Using each property, we convert the nucleotide sequence into a new sequence consisting of only two kinds of letters. Three sequences are therefore constructed, one for each property. For each letter of each sequence, we calculate the number of occurrences of the letter, the average position of the letter, and the variation of its positions in the sequence. Tested on several datasets related to mammals, viruses and bacteria, this new tool is fast and accurate for inferring the phylogeny of organisms.
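The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record does not spell out the three two-letter groupings, so the ones below (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) are an assumption, as are the 1-based positions, the use of population variance for the "variation" of positions, the handling of ambiguous bases, and the Euclidean distance between vectors. The names `GROUPINGS`, `letter_features`, `feature_vector` and `distance` are hypothetical.

```python
import math
from statistics import fmean, pvariance

# Assumed two-letter groupings of the four nucleotides (not given in the record).
GROUPINGS = {
    "purine/pyrimidine": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "amino/keto": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "weak/strong": {"A": "W", "T": "W", "C": "S", "G": "S"},
}


def letter_features(binary_seq, letter):
    """Return [count, mean position, variance of positions] for one letter."""
    # Positions are 1-based (an assumption of this sketch).
    positions = [i for i, ch in enumerate(binary_seq, start=1) if ch == letter]
    if not positions:
        return [0.0, 0.0, 0.0]
    return [float(len(positions)), fmean(positions), float(pvariance(positions))]


def feature_vector(dna):
    """Build an 18-dimensional vector: 3 groupings x 2 letters x 3 features."""
    dna = dna.upper()
    vector = []
    for mapping in GROUPINGS.values():
        # Unrecognised characters (e.g. N) map to None, so they match no
        # letter but still occupy a position in the sequence.
        binary_seq = [mapping.get(ch) for ch in dna]
        for letter in sorted(set(mapping.values())):
            vector.extend(letter_features(binary_seq, letter))
    return vector


def distance(seq_a, seq_b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.dist(feature_vector(seq_a), feature_vector(seq_b))
```

Because each sequence is reduced to a fixed-length vector in a single pass, comparing two genomes costs time linear in their lengths, which is the kind of speed advantage over alignment that the abstract refers to. A pairwise matrix of `distance` values could then be passed to any standard tree-building routine.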
id pubmed-5610321
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5610321 2017-10-10 Sci Rep Article Nature Publishing Group UK 2017-09-22 /pmc/articles/PMC5610321/ /pubmed/28939913 http://dx.doi.org/10.1038/s41598-017-12493-2 Text en
© The Author(s) 2017. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.