A novel fast vector method for genetic sequence comparison
With the sharp increase in biological sequences, traditional sequence alignment methods have become unsuitable and infeasible. This has motivated a surge of fast alignment-free techniques for sequence analysis. Among these methods, many kinds of feature vector methods have been established and applied to reconstru...
Main authors: | Li, Yongkun; He, Lily; Lucy He, Rong; Yau, Stephen S.-T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610321/ https://www.ncbi.nlm.nih.gov/pubmed/28939913 http://dx.doi.org/10.1038/s41598-017-12493-2 |
_version_ | 1783265759979372544 |
---|---|
author | Li, Yongkun He, Lily Lucy He, Rong Yau, Stephen S.-T. |
author_facet | Li, Yongkun He, Lily Lucy He, Rong Yau, Stephen S.-T. |
author_sort | Li, Yongkun |
collection | PubMed |
description | With the sharp increase in biological sequences, traditional sequence alignment methods have become unsuitable and infeasible. This has motivated a surge of fast alignment-free techniques for sequence analysis. Among these methods, many kinds of feature vector methods have been established and applied to the reconstruction of species phylogeny. The vectors basically consist of typical numerical features for certain biological problems. The features may come from the primary sequences or from the secondary or three-dimensional structures of macromolecules. In this study, we propose a novel numerical vector based only on the primary sequences of organisms to build their phylogeny. Three chemical and physical properties of primary sequences (purine, pyrimidine and keto) are also incorporated into the vector. Using each property, we convert the nucleotide sequence into a new sequence consisting of only two kinds of letters. Therefore, three sequences are constructed according to the three properties. For each letter of each sequence, we calculate the number of occurrences of the letter, the average position of the letter, and the variation of the positions at which the letter appears in the sequence. Tested on several datasets related to mammals, viruses and bacteria, this new tool is fast and accurate for inferring the phylogeny of organisms. |
format | Online Article Text |
id | pubmed-5610321 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-56103212017-10-10 A novel fast vector method for genetic sequence comparison Li, Yongkun He, Lily Lucy He, Rong Yau, Stephen S.-T. Sci Rep Article With sharp increasing in biological sequences, the traditional sequence alignment methods become unsuitable and infeasible. It motivates a surge of fast alignment-free techniques for sequence analysis. Among these methods, many sorts of feature vector methods are established and applied to reconstruction of species phylogeny. The vectors basically consist of some typical numerical features for certain biological problems. The features may come from the primary sequences, secondary or three dimensional structures of macromolecules. In this study, we propose a novel numerical vector based on only primary sequences of organism to build their phylogeny. Three chemical and physical properties of primary sequences: purine, pyrimidine and keto are also incorporated to the vector. Using each property, we convert the nucleotide sequence into a new sequence consisting of only two kinds of letters. Therefore, three sequences are constructed according to the three properties. For each letter of each sequence we calculate the number of the letter, the average position of the letter and the variation of the position of the letter appearing in the sequence. Tested on several datasets related to mammals, viruses and bacteria, this new tool is fast in speed and accurate for inferring the phylogeny of organisms. Nature Publishing Group UK 2017-09-22 /pmc/articles/PMC5610321/ /pubmed/28939913 http://dx.doi.org/10.1038/s41598-017-12493-2 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Li, Yongkun He, Lily Lucy He, Rong Yau, Stephen S.-T. A novel fast vector method for genetic sequence comparison |
title | A novel fast vector method for genetic sequence comparison |
title_full | A novel fast vector method for genetic sequence comparison |
title_fullStr | A novel fast vector method for genetic sequence comparison |
title_full_unstemmed | A novel fast vector method for genetic sequence comparison |
title_short | A novel fast vector method for genetic sequence comparison |
title_sort | novel fast vector method for genetic sequence comparison |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610321/ https://www.ncbi.nlm.nih.gov/pubmed/28939913 http://dx.doi.org/10.1038/s41598-017-12493-2 |
work_keys_str_mv | AT liyongkun anovelfastvectormethodforgeneticsequencecomparison AT helily anovelfastvectormethodforgeneticsequencecomparison AT lucyherong anovelfastvectormethodforgeneticsequencecomparison AT yaustephenst anovelfastvectormethodforgeneticsequencecomparison AT liyongkun novelfastvectormethodforgeneticsequencecomparison AT helily novelfastvectormethodforgeneticsequencecomparison AT lucyherong novelfastvectormethodforgeneticsequencecomparison AT yaustephenst novelfastvectormethodforgeneticsequencecomparison |
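
The description field above outlines the feature vector in words: three two-letter recodings of the nucleotide sequence, and for each letter a count, an average position and a variation of position. The following Python sketch illustrates that idea only; it is not the authors' implementation. Several details are assumptions not stated in the record: the three base groupings (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding), 1-based positions, the use of the plain population variance as the "variation", and the hypothetical function name `feature_vector`.

```python
from statistics import fmean, pvariance

# Assumed two-class groupings of the four bases. The abstract names purine,
# pyrimidine and keto; the exact groupings used in the paper are not given
# in this record, so these three are illustrative.
GROUPINGS = [
    ("AG", "CT"),  # purine vs. pyrimidine
    ("AC", "GT"),  # amino vs. keto
    ("AT", "CG"),  # weak vs. strong hydrogen bonding
]


def class_features(dna, bases):
    """Return [count, mean position, position variance] for one base class.

    Positions are 1-based. Recoding the sequence to a two-letter alphabet and
    then locating one letter is equivalent to locating the bases of its class.
    """
    positions = [i for i, b in enumerate(dna, start=1) if b in bases]
    if not positions:
        # Assumption: a class that never occurs contributes zeros.
        return [0.0, 0.0, 0.0]
    return [float(len(positions)), fmean(positions), float(pvariance(positions))]


def feature_vector(dna):
    """Build an 18-dimensional vector: 3 groupings x 2 classes x 3 features."""
    dna = dna.upper()
    vector = []
    for first_class, second_class in GROUPINGS:
        vector.extend(class_features(dna, first_class))
        vector.extend(class_features(dna, second_class))
    return vector


if __name__ == "__main__":
    print(feature_vector("ACGT"))
```

Because every sequence maps to a vector of the same fixed length regardless of sequence length, such vectors can be compared without alignment.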