Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter
We describe a novel algorithm for information recovery from DNA sequences by using a digital filter. This work proposes a three-part algorithm to decide the k-mer or q-gram word density. Employing a finite impulse response digital filter, one can calculate the sequence's k-mer or q-gram word de...
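Main authors: | Sarkar, Bimal Kumar; Sharma, Ashish Ranjan; Bhattacharya, Manojit; Sharma, Garima; Lee, Sang-Soo; Chakraborty, Chiranjib |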
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8249421/ https://www.ncbi.nlm.nih.gov/pubmed/34211040 http://dx.doi.org/10.1038/s41598-021-93154-3 |
_version_ | 1783716901330878464 |
---|---|
author | Sarkar, Bimal Kumar Sharma, Ashish Ranjan Bhattacharya, Manojit Sharma, Garima Lee, Sang-Soo Chakraborty, Chiranjib |
author_facet | Sarkar, Bimal Kumar Sharma, Ashish Ranjan Bhattacharya, Manojit Sharma, Garima Lee, Sang-Soo Chakraborty, Chiranjib |
author_sort | Sarkar, Bimal Kumar |
collection | PubMed |
description | We describe a novel algorithm for information recovery from DNA sequences by using a digital filter. This work proposes a three-part algorithm to decide the k-mer or q-gram word density. Employing a finite impulse response digital filter, one can calculate the sequence's k-mer or q-gram word density. Further principal component analysis is used on word density distribution to analyze the dissimilarity between sequences. A dissimilarity matrix is thus formed and shows the appearance of cluster formation. This cluster formation is constructed based on the alignment-free sequence method. Furthermore, the clusters are used to build phylogenetic relations. The cluster algorithm is in good agreement with alignment-based algorithms. The present algorithm is simple and requires less time for computation than other currently available algorithms. We tested the algorithm using beta hemoglobin coding sequences (HBB) of 10 different species and 18 primate mitochondria genome (mtDNA) sequences. |
format | Online Article Text |
id | pubmed-8249421 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8249421 2021-07-06 Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter Sarkar, Bimal Kumar Sharma, Ashish Ranjan Bhattacharya, Manojit Sharma, Garima Lee, Sang-Soo Chakraborty, Chiranjib Sci Rep Article We describe a novel algorithm for information recovery from DNA sequences by using a digital filter. This work proposes a three-part algorithm to decide the k-mer or q-gram word density. Employing a finite impulse response digital filter, one can calculate the sequence's k-mer or q-gram word density. Further principal component analysis is used on word density distribution to analyze the dissimilarity between sequences. A dissimilarity matrix is thus formed and shows the appearance of cluster formation. This cluster formation is constructed based on the alignment-free sequence method. Furthermore, the clusters are used to build phylogenetic relations. The cluster algorithm is in good agreement with alignment-based algorithms. The present algorithm is simple and requires less time for computation than other currently available algorithms. We tested the algorithm using beta hemoglobin coding sequences (HBB) of 10 different species and 18 primate mitochondria genome (mtDNA) sequences. Nature Publishing Group UK 2021-07-01 /pmc/articles/PMC8249421/ /pubmed/34211040 http://dx.doi.org/10.1038/s41598-021-93154-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Sarkar, Bimal Kumar Sharma, Ashish Ranjan Bhattacharya, Manojit Sharma, Garima Lee, Sang-Soo Chakraborty, Chiranjib Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter |
title | Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter |
title_sort | determination of k-mer density in a dna sequence and subsequent cluster formation algorithm based on the application of electronic filter |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8249421/ https://www.ncbi.nlm.nih.gov/pubmed/34211040 http://dx.doi.org/10.1038/s41598-021-93154-3 |
work_keys_str_mv | AT sarkarbimalkumar determinationofkmerdensityinadnasequenceandsubsequentclusterformationalgorithmbasedontheapplicationofelectronicfilter AT sharmaashishranjan determinationofkmerdensityinadnasequenceandsubsequentclusterformationalgorithmbasedontheapplicationofelectronicfilter AT bhattacharyamanojit determinationofkmerdensityinadnasequenceandsubsequentclusterformationalgorithmbasedontheapplicationofelectronicfilter AT sharmagarima determinationofkmerdensityinadnasequenceandsubsequentclusterformationalgorithmbasedontheapplicationofelectronicfilter AT leesangsoo determinationofkmerdensityinadnasequenceandsubsequentclusterformationalgorithmbasedontheapplicationofelectronicfilter AT chakrabortychiranjib determinationofkmerdensityinadnasequenceandsubsequentclusterformationalgorithmbasedontheapplicationofelectronicfilter |
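
The description in this record says that k-mer word density is computed with a finite impulse response (FIR) digital filter, but the record does not give the filter coefficients or window length. As a rough illustration only, and not the authors' implementation, the sketch below assumes a moving-average FIR filter (equal taps) applied to a 0/1 signal that marks where a chosen k-mer starts. The function names `kmer_indicator` and `fir_density`, the window length, and the toy sequence are all hypothetical.

```python
import numpy as np


def kmer_indicator(seq, kmer):
    """Return a 0/1 signal: 1.0 where `kmer` starts in `seq`, else 0.0."""
    k = len(kmer)
    n = len(seq) - k + 1
    return np.array([1.0 if seq[i:i + k] == kmer else 0.0 for i in range(n)])


def fir_density(indicator, window):
    """Smooth the indicator with a moving-average FIR filter.

    Assumption: equal taps of 1/window. The source record does not state
    which FIR coefficients the paper uses.
    """
    taps = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps.
    return np.convolve(indicator, taps, mode="valid")


seq = "ATGATGCCATGA"          # toy sequence, not from the paper's data
ind = kmer_indicator(seq, "ATG")
dens = fir_density(ind, window=4)
print(ind)    # start positions of "ATG" marked with 1.0
print(dens)   # local fraction of window positions where "ATG" starts
```

Each output value is the fraction of positions in a 4-wide window at which the k-mer begins, so a density profile per k-mer could then be compared across sequences, for example with principal component analysis as the description mentions.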