Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter

Bibliographic Details
Main authors: Sarkar, Bimal Kumar; Sharma, Ashish Ranjan; Bhattacharya, Manojit; Sharma, Garima; Lee, Sang-Soo; Chakraborty, Chiranjib
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8249421/
https://www.ncbi.nlm.nih.gov/pubmed/34211040
http://dx.doi.org/10.1038/s41598-021-93154-3
Collection: PubMed
Abstract: We describe a novel algorithm for recovering information from DNA sequences using a digital filter. This work proposes a three-part algorithm to determine k-mer (q-gram) word density. A finite impulse response (FIR) digital filter is used to calculate a sequence's k-mer or q-gram word density. Principal component analysis is then applied to the word-density distributions to analyze the dissimilarity between sequences. The resulting dissimilarity matrix reveals cluster formation. The clustering is alignment-free, and the clusters are used to build phylogenetic relations. The clustering results are in good agreement with alignment-based algorithms. The algorithm is simple and requires less computation time than other currently available algorithms. We tested it on beta hemoglobin coding sequences (HBB) from 10 species and on 18 primate mitochondrial genome (mtDNA) sequences.
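The filtering, PCA and dissimilarity steps summarized in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The record does not give the filter taps, window length, feature summary or distance measure, so the sketch makes several assumptions. It applies a length-`window` boxcar (moving-average) FIR filter to a 0/1 k-mer occurrence signal. It uses the mean filtered density of each ACGT k-mer as that sequence's feature. It performs PCA through a cyclic Jacobi eigen-decomposition of the feature covariance, chosen only to avoid third-party dependencies. Finally, it measures Euclidean distance in the space of the leading components. All function names are hypothetical.

```python
import math
from itertools import product


def kmer_indicator(seq, kmer):
    """0/1 signal: 1.0 at every index where `kmer` starts in `seq` (overlaps count)."""
    k = len(kmer)
    return [1.0 if seq[i:i + k] == kmer else 0.0 for i in range(len(seq) - k + 1)]


def fir_filter(x, h):
    """Direct-form causal FIR filter, y[n] = sum_j h[j] * x[n - j], zero initial state."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n >= j) for n in range(len(x))]


def kmer_density(seq, kmer, window):
    """Local density of `kmer` using an ASSUMED boxcar (moving-average) FIR filter."""
    taps = [1.0 / window] * window  # equal taps summing to 1
    return fir_filter(kmer_indicator(seq, kmer), taps)


def density_features(seq, k=2, window=8):
    """One feature per ACGT k-mer: mean of its filtered density (an ASSUMED summary)."""
    feats = []
    for letters in product("ACGT", repeat=k):
        y = kmer_density(seq, "".join(letters), window)
        feats.append(sum(y) / len(y) if y else 0.0)
    return feats


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, cyclic Jacobi.

    Stand-in numerical method so the sketch needs only the standard library.
    """
    d = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j) < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(d):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(d):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(d):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(d)], v


def pca_project(rows, n_components=2):
    """Project mean-centred rows onto the leading principal axes of their covariance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in x) / max(n - 1, 1) for j in range(d)] for i in range(d)]
    evals, vecs = jacobi_eigh(cov)
    top = sorted(range(d), key=lambda i: evals[i], reverse=True)[:n_components]
    return [[sum(r[j] * vecs[j][i] for j in range(d)) for i in top] for r in x]


def dissimilarity_matrix(points):
    """Pairwise Euclidean distances (ASSUMED metric) in principal-component space."""
    return [[math.dist(p, q) for q in points] for p in points]


def sequence_dissimilarity(seqs, k=2, window=8, n_components=2):
    """Hypothetical end-to-end helper: FIR densities -> PCA -> dissimilarity matrix."""
    feats = [density_features(s, k, window) for s in seqs]
    return dissimilarity_matrix(pca_project(feats, n_components))
```

For example, `sequence_dissimilarity(["ACGTACGTAAGG", "ACGTACGTAAGC", "GGGGCCCCTTTT"], k=2, window=4)` returns a 3 × 3 symmetric matrix with a zero diagonal. A clustering or tree-building step would then take this matrix as input; the record does not say which clustering method the authors used. With a boxcar filter, the mean density is close to the plain k-mer frequency, so any advantage of filtering would have to come from a richer summary of the density signal than its mean.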
Record ID: pubmed-8249421
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (published 2021-07-01)
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Topic: Article