Alignment-free method for DNA sequence clustering using Fuzzy integral similarity

The growing volume of sequence data in private and public databases produced by next-generation sequencing poses new challenges because of the limitations of alignment-based methods for sequence comparison, so faster sequence analysis algorithms are needed. In this study, we develop...

Bibliographic Details
Main Authors: Saw, Ajay Kumar, Raj, Garima, Das, Manashi, Talukdar, Narayan Chandra, Tripathy, Binod Chandra, Nandi, Soumyadeep
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6403383/
https://www.ncbi.nlm.nih.gov/pubmed/30842590
http://dx.doi.org/10.1038/s41598-019-40452-6
author Saw, Ajay Kumar
Raj, Garima
Das, Manashi
Talukdar, Narayan Chandra
Tripathy, Binod Chandra
Nandi, Soumyadeep
collection PubMed
description The growing volume of sequence data in private and public databases produced by next-generation sequencing poses new challenges because of the limitations of alignment-based methods for sequence comparison, so faster sequence analysis algorithms are needed. In this study, we developed an alignment-free algorithm for faster sequence analysis. The novelty of our approach is the combination of a fuzzy integral with a Markov chain for sequence analysis in an alignment-free model. The method estimates the parameters of a Markov chain from the frequencies of occurrence of all possible nucleotide pairs in each DNA sequence. These estimated Markov chain parameters are then used to calculate the similarity between every pair of DNA sequences with a fuzzy integral algorithm. The resulting matrix is used as input for the neighbor program in the PHYLIP package for phylogenetic tree construction. Our method was tested on eight benchmark datasets and on in-house generated datasets (18S rDNA sequences from 11 arbuscular mycorrhizal fungi (AMF) and 16S rDNA sequences of 40 bacterial isolates from plant interior). The results indicate that the fuzzy integral algorithm is an efficient and feasible alignment-free method for sequence analysis on the genomic scale.
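The pipeline summarized in the description (nucleotide-pair frequencies, Markov chain parameters, fuzzy integral similarity, matrix for PHYLIP neighbor) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not state which fuzzy integral or fuzzy measure the paper uses, so the Sugeno integral with a uniform measure g(A) = |A|/n is an assumption, as are the per-parameter agreement score 1 - |p - q|, the conversion of similarity to distance as 1 - similarity, and all function names.

```python
NUCLEOTIDES = "ACGT"


def transition_matrix(seq):
    """Estimate first-order Markov transition probabilities from
    counts of adjacent nucleotide pairs (16 values, row-major)."""
    seq = seq.upper()
    counts = {a: {b: 0 for b in NUCLEOTIDES} for a in NUCLEOTIDES}
    for a, b in zip(seq, seq[1:]):
        if a in counts and b in counts:  # skip ambiguous symbols such as N
            counts[a][b] += 1
    probs = []
    for a in NUCLEOTIDES:
        row_total = sum(counts[a].values())
        for b in NUCLEOTIDES:
            probs.append(counts[a][b] / row_total if row_total else 0.0)
    return probs


def sugeno_similarity(p, q):
    """Sugeno integral of per-parameter agreement.

    ASSUMPTION: agreement is 1 - |p_i - q_i| and the fuzzy measure is
    uniform, g(A) = |A| / n; the paper's actual choices may differ.
    """
    agreement = sorted((1.0 - abs(x - y) for x, y in zip(p, q)), reverse=True)
    n = len(agreement)
    # max over i of min(i-th largest agreement, measure of the top-i set)
    return max(min(h, (i + 1) / n) for i, h in enumerate(agreement))


def phylip_distance_matrix(names, seqs):
    """Format pairwise distances (1 - similarity, an assumption) as a
    square PHYLIP distance matrix: taxon count, then one row per taxon
    with the name padded to 10 characters."""
    params = [transition_matrix(s) for s in seqs]
    lines = [f"{len(names):5d}"]
    for name, p in zip(names, params):
        row = " ".join(f"{1.0 - sugeno_similarity(p, q):.6f}" for q in params)
        lines.append(f"{name[:10]:<10} {row}")
    return "\n".join(lines)
```

The returned text could be saved as an input file (PHYLIP programs conventionally read `infile`) and passed to the neighbor program. Because the uniform measure weights all 16 transition parameters equally, it is only a placeholder for whatever measure the published method defines.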
format Online
Article
Text
id pubmed-6403383
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6403383 2019-03-11 Alignment-free method for DNA sequence clustering using Fuzzy integral similarity Saw, Ajay Kumar Raj, Garima Das, Manashi Talukdar, Narayan Chandra Tripathy, Binod Chandra Nandi, Soumyadeep Sci Rep Article
Nature Publishing Group UK 2019-03-06 /pmc/articles/PMC6403383/ /pubmed/30842590 http://dx.doi.org/10.1038/s41598-019-40452-6 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Alignment-free method for DNA sequence clustering using Fuzzy integral similarity
topic Article