
Alignment-free method for DNA sequence clustering using Fuzzy integral similarity


Bibliographic Details
Main authors:	Saw, Ajay Kumar; Raj, Garima; Das, Manashi; Talukdar, Narayan Chandra; Tripathy, Binod Chandra; Nandi, Soumyadeep
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2019
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6403383/ https://www.ncbi.nlm.nih.gov/pubmed/30842590 http://dx.doi.org/10.1038/s41598-019-40452-6

collection	PubMed
description	The large volume of sequence data that next-generation sequencing deposits in private and public databases poses new challenges because of the limitations of alignment-based methods for sequence comparison. Faster sequence analysis algorithms are therefore much needed. In this study, we developed an alignment-free algorithm for faster sequence analysis. The novelty of our approach is the combination of a fuzzy integral with a Markov chain in an alignment-free model. The method estimates the parameters of a Markov chain from the frequencies of occurrence of all possible nucleotide pairs in each DNA sequence. These estimated Markov chain parameters are then used to compute the similarity between every pair of DNA sequences with a fuzzy integral algorithm. The resulting similarity matrix is used as input to the neighbor program of the PHYLIP package for phylogenetic tree construction. Our method was tested on eight benchmark datasets and on in-house datasets (18S rDNA sequences from 11 arbuscular mycorrhizal fungi (AMF) and 16S rDNA sequences of 40 bacterial isolates from plant interiors). The results indicate that the fuzzy integral algorithm is an efficient and feasible alignment-free method for sequence analysis on a genomic scale.
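The pipeline summarized in the description (Markov chain parameters from nucleotide pairs, pairwise fuzzy integral similarity, a matrix for PHYLIP `neighbor`) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not give their fuzzy integral formulation, so the sketch assumes a first-order Markov chain estimated from dinucleotide counts, a per-transition agreement h = 1 - |p - q| between two sequences' transition probabilities, a Sugeno integral taken with respect to the symmetric fuzzy measure g(A) = |A| / n, and a distance of 1 - similarity for the `neighbor` input. The function names `transition_matrix`, `sugeno_similarity` and `phylip_distance_matrix` are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def transition_matrix(seq):
    """Estimate first-order Markov chain transition probabilities P(b | a)
    from the dinucleotide (nucleotide pair) counts of one DNA sequence."""
    seq = seq.upper()
    counts = {pair: 0 for pair in product(NUCLEOTIDES, repeat=2)}
    for a, b in zip(seq, seq[1:]):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:  # skip N and other ambiguity codes
            counts[(a, b)] += 1
    probs = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        for b in NUCLEOTIDES:
            # Fall back to a uniform row when `a` never starts a pair (no division by zero)
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 1 / len(NUCLEOTIDES)
    return probs


def sugeno_similarity(p, q):
    """Sugeno fuzzy integral of the per-transition agreements h = 1 - |p - q|,
    taken with respect to the symmetric fuzzy measure g(A) = |A| / n (assumed)."""
    h = sorted((1.0 - abs(p[k] - q[k]) for k in p), reverse=True)
    n = len(h)
    # With h in descending order, the set of the top i+1 agreements has measure (i+1)/n
    return max(min(h_i, (i + 1) / n) for i, h_i in enumerate(h))


def phylip_distance_matrix(names, seqs):
    """Build a square PHYLIP distance matrix (names padded to 10 characters),
    assuming distance = 1 - fuzzy integral similarity."""
    params = [transition_matrix(s) for s in seqs]
    lines = [str(len(names))]
    for name, p in zip(names, params):
        row = [1.0 - sugeno_similarity(p, q) for q in params]
        lines.append(f"{name[:10]:<10} " + " ".join(f"{d:.6f}" for d in row))
    return "\n".join(lines)
```

For example, `print(phylip_distance_matrix(["seq1", "seq2"], ["ACGTACGT", "AAAAAAAA"]))` prints a 2 x 2 matrix with a distance of 0.312500 between the two toy sequences; saved as `infile`, it is in the square-matrix layout that `neighbor` reads. The Sugeno form is used here because a Choquet integral with the same additive measure would reduce to the plain mean of the agreements.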
id	pubmed-6403383
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6403383 2019-03-11 Sci Rep Article
Nature Publishing Group UK 2019-03-06 /pmc/articles/PMC6403383/ /pubmed/30842590 http://dx.doi.org/10.1038/s41598-019-40452-6 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


Similar items