
An efficient strategy using k-mers to analyse 16S rRNA sequences

The use of k-mers has been a successful strategy for improving metagenomics studies, including taxonomic classification and de novo assembly, and can be used to obtain sequences of interest from the available databases. The aim of this manuscript was to propose a simple but efficient strategy to...


Bibliographic Details
Main Authors: Martínez-Porchas, Marcel, Vargas-Albores, Francisco
Format: Online Article Text
Language: English
Published: Elsevier 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537200/
https://www.ncbi.nlm.nih.gov/pubmed/28795166
http://dx.doi.org/10.1016/j.heliyon.2017.e00370
_version_ 1783254122814767104
collection PubMed
description The use of k-mers has been a successful strategy for improving metagenomics studies, including taxonomic classification and de novo assembly, and can be used to obtain sequences of interest from the available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and to use them to obtain and analyse 16S rRNA sequence fragments in silico. A total of 513,309 bacterial sequences contained in the SILVA database were considered for the study, and in-house PHP scripts were used to search for specific nucleotide chains, recover fragments of bacterial sequences, make calculations and organize information. Consensus sequences matching conserved regions were constructed by aligning most of the primers used in the literature. Sequences of k nucleotides (9- to 15-mers) were extracted from the generated primer contigs. Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship: high numbers of duplicate reactions were observed with short k-mers, and a lower proportion of sequences was obtained with long ones, with the best results obtained using 12-mers. Using 12-mers with the proposed method to obtain and study sequences was found to be a reliable approach for the analysis of 16S rRNA sequences, and this strategy can probably be extended to other biomarkers. Furthermore, additional applications, such as evaluating the degree of conservation, designing primers and performing other calculations, are proposed as examples.
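The k-mer workflow summarized in the abstract (extract k-mers from a consensus, count how many sequences contain each one, recover the fragment between two matches) can be illustrated with a minimal Python sketch. This is not the authors' PHP code: the function names, the toy sequences and the use of exact substring matching are all assumptions made for illustration, and real SILVA records would also need handling of the RNA alphabet and ambiguity codes.

```python
from collections import Counter


def extract_kmers(consensus, k):
    """Return every overlapping k-mer of a consensus sequence, in order."""
    return [consensus[i:i + k] for i in range(len(consensus) - k + 1)]


def kmer_occurrence(kmers, sequences):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    unique_kmers = set(kmers)
    for seq in sequences:
        for kmer in unique_kmers:
            if kmer in seq:  # exact match only (an assumption of this sketch)
                counts[kmer] += 1
    return counts


def recover_fragment(seq, start_kmer, end_kmer):
    """Return the region from start_kmer through end_kmer, or None if either is absent."""
    start = seq.find(start_kmer)
    if start == -1:
        return None
    end = seq.find(end_kmer, start + len(start_kmer))
    if end == -1:
        return None
    return seq[start:end + len(end_kmer)]


if __name__ == "__main__":
    # Hypothetical toy data; not taken from SILVA or from the article.
    consensus = "ACGTAC"
    sequences = ["GGACGTTTTCCAGG", "TTACGTACGG", "CCCCCC"]
    for k in (3, 4):
        kmers = extract_kmers(consensus, k)
        print(k, dict(kmer_occurrence(kmers, sequences)))
    print(recover_fragment(sequences[0], "ACG", "CCA"))
```

Running the loop over a range of k values on a real sequence set is one way to observe the trade-off the abstract describes, where shorter k-mers match more often and longer ones match fewer sequences.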
format Online
Article
Text
id pubmed-5537200
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-5537200 2017-08-09 Heliyon Article
Elsevier 2017-07-27 /pmc/articles/PMC5537200/ /pubmed/28795166 http://dx.doi.org/10.1016/j.heliyon.2017.e00370 Text en © 2017 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Article