Cluster oligonucleotide signatures for rapid identification by sequencing

BACKGROUND: Oligonucleotide signatures (signatures) have been widely used for studying microbial diversity and function in wet-lab settings, but using them for accurate in silico identification of organisms from high-throughput sequencing (HTS) data is only a proof of concept. Existing signature des...

Bibliographic Details
Main Authors: Zahariev, Manuel; Chen, Wen; Visagie, Cobus M.; Lévesque, C. André
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6284311/
https://www.ncbi.nlm.nih.gov/pubmed/30522439
http://dx.doi.org/10.1186/s12859-018-2363-3
author Zahariev, Manuel
Chen, Wen
Visagie, Cobus M.
Lévesque, C. André
collection PubMed
description BACKGROUND: Oligonucleotide signatures (signatures) have been widely used for studying microbial diversity and function in wet-lab settings, but using them for accurate in silico identification of organisms from high-throughput sequencing (HTS) data is only a proof of concept. Existing signature design programs for sequence signatures (signatures matching exactly one sequence) or clade signatures (signatures matching every sequence in a phylogenetic clade) are not able to identify all possible polymorphic sites for sequences with high similarity and perform poorly when handling large genome sequencing datasets. RESULTS: We introduce cluster signatures: subsequences that match perfectly and exclusively any group of sequences in a data set. Cluster signatures provide complete recall for primer/probe design and increased discrimination between sequences beyond that of clade signatures. Using cluster signatures for in silico identification of HTS targets achieves good precision/recall and running time performance. This method has been implemented in an open source tool, the Automated Oligonucleotide Design Pipeline (aodp), included in supplementary material and available at: https://bitbucket.org/wenchen_aafc/aodp_v2.0_release. CONCLUSIONS: Cluster signatures provide a rapid and universal analysis tool to identify all possible short diagnostic DNA markers and variants from any DNA sequencing dataset. They are particularly useful in discriminating genetic material from closely related organisms and in detecting deleterious mutations in highly or perfectly conserved genomic sites.
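The definition of a cluster signature in the abstract (a subsequence that matches perfectly and exclusively some group of sequences) can be illustrated with a brute-force sketch. This is not the aodp algorithm; the function name `cluster_signatures`, the fixed signature length `k`, and the toy sequences are assumptions made for illustration, and reverse complements are ignored.

```python
from collections import defaultdict


def cluster_signatures(sequences, k):
    """Group k-mers by the exact set of sequences that contain them.

    Hypothetical helper: a naive index, not the method used by aodp.
    """
    owners = defaultdict(set)  # k-mer -> names of sequences containing it
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    clusters = defaultdict(set)  # group of names -> k-mers exclusive to that group
    for kmer, names in owners.items():
        clusters[frozenset(names)].add(kmer)
    return dict(clusters)


# Toy data invented for this example
toy = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "GGGTTC"}
sigs = cluster_signatures(toy, 4)

for group, kmers in sorted(sigs.items(), key=lambda item: sorted(item[0])):
    print(sorted(group), sorted(kmers))
```

In this toy set, a single-sequence group corresponds to the abstract's "sequence signatures", while a group such as s1 and s2 receives a signature only if some 4-mer occurs in both and nowhere else. Groups with no exclusive k-mer are simply absent from the result.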
format Online
Article
Text
id pubmed-6284311
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6284311 2018-12-14 Cluster oligonucleotide signatures for rapid identification by sequencing. Zahariev, Manuel; Chen, Wen; Visagie, Cobus M.; Lévesque, C. André. BMC Bioinformatics. Research Article.
BioMed Central 2018-10-29 /pmc/articles/PMC6284311/ /pubmed/30522439 http://dx.doi.org/10.1186/s12859-018-2363-3 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Cluster oligonucleotide signatures for rapid identification by sequencing
topic Research Article