Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing

Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency a...

Bibliographic Details
Main Authors: Chan, Chon-Kit Kenneth, Hsu, Arthur L., Tang, Sen-Lin, Halgamuge, Saman K.
Format: Text
Language: English
Published: Hindawi Publishing Corporation 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2235928/
https://www.ncbi.nlm.nih.gov/pubmed/18288261
http://dx.doi.org/10.1155/2008/513701
author Chan, Chon-Kit Kenneth
Hsu, Arthur L.
Tang, Sen-Lin
Halgamuge, Saman K.
collection PubMed
description Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of di-, tri-, tetra-, and pentanucleotide frequencies in turn. The results show that, unlike the other three, dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences. Furthermore, we observed that increasing the order of oligonucleotide frequency may degrade the assignment result in some cases, which indicates that an optimal species-specific oligonucleotide frequency may exist. We replaced the SOM with a growing self-organising map (GSOM), which gives comparable results while gaining a [Formula: see text] speed improvement.
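The oligonucleotide-frequency features mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `oligo_frequencies`, the choice to skip windows containing ambiguous bases, and the plain relative-frequency normalisation are all assumptions made for illustration. It turns one DNA fragment into a fixed-length vector of 4^k frequencies (k = 2 for di-, 3 for tri-, 4 for tetra-, 5 for pentanucleotides), which is the kind of input a SOM or GSOM would be trained on.

```python
from itertools import product


def oligo_frequencies(seq, k):
    """Return relative frequencies of all 4**k oligonucleotides of length k.

    Hypothetical helper for illustration; the record does not describe
    the normalisation actually used in the paper.
    """
    seq = seq.upper()
    # Fixed lexicographic ordering (AA, AC, AG, AT, CA, ...) so that every
    # fragment maps to a vector with the same layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    vector = oligo_frequencies("ACGTACGT", 2)
    print(len(vector))  # number of dinucleotide features
```

Each fragment then becomes one training vector; raising k enlarges the feature space fourfold per step, which is one reason a higher order does not automatically give better bins.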
format Text
id pubmed-2235928
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-2235928 2008-02-20 J Biomed Biotechnol Methodology Report Hindawi Publishing Corporation 2008 2007-12-25 /pmc/articles/PMC2235928/ /pubmed/18288261 http://dx.doi.org/10.1155/2008/513701 Text en Copyright © 2008 Chon-Kit Kenneth Chan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Report