Bacteriophage classification for assembled contigs using graph convolutional network

Bibliographic Details
Main Authors: Shang, Jiayu; Jiang, Jingzhe; Sun, Yanni
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Bioinformatics of Microbes and Microbiomes
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275337/
https://www.ncbi.nlm.nih.gov/pubmed/34252923
http://dx.doi.org/10.1093/bioinformatics/btab293
author Shang, Jiayu
Jiang, Jingzhe
Sun, Yanni
collection PubMed
description MOTIVATION: Bacteriophages (aka phages), which mainly infect bacteria, play key roles in the biology of microbes. Phages are the most abundant biological entities on the planet, and those discovered so far are only the tip of the iceberg. Recently, many new phages have been revealed using high-throughput sequencing, particularly metagenomic sequencing. The taxonomic classification of phages lags seriously behind the fast accumulation of phage-like sequences. The high diversity and abundance of phages, together with the limited number of known phages, pose great challenges for taxonomic analysis. In particular, alignment-based tools have difficulty classifying the fast-accumulating contigs assembled from metagenomic data. RESULTS: In this work, we present a novel semi-supervised learning model, named PhaGCN, to conduct taxonomic classification for phage contigs. In this learning model, we construct a knowledge graph by combining the DNA sequence features learned by a convolutional neural network and the protein sequence similarity obtained from a gene-sharing network. We then apply a graph convolutional network that uses both the labeled and unlabeled samples in training to enhance the learning ability. We tested PhaGCN on both simulated and real sequencing data. The results show that our method competes favorably against available phage classification tools. AVAILABILITY AND IMPLEMENTATION: The source code of PhaGCN is available via: https://github.com/KennthShang/PhaGCN.
format Online
Article
Text
id pubmed-8275337
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8275337 2021-07-13 Bacteriophage classification for assembled contigs using graph convolutional network Shang, Jiayu Jiang, Jingzhe Sun, Yanni Bioinformatics Bioinformatics of Microbes and Microbiomes Oxford University Press 2021-07-12 /pmc/articles/PMC8275337/ /pubmed/34252923 http://dx.doi.org/10.1093/bioinformatics/btab293 Text en © The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Bacteriophage classification for assembled contigs using graph convolutional network
topic Bioinformatics of Microbes and Microbiomes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275337/
https://www.ncbi.nlm.nih.gov/pubmed/34252923
http://dx.doi.org/10.1093/bioinformatics/btab293
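
The abstract above describes applying a graph convolutional network to a graph in which only some nodes carry labels. The following is a minimal sketch of that general idea in plain Python, not the PhaGCN implementation: the toy graph, the feature values, the random weights, and all function names (`normalize_adjacency`, `gcn_forward`, `masked_cross_entropy`) are assumptions made for illustration. It shows a two-layer forward pass over a symmetrically normalized adjacency matrix and a loss computed only on the labeled nodes, which is how unlabeled nodes can still influence the result through their edges.

```python
import math
import random


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(a_norm, features, w1, w2):
    """Two graph-convolution layers: softmax(A relu(A X W1) W2)."""
    hidden = relu(matmul(matmul(a_norm, features), w1))
    return softmax_rows(matmul(matmul(a_norm, hidden), w2))


def masked_cross_entropy(probs, labels):
    """Average cross-entropy over labeled nodes; None marks an unlabeled node."""
    losses = [-math.log(probs[i][y])
              for i, y in enumerate(labels) if y is not None]
    return sum(losses) / len(losses)


# Hypothetical toy data: a 4-node path graph, 3 features per node, 2 classes.
random.seed(0)
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
features = [[random.random() for _ in range(3)] for _ in range(4)]
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
w2 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
labels = [0, None, None, 1]  # only the two end nodes are labeled

a_norm = normalize_adjacency(adjacency)
probs = gcn_forward(a_norm, features, w1, w2)
loss = masked_cross_entropy(probs, labels)
print("class probabilities per node:", probs)
print("loss on labeled nodes only:", loss)
```

In a real setting the weights would be trained by minimizing this masked loss, and the node features and edges would come from upstream models rather than random numbers; those steps are omitted here.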