CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning
The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for d...
Main Authors: | Shang, Jiayu; Sun, Yanni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Inc. 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7255349/ https://www.ncbi.nlm.nih.gov/pubmed/32454212 http://dx.doi.org/10.1016/j.ymeth.2020.05.018 |
_version_ | 1783539720204058624 |
---|---|
author | Shang, Jiayu Sun, Yanni |
author_facet | Shang, Jiayu Sun, Yanni |
author_sort | Shang, Jiayu |
collection | PubMed |
description | The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels for reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER. |
format | Online Article Text |
id | pubmed-7255349 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7255349 2020-05-28 CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning Shang, Jiayu Sun, Yanni Methods Article The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels for reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER. Elsevier Inc. 2020-05-23 /pmc/articles/PMC7255349/ /pubmed/32454212 http://dx.doi.org/10.1016/j.ymeth.2020.05.018 Text en © 2020 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Shang, Jiayu Sun, Yanni CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning |
title | CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning |
title_full | CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning |
title_fullStr | CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning |
title_full_unstemmed | CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning |
title_short | CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning |
title_sort | cheer: hierarchical taxonomic classification for viral metagenomic data via deep learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7255349/ https://www.ncbi.nlm.nih.gov/pubmed/32454212 http://dx.doi.org/10.1016/j.ymeth.2020.05.018 |
work_keys_str_mv | AT shangjiayu cheerhierarchicaltaxonomicclassificationforviralmetagenomicdataviadeeplearning AT sunyanni cheerhierarchicaltaxonomicclassificationforviralmetagenomicdataviadeeplearning |