Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem

High-throughput DNA sequencing is becoming an increasingly important tool to monitor and better understand biodiversity responses to environmental changes in a standardized and reproducible way. Environmental DNA (eDNA) from organisms can be captured in ecosystem samples and sequenced using metabarc...

Bibliographic Details
Main Authors: Flück, Benjamin; Mathon, Laëtitia; Manel, Stéphanie; Valentini, Alice; Dejean, Tony; Albouy, Camille; Mouillot, David; Thuiller, Wilfried; Murienne, Jérôme; Brosse, Sébastien; Pellissier, Loïc
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9205931/
https://www.ncbi.nlm.nih.gov/pubmed/35715444
http://dx.doi.org/10.1038/s41598-022-13412-w
_version_ 1784729233641177088
author Flück, Benjamin
Mathon, Laëtitia
Manel, Stéphanie
Valentini, Alice
Dejean, Tony
Albouy, Camille
Mouillot, David
Thuiller, Wilfried
Murienne, Jérôme
Brosse, Sébastien
Pellissier, Loïc
author_sort Flück, Benjamin
collection PubMed
description High-throughput DNA sequencing is becoming an increasingly important tool to monitor and better understand biodiversity responses to environmental changes in a standardized and reproducible way. Environmental DNA (eDNA) from organisms can be captured in ecosystem samples and sequenced using metabarcoding, but processing large volumes of eDNA data and annotating sequences to recognized taxa remains computationally expensive. Speed and accuracy are two major bottlenecks in this critical step. Here, we evaluated the ability of convolutional neural networks (CNNs) to process short eDNA sequences and associate them with taxonomic labels. Using a unique eDNA data set collected in highly diverse Tropical South America, we compared the speed and accuracy of CNNs with that of a well-known bioinformatic pipeline (OBITools) in processing a small region (60 bp) of the 12S ribosomal DNA targeting freshwater fishes. We found that the taxonomic labels from the CNNs were comparable to those from OBITools, with high correlation levels for the composition of the regional fish fauna. The CNNs enabled the processing of raw fastq files at a rate of approximately 1 million sequences per minute, which was about 150 times faster than with OBITools. Given the good performance of CNNs in the highly diverse ecosystem considered here, the development of more elaborate CNNs promises fast deployment for future biodiversity inventories using eDNA.
format Online
Article
Text
id pubmed-9205931
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9205931 2022-06-19 Sci Rep Article
Nature Publishing Group UK 2022-06-17 /pmc/articles/PMC9205931/ /pubmed/35715444 http://dx.doi.org/10.1038/s41598-022-13412-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem
topic Article