Genome annotation across species using deep convolutional neural networks

The application of deep neural networks is a rapidly expanding field, now reaching many disciplines, including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequence...

Bibliographic Details
Main Authors: Khodabandelou, Ghazaleh; Routhier, Etienne; Mozziconacci, Julien
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924482/
https://www.ncbi.nlm.nih.gov/pubmed/33816929
http://dx.doi.org/10.7717/peerj-cs.278
collection PubMed
description The application of deep neural networks is a rapidly expanding field, now reaching many disciplines, including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional roles, extracted from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the resulting networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that in the training set. Here we address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, provided that the training sets used yield good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved.
id pubmed-7924482
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7924482 2021-04-02 Genome annotation across species using deep convolutional neural networks. Khodabandelou, Ghazaleh; Routhier, Etienne; Mozziconacci, Julien. PeerJ Comput Sci. Bioinformatics. PeerJ Inc. 2020-06-15 /pmc/articles/PMC7924482/ /pubmed/33816929 http://dx.doi.org/10.7717/peerj-cs.278 Text en ©2020 Khodabandelou et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
topic Bioinformatics