ImaGene: a convolutional neural network to quantify natural selection from genomic data

Bibliographic Details
Main Authors: Torada, Luis, Lorenzon, Lucrezia, Beddis, Alice, Isildak, Ulas, Pattini, Linda, Mathieson, Sara, Fumagalli, Matteo
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6873651/
https://www.ncbi.nlm.nih.gov/pubmed/31757205
http://dx.doi.org/10.1186/s12859-019-2927-x
collection PubMed
description BACKGROUND: The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An alternative approach to classic association studies to determining such genetic bases is an evolutionary framework. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in the loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks on population genomic data for the detection and quantification of natural selection.
RESULTS: ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how the misspecification of the correct demographic model for producing training data can influence the quantification of positive selection. We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques.
CONCLUSIONS: While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2927-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6873651
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6873651 2019-11-25 BMC Bioinformatics Software BioMed Central 2019-11-22 /pmc/articles/PMC6873651/ /pubmed/31757205 http://dx.doi.org/10.1186/s12859-019-2927-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ImaGene: a convolutional neural network to quantify natural selection from genomic data
topic Software