HELLO: improved neural network architectures and methodologies for small variant calling

Bibliographic Details
Main authors: Ramachandran, Anand; Lumetta, Steven S.; Klee, Eric W.; Chen, Deming
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8364080/
https://www.ncbi.nlm.nih.gov/pubmed/34391391
http://dx.doi.org/10.1186/s12859-021-04311-4
author Ramachandran, Anand
Lumetta, Steven S.
Klee, Eric W.
Chen, Deming
collection PubMed
description BACKGROUND: Modern Next Generation- and Third Generation- Sequencing methods such as Illumina and PacBio Circular Consensus Sequencing platforms provide accurate sequencing data. Parallel developments in Deep Learning have enabled the application of Deep Neural Networks to variant calling, surpassing the accuracy of classical approaches in many settings. DeepVariant, arguably the most popular among such methods, transforms the problem of variant calling into one of image recognition where a Deep Neural Network analyzes sequencing data that is formatted as images, achieving high accuracy. In this paper, we explore an alternative approach to designing Deep Neural Networks for variant calling, where we use meticulously designed Deep Neural Network architectures and customized variant inference functions that account for the underlying nature of sequencing data instead of converting the problem to one of image recognition. RESULTS: Results from 27 whole-genome variant calling experiments spanning Illumina, PacBio and hybrid Illumina-PacBio settings suggest that our method allows vastly smaller Deep Neural Networks to outperform the Inception-v3 architecture used in DeepVariant for indel and substitution-type variant calls. For example, our method reduces the number of indel call errors by up to 18%, 55% and 65% for Illumina, PacBio and hybrid Illumina-PacBio variant calling respectively, compared to a similarly trained DeepVariant pipeline. In these cases, our models are between 7 and 14 times smaller. CONCLUSIONS: We believe that the improved accuracy and problem-specific customization of our models will enable more accurate pipelines and further method development in the field. HELLO is available at https://github.com/anands-repo/hello SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04311-4.
format Online
Article
Text
id pubmed-8364080
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8364080 2021-08-17 HELLO: improved neural network architectures and methodologies for small variant calling. Ramachandran, Anand; Lumetta, Steven S.; Klee, Eric W.; Chen, Deming. BMC Bioinformatics, Methodology Article. BioMed Central 2021-08-14 /pmc/articles/PMC8364080/ /pubmed/34391391 http://dx.doi.org/10.1186/s12859-021-04311-4 Text en
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title HELLO: improved neural network architectures and methodologies for small variant calling
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8364080/
https://www.ncbi.nlm.nih.gov/pubmed/34391391
http://dx.doi.org/10.1186/s12859-021-04311-4