
A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data

Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications o...


Bibliographic Details
Main authors: Hill, Tom; Unckless, Robert L.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2019
Subjects: Software and Data Resources
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829143/
https://www.ncbi.nlm.nih.gov/pubmed/31455677
http://dx.doi.org/10.1534/g3.119.400596
author Hill, Tom
Unckless, Robert L.
collection PubMed
description Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data.
format Online
Article
Text
id pubmed-6829143
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-6829143 2019-11-06 A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data Hill, Tom Unckless, Robert L. G3 (Bethesda) Software and Data Resources Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data. Genetics Society of America 2019-08-27 /pmc/articles/PMC6829143/ /pubmed/31455677 http://dx.doi.org/10.1534/g3.119.400596 Text en Copyright © 2019 Hill, Unckless http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data
topic Software and Data Resources
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829143/
https://www.ncbi.nlm.nih.gov/pubmed/31455677
http://dx.doi.org/10.1534/g3.119.400596