A novel Canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS

Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes differ widely from the study population. A typical whole-genome sequencing approach relies on short-read technologies, resulting in fragmented assemblies with regions of ambiguity. Further information is lost...

Bibliographic Details
Main authors: Player, Robert A; Forsyth, Ellen R; Verratti, Kathleen J; Mohr, David W; Scott, Alan F; Bradburne, Christopher E
Format: Online Article Text
Language: English
Published: Life Science Alliance LLC 2021
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7898556/
https://www.ncbi.nlm.nih.gov/pubmed/33514656
http://dx.doi.org/10.26508/lsa.202000902
collection PubMed
description Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes differ widely from the study population. A typical whole-genome sequencing approach relies on short-read technologies, resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower-resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris using high-molecular-weight DNA sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flowcells (∼23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (∼88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping of short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads versus the current reference (CanFam3.1, P < 0.001), and a 15% reduction in variant calls, increasing the chance of identifying true, low-effect-size variants in a genome-wide association study. We believe that by incorporating the cost of producing a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.
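The depth figures in the description can be turned into approximate sequencing yields with the usual relation, mean depth = total sequenced bases / genome size. The sketch below is illustrative only: it uses the rounded values quoted above (2.4 Gb genome, ∼23X over eight flowcells, ∼88X), the function and variable names are hypothetical, and it assumes yield was spread evenly across the eight flowcells, which the record does not state.

```python
# Back-of-the-envelope yield estimate from the approximate figures in the
# description. All names here are illustrative, not taken from the article.

GENOME_SIZE_GB = 2.4   # genome size in gigabases, as quoted
ONT_DEPTH_X = 23       # approximate Oxford Nanopore depth
ONT_FLOWCELLS = 8      # number of R9.4 flowcells used
LINKED_READ_DEPTH_X = 88  # approximate 10X Genomics / Illumina depth


def total_yield_gb(depth_x: float, genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Total sequenced bases (Gb) implied by a mean depth of coverage."""
    return depth_x * genome_size_gb


ont_yield = total_yield_gb(ONT_DEPTH_X)
# Assumption: yield is divided evenly across flowcells.
ont_per_flowcell = ont_yield / ONT_FLOWCELLS
linked_read_yield = total_yield_gb(LINKED_READ_DEPTH_X)

print(f"Nanopore total: {ont_yield:.1f} Gb")
print(f"Nanopore per flowcell: {ont_per_flowcell:.1f} Gb")
print(f"Linked-read total: {linked_read_yield:.1f} Gb")
```

Under these assumptions the quoted depths correspond to roughly 55 Gb of Nanopore data (about 6.9 Gb per flowcell) and roughly 211 Gb of linked-read data. These are estimates derived from rounded numbers, not values reported in the record.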
id pubmed-7898556
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7898556; 2021-03-23; Life Sci Alliance; Research Articles; Life Science Alliance LLC; 2021-01-29; Text en; © 2021 Player et al.
This article is available under a Creative Commons License (Attribution 4.0 International, as described at https://creativecommons.org/licenses/by/4.0/).
topic Research Articles