Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs

Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR) which was assembled using a single pureb...

Bibliographic Details
Main authors: Holden, Lindsay A., Arumilli, Meharji, Hytönen, Marjo K., Hundi, Sruthi, Salojärvi, Jarkko, Brown, Kim H., Lohi, Hannes
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052005/
https://www.ncbi.nlm.nih.gov/pubmed/30022108
http://dx.doi.org/10.1038/s41598-018-29190-3
_version_ 1783340587311693824
author Holden, Lindsay A.
Arumilli, Meharji
Hytönen, Marjo K.
Hundi, Sruthi
Salojärvi, Jarkko
Brown, Kim H.
Lohi, Hannes
author_facet Holden, Lindsay A.
Arumilli, Meharji
Hytönen, Marjo K.
Hundi, Sruthi
Salojärvi, Jarkko
Brown, Kim H.
Lohi, Hannes
author_sort Holden, Lindsay A.
collection PubMed
description Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
format Online
Article
Text
id pubmed-6052005
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-60520052018-07-23 Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs Holden, Lindsay A. Arumilli, Meharji Hytönen, Marjo K. Hundi, Sruthi Salojärvi, Jarkko Brown, Kim H. Lohi, Hannes Sci Rep Article Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
Nature Publishing Group UK 2018-07-18 /pmc/articles/PMC6052005/ /pubmed/30022108 http://dx.doi.org/10.1038/s41598-018-29190-3 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Holden, Lindsay A.
Arumilli, Meharji
Hytönen, Marjo K.
Hundi, Sruthi
Salojärvi, Jarkko
Brown, Kim H.
Lohi, Hannes
Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
title Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
title_full Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
title_fullStr Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
title_full_unstemmed Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
title_short Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
title_sort assembly and analysis of unmapped genome sequence reads reveal novel sequence and variation in dogs
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052005/
https://www.ncbi.nlm.nih.gov/pubmed/30022108
http://dx.doi.org/10.1038/s41598-018-29190-3
work_keys_str_mv AT holdenlindsaya assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs
AT arumillimeharji assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs
AT hytonenmarjok assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs
AT hundisruthi assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs
AT salojarvijarkko assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs
AT brownkimh assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs
AT lohihannes assemblyandanalysisofunmappedgenomesequencereadsrevealnovelsequenceandvariationindogs