Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals

BACKGROUND: Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome s...

Bibliographic Details
Main authors: Taylor, Jeremy F., Whitacre, Lynsey K., Hoff, Jesse L., Tizioto, Polyana C., Kim, JaeWoo, Decker, Jared E., Schnabel, Robert D.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989351/
https://www.ncbi.nlm.nih.gov/pubmed/27534529
http://dx.doi.org/10.1186/s12711-016-0237-6
author Taylor, Jeremy F.
Whitacre, Lynsey K.
Hoff, Jesse L.
Tizioto, Polyana C.
Kim, JaeWoo
Decker, Jared E.
Schnabel, Robert D.
collection PubMed
description BACKGROUND: Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data.
METHODS: We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions.
RESULTS: We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual’s genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms.
CONCLUSIONS: Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X Pac-Bio assembly for cow is underway and may result in a significantly improved reference assembly.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-016-0237-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4989351
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4989351 2016-08-19 Genet Sel Evol Research Article BioMed Central 2016-08-17 /pmc/articles/PMC4989351/ /pubmed/27534529 http://dx.doi.org/10.1186/s12711-016-0237-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals
topic Research Article