Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals
BACKGROUND: Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome s...
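Main authors: | Taylor, Jeremy F., Whitacre, Lynsey K., Hoff, Jesse L., Tizioto, Polyana C., Kim, JaeWoo, Decker, Jared E., Schnabel, Robert D. |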
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989351/ https://www.ncbi.nlm.nih.gov/pubmed/27534529 http://dx.doi.org/10.1186/s12711-016-0237-6 |
_version_ | 1782448554628874240 |
---|---|
author | Taylor, Jeremy F. Whitacre, Lynsey K. Hoff, Jesse L. Tizioto, Polyana C. Kim, JaeWoo Decker, Jared E. Schnabel, Robert D. |
author_facet | Taylor, Jeremy F. Whitacre, Lynsey K. Hoff, Jesse L. Tizioto, Polyana C. Kim, JaeWoo Decker, Jared E. Schnabel, Robert D. |
author_sort | Taylor, Jeremy F. |
collection | PubMed |
description | BACKGROUND: Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. METHODS: We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. Across individuals, the average depth of coverage of the reference genome ranges from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. RESULTS: We estimate that a 24X depth of coverage is required to achieve 99.5% coverage of the reference assembly and identify 95% of the variants within an individual’s genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10% of the reference genome and may identify <75% of variants. About 10% of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly, misassembled genes, and interesting symbiont, commensal and pathogenic organisms. CONCLUSIONS: Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X PacBio assembly for cow is underway and may result in a significantly improved reference assembly. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-016-0237-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4989351 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4989351 2016-08-19 Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals Taylor, Jeremy F. Whitacre, Lynsey K. Hoff, Jesse L. Tizioto, Polyana C. Kim, JaeWoo Decker, Jared E. Schnabel, Robert D. Genet Sel Evol Research Article BioMed Central 2016-08-17 /pmc/articles/PMC4989351/ /pubmed/27534529 http://dx.doi.org/10.1186/s12711-016-0237-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Taylor, Jeremy F. Whitacre, Lynsey K. Hoff, Jesse L. Tizioto, Polyana C. Kim, JaeWoo Decker, Jared E. Schnabel, Robert D. Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals |
title | Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals |
title_sort | lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989351/ https://www.ncbi.nlm.nih.gov/pubmed/27534529 http://dx.doi.org/10.1186/s12711-016-0237-6 |
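
The description field above distinguishes the average depth of coverage from the actual coverage of the reference assembly. The following is a minimal sketch of that distinction, not the authors' pipeline: the function names `mean_depth` and `breadth` and the toy per-base depths are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def mean_depth(depths: Sequence[int]) -> float:
    """Average depth: total aligned bases divided by reference length."""
    return sum(depths) / len(depths)


def breadth(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical per-base depths for a 10-base toy reference (not real data).
toy = [0, 12, 15, 9, 0, 30, 22, 18, 14, 0]

print(mean_depth(toy))   # 12.0
print(breadth(toy))      # 0.7
print(breadth(toy, 10))  # 0.6
```

In this toy case the average depth is 12X, yet three of the ten positions receive no reads at all, so a high average depth does not by itself guarantee complete coverage of the reference.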