Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

Bibliographic Details
Main authors: Kalbfleisch, Ted, Heaton, Michael P
Format: Online Article Text
Language: English
Published: F1000Research 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103496/
https://www.ncbi.nlm.nih.gov/pubmed/25075278
http://dx.doi.org/10.12688/f1000research.2-244.v2
collection PubMed
description Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High-quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to the number needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the time, money, infrastructure, and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next-generation sequence data from a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site. Of the 83 million variants identified, 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1, with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene function.
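The abstract separates homozygous calls (likely fixed sheep vs. cattle differences) from heterozygous calls (polymorphisms within sheep). The following is a minimal sketch of that distinction, not the authors' pipeline: it assumes genotypes are available as VCF-style GT strings, and the function name `classify_genotype` and the sample calls are hypothetical. It also works through the arithmetic implied by the reported percentages, under the assumption that the 80% corroboration rate applies to the 41% of sites that could be mapped to Oar3.1.

```python
from collections import Counter


def classify_genotype(gt: str) -> str:
    """Classify a VCF-style GT string such as '0/1' or '1|1' (hypothetical helper)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        # Two different alleles in one animal: a polymorphism within the species.
        return "heterozygous"
    # Both alleles identical: either matches the reference or differs from it.
    return "homozygous_ref" if alleles[0] == "0" else "homozygous_alt"


# Made-up genotype calls for one sheep mapped to a cattle reference.
calls = ["1/1", "0/1", "1/1", "1|1", "0/0", "./.", "1/2"]
counts = Counter(classify_genotype(gt) for gt in calls)

# Rough arithmetic from the abstract's rounded figures (approximate values).
het_sites = 3.7e6                    # "nearly 3.7 million" heterozygous sites
mapped_to_oar = het_sites * 0.41     # sites placed at orthologous Oar3.1 positions
corroborated = mapped_to_oar * 0.80  # assumption: 80% is taken of the mapped sites

print(dict(counts))
print(f"mapped to Oar3.1: ~{mapped_to_oar:,.0f}")
print(f"corroborated heterozygous: ~{corroborated:,.0f}")
```

Under these assumptions, roughly 1.5 million heterozygous sites would map to Oar3.1 and about 1.2 million of them would be corroborated. Because the inputs are rounded in the abstract, these totals are estimates only.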
format Online
Article
Text
id pubmed-4103496
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-4103496 2014-07-28 Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes Kalbfleisch, Ted Heaton, Michael P F1000Res Research Article F1000Research 2014-02-10 /pmc/articles/PMC4103496/ /pubmed/25075278 http://dx.doi.org/10.12688/f1000research.2-244.v2 Text en
Copyright: © 2014 Kalbfleisch T and Heaton MP
http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
http://creativecommons.org/publicdomain/zero/1.0/ Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
topic Research Article