Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses

Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here f...

Bibliographic Details
Main Authors: Bronski, Michael J., Martinez, Ciera C., Weld, Holli A., Eisen, Michael B.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2020
Subjects: Genome Report
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7202002/
https://www.ncbi.nlm.nih.gov/pubmed/32220952
http://dx.doi.org/10.1534/g3.119.400959
_version_ 1783529652633993216
author Bronski, Michael J.
Martinez, Ciera C.
Weld, Holli A.
Eisen, Michael B.
author_facet Bronski, Michael J.
Martinez, Ciera C.
Weld, Holli A.
Eisen, Michael B.
author_sort Bronski, Michael J.
collection PubMed
description Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5–15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.
format Online
Article
Text
id pubmed-7202002
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-7202002 2020-05-09 Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses Bronski, Michael J. Martinez, Ciera C. Weld, Holli A. Eisen, Michael B. G3 (Bethesda) Genome Report Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5–15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations. Genetics Society of America 2020-03-27 /pmc/articles/PMC7202002/ /pubmed/32220952 http://dx.doi.org/10.1534/g3.119.400959 Text en Copyright © 2020 Bronski et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Genome Report
Bronski, Michael J.
Martinez, Ciera C.
Weld, Holli A.
Eisen, Michael B.
Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
title Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
title_full Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
title_fullStr Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
title_full_unstemmed Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
title_short Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
title_sort whole genome sequences of 23 species from the drosophila montium species group (diptera: drosophilidae): a resource for testing evolutionary hypotheses
topic Genome Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7202002/
https://www.ncbi.nlm.nih.gov/pubmed/32220952
http://dx.doi.org/10.1534/g3.119.400959