Coverage-Versus-Length Plots, a Simple Quality Control Step for de Novo Yeast Genome Sequence Assemblies

Illumina sequencing has revolutionized yeast genomics, with prices for commercial draft genome sequencing now below $200. The popular SPAdes assembler makes it simple to generate a de novo genome assembly for any yeast species. However, whereas making genome assemblies has become routine, understand...

Bibliographic Details
Main authors: Douglass, Alexander P., O’Brien, Caoimhe E., Offei, Benjamin, Coughlan, Aisling Y., Ortiz-Merino, Raúl A., Butler, Geraldine, Byrne, Kevin P., Wolfe, Kenneth H.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404606/
https://www.ncbi.nlm.nih.gov/pubmed/30674538
http://dx.doi.org/10.1534/g3.118.200745
author Douglass, Alexander P.
O’Brien, Caoimhe E.
Offei, Benjamin
Coughlan, Aisling Y.
Ortiz-Merino, Raúl A.
Butler, Geraldine
Byrne, Kevin P.
Wolfe, Kenneth H.
collection PubMed
description Illumina sequencing has revolutionized yeast genomics, with prices for commercial draft genome sequencing now below $200. The popular SPAdes assembler makes it simple to generate a de novo genome assembly for any yeast species. However, whereas making genome assemblies has become routine, understanding what they contain is still challenging. Here, we show how graphing the information that SPAdes provides about the length and coverage of each scaffold can be used to investigate the nature of an assembly, and to diagnose possible problems. Scaffolds derived from mitochondrial DNA, ribosomal DNA, and yeast plasmids can be identified by their high coverage. Contaminating data, such as cross-contamination from other samples in a multiplex sequencing run, can be identified by its low coverage. Scaffolds derived from the bacteriophage PhiX174 and Lambda DNAs that are frequently used as molecular standards in Illumina protocols can also be detected. Assemblies of yeast genomes with high heterozygosity, such as interspecies hybrids, often contain two types of scaffold: regions of the genome where the two alleles assembled into two separate scaffolds and each has a coverage level C, and regions where the two alleles co-assembled (collapsed) into a single scaffold that has a coverage level 2C. Visualizing the data with Coverage-vs.-Length (CVL) plots, which can be done using Microsoft Excel or Google Sheets, provides a simple method to understand the structure of a genome assembly and detect aberrant scaffolds or contigs. We provide a Python script that allows assemblies to be filtered to remove contaminants identified in CVL plots.
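The coverage-based reasoning in the description can be sketched in a few lines of Python. This is a minimal illustration, not the authors' filtering script: it assumes scaffold names follow the SPAdes pattern `NODE_<n>_length_<bp>_cov_<x>`, and the function names, the `min_length` cutoff, and the `low`/`high` factors are hypothetical values chosen only for the example.

```python
import re
from statistics import median

# Assumed SPAdes-style scaffold name, e.g. NODE_1_length_250000_cov_45.3
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_header(header):
    """Return (length, coverage) parsed from a scaffold name, or None."""
    match = HEADER_RE.search(header)
    if match is None:
        return None
    return int(match.group(2)), float(match.group(3))


def classify(headers, min_length=10_000, low=0.25, high=3.0):
    """Label scaffolds as 'low', 'typical', or 'high' coverage.

    The baseline is the median coverage of scaffolds of at least
    min_length bp. The cutoff and both factors are illustrative
    assumptions, not values taken from the article.
    """
    records = []
    for header in headers:
        parsed = parse_header(header)
        if parsed is not None:
            records.append((header, parsed[0], parsed[1]))
    if not records:
        return {}

    long_cov = [cov for _, length, cov in records if length >= min_length]
    # Fall back to all scaffolds if none reach the length cutoff.
    baseline = median(long_cov or [cov for _, _, cov in records])

    labels = {}
    for header, _, cov in records:
        if cov < low * baseline:
            labels[header] = "low"      # candidate contamination
        elif cov > high * baseline:
            labels[header] = "high"     # candidate mtDNA, rDNA, or plasmid
        else:
            labels[header] = "typical"
    return labels


example = [
    "NODE_1_length_900000_cov_40.0",
    "NODE_2_length_800000_cov_42.0",
    "NODE_3_length_700000_cov_38.0",
    "NODE_40_length_60000_cov_2000.5",
    "NODE_90_length_1200_cov_1.5",
]
labels = classify(example)
```

The labels are only a starting point for inspection. A scaffold near twice the baseline, for instance, may be a collapsed pair of alleles rather than a contaminant, which is why the article recommends plotting coverage against length and looking at the clusters rather than relying on fixed thresholds.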
format Online
Article
Text
id pubmed-6404606
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-6404606 2019-03-11 G3 (Bethesda) Investigations
Genetics Society of America 2019-01-23 /pmc/articles/PMC6404606/ /pubmed/30674538 http://dx.doi.org/10.1534/g3.118.200745 Text en Copyright © 2019 Douglass et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Coverage-Versus-Length Plots, a Simple Quality Control Step for de Novo Yeast Genome Sequence Assemblies
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404606/
https://www.ncbi.nlm.nih.gov/pubmed/30674538
http://dx.doi.org/10.1534/g3.118.200745