Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies

Bibliographic Details
Main authors: Song, Giltae, Lee, Jongin, Kim, Juyeon, Kang, Seokwoo, Lee, Hoyong, Kwon, Daehong, Lee, Daehwan, Lang, Gregory I., Cherry, J. Michael, Kim, Jaebum
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6711525/
https://www.ncbi.nlm.nih.gov/pubmed/31454399
http://dx.doi.org/10.1371/journal.pone.0221858
Collection: PubMed

Abstract:
BACKGROUND: Genomic data have become major resources to understand complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later. However, there is no fully automated computational approach in experimental evolution studies to establish the chromosome-level genome assembly using unique features of sequencing data.
METHODS AND RESULTS: In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), to build chromosome-level genome sequence assemblies by generating and combining multiple initial assemblies using three de novo assemblers from short-read sequencing data. We significantly improved the continuity and accuracy of the genome assembly using a large collection of sequencing data and hybrid assembly approaches. We validated our pipeline by generating chromosome-level assemblies of yeast strains W303 and SK1, and compared our results with assemblies built using long-read sequencing and various assembly evaluation metrics. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b, and three commonly used fungal strains: Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74, for which long-read sequencing data are not yet available. Finally, we examined the effect of IMAP parameters, such as reference and resolution, on the quality of the final assembly of the yeast strains W303 and SK1.
CONCLUSIONS: We developed a cost-effective pipeline to generate chromosome-level sequence assemblies using only short-read sequencing data. Our pipeline combines the strengths of reference-guided and meta-assembly approaches. Our pipeline is available online at http://github.com/jkimlab/IMAP including a Docker image, as well as a Perl script, to help users install the IMAP package, including several prerequisite programs. Users can use IMAP to easily build the chromosome-level assembly for the genome of their interest.

Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2019-08-27
Copyright: © 2019 Song et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.