Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies
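Main authors: | Song, Giltae; Lee, Jongin; Kim, Juyeon; Kang, Seokwoo; Lee, Hoyong; Kwon, Daehong; Lee, Daehwan; Lang, Gregory I.; Cherry, J. Michael; Kim, Jaebum |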
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6711525/ https://www.ncbi.nlm.nih.gov/pubmed/31454399 http://dx.doi.org/10.1371/journal.pone.0221858 |
_version_ | 1783446530754084864 |
---|---|
author | Song, Giltae; Lee, Jongin; Kim, Juyeon; Kang, Seokwoo; Lee, Hoyong; Kwon, Daehong; Lee, Daehwan; Lang, Gregory I.; Cherry, J. Michael; Kim, Jaebum |
collection | PubMed |
description | BACKGROUND: Genomic data have become major resources for understanding complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including studies of human diseases such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae strain W303) were sequenced at successive time points to identify temporal evolutionary patterns. This type of study requires a chromosome-level sequence assembly of the strain or population at time zero, against which the genomes sampled later can be compared. However, experimental evolution studies lack a fully automated computational approach for building such a chromosome-level genome assembly that exploits the unique features of their sequencing data. METHODS AND RESULTS: In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), which builds chromosome-level genome sequence assemblies from short-read sequencing data by generating multiple initial assemblies with three de novo assemblers and then combining them. Using a large collection of sequencing data and hybrid assembly approaches, we significantly improved the continuity and accuracy of the genome assembly. We validated the pipeline by generating chromosome-level assemblies of yeast strains W303 and SK1 and comparing them, using various assembly evaluation metrics, with assemblies built from long-read sequencing data. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b and of three commonly used fungal strains for which long-read sequencing data are not yet available: Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74. Finally, we examined how IMAP parameters, such as the reference and the resolution, affect the quality of the final assemblies of yeast strains W303 and SK1.
CONCLUSIONS: We developed a cost-effective pipeline that generates chromosome-level sequence assemblies from short-read sequencing data alone, combining the strengths of reference-guided and meta-assembly approaches. The pipeline is available online at http://github.com/jkimlab/IMAP, together with a Docker image and a Perl script that help users install the IMAP package and its prerequisite programs. With IMAP, users can readily build a chromosome-level assembly of their genome of interest. |
format | Online Article Text |
id | pubmed-6711525 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
publication date | 2019-08-27 |
rights | © 2019 Song et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies |
topic | Research Article |