Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data

Bibliographic Details
Main authors: Ikegami, Tsutomu; Inatsugi, Toyohiro; Kojima, Isao; Umemura, Myco; Hagiwara, Hiroko; Machida, Masayuki; Asai, Kiyoshi
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4412624/
https://www.ncbi.nlm.nih.gov/pubmed/25919614
http://dx.doi.org/10.1371/journal.pone.0126289
collection PubMed
description A hybrid de novo assembly pipeline was constructed to use MiSeq and SOLiD short read data in combination. The short read data were converted to the pipeline's standard format and supplied to pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and at each stage MiSeq paired-end data, SOLiD mate-paired data, or both could be specified separately as input. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. When both the MiSeq and the SOLiD data were used in the hybrid assembly, the alignment length improved by a factor of 3 to 8 compared with assemblies using only one of the data types. The hybrid assemblies also reproduced more of the gene cluster regions encoding secondary metabolite biosyntheses (SMB). These results imply that the MiSeq data, with their long read length, are essential for constructing accurate nucleotide sequences, while the SOLiD mate-paired reads, with their long insertion length, enhance the long-range arrangement of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have a high GC content. Although the quality of the SOLiD reads was too low to produce any meaningful assembly on their own, adding them improved the alignment length to the reference by a factor of 2 compared with the assembly using only the MiSeq data.
id pubmed-4412624
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4412624 2015-05-12 Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data Ikegami, Tsutomu Inatsugi, Toyohiro Kojima, Isao Umemura, Myco Hagiwara, Hiroko Machida, Masayuki Asai, Kiyoshi PLoS One Research Article
Public Library of Science 2015-04-28 /pmc/articles/PMC4412624/ /pubmed/25919614 http://dx.doi.org/10.1371/journal.pone.0126289 Text en © 2015 Ikegami et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article