Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data
A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline...
Main authors: | Ikegami, Tsutomu; Inatsugi, Toyohiro; Kojima, Isao; Umemura, Myco; Hagiwara, Hiroko; Machida, Masayuki; Asai, Kiyoshi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4412624/ https://www.ncbi.nlm.nih.gov/pubmed/25919614 http://dx.doi.org/10.1371/journal.pone.0126289 |
_version_ | 1782368693002436608 |
---|---|
author | Ikegami, Tsutomu; Inatsugi, Toyohiro; Kojima, Isao; Umemura, Myco; Hagiwara, Hiroko; Machida, Masayuki; Asai, Kiyoshi |
author_sort | Ikegami, Tsutomu |
collection | PubMed |
description | A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages; at each stage, MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly improved the alignment length by a factor of 3 to 8 compared with assemblies using only one of the data types. The hybrid assemblies also reproduced more of the gene cluster regions encoding secondary metabolite biosynthesis (SMB). These results imply that the MiSeq data, with their long read length, are essential for constructing accurate nucleotide sequences, while the SOLiD mate-paired reads, with their long insertion length, enhance the long-range arrangement of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have a high GC content. Although the quality of the SOLiD reads was too low to produce any meaningful assembly on their own, the hybrid assembly improved the alignment length to the reference by a factor of 2 compared with the assembly using only the MiSeq data. |
format | Online Article Text |
id | pubmed-4412624 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4412624 2015-05-12 Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data. Ikegami, Tsutomu; Inatsugi, Toyohiro; Kojima, Isao; Umemura, Myco; Hagiwara, Hiroko; Machida, Masayuki; Asai, Kiyoshi. PLoS One. Research Article. Public Library of Science 2015-04-28 /pmc/articles/PMC4412624/ /pubmed/25919614 http://dx.doi.org/10.1371/journal.pone.0126289 Text en © 2015 Ikegami et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4412624/ https://www.ncbi.nlm.nih.gov/pubmed/25919614 http://dx.doi.org/10.1371/journal.pone.0126289 |
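
The description above says that each stage of the pipeline can take MiSeq paired-end data, SOLiD mate-paired data, or both. A minimal sketch of that per-stage selection follows, in Python. It is not the authors' implementation: the names `ReadSet`, `Stage`, `make_plan`, and `is_hybrid`, and the stage labels `contig` and `scaffold`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ReadSet(Enum):
    """The two read types named in the record (identifiers are hypothetical)."""
    MISEQ_PE = "miseq_paired_end"
    SOLID_MP = "solid_mate_paired"


@dataclass(frozen=True)
class Stage:
    name: str          # hypothetical stage label, not taken from the paper
    inputs: frozenset  # read sets supplied to this stage


def make_plan(spec):
    """Build an ordered list of stages from {stage_name: [ReadSet, ...]}.

    Every stage must receive at least one read set.
    """
    plan = []
    for name, reads in spec.items():
        if not reads:
            raise ValueError(f"stage {name!r} needs at least one read set")
        plan.append(Stage(name, frozenset(reads)))
    return plan


def is_hybrid(plan):
    """Return True when both read types are used somewhere in the plan."""
    used = set().union(*(stage.inputs for stage in plan))
    return used == set(ReadSet)


# Example: MiSeq reads at one stage, SOLiD mate pairs at a later one.
plan = make_plan({
    "contig": [ReadSet.MISEQ_PE],
    "scaffold": [ReadSet.SOLID_MP],
})
print(is_hybrid(plan))
```

Keeping the input choice on each stage, rather than on the run as a whole, mirrors the record's statement that the data types could be specified "at each stage separately"; how the actual pipeline encodes this is not described in the record.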