A pilot study for channel catfish whole genome sequencing and de novo assembly

BACKGROUND: Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths in next-generation sequencing technologies are short as compared with that of traditional Sanger sequencing. The...

Bibliographic Details
Main Authors: Jiang, Yanliang, Lu, Jianguo, Peatman, Eric, Kucuktas, Huseyin, Liu, Shikai, Wang, Shaolin, Sun, Fanyue, Liu, Zhanjiang
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3266365/
https://www.ncbi.nlm.nih.gov/pubmed/22192763
http://dx.doi.org/10.1186/1471-2164-12-629
author Jiang, Yanliang
Lu, Jianguo
Peatman, Eric
Kucuktas, Huseyin
Liu, Shikai
Wang, Shaolin
Sun, Fanyue
Liu, Zhanjiang
collection PubMed
description BACKGROUND: Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths of next-generation sequencing technologies are short compared with those of traditional Sanger sequencing. The short sequence reads pose great challenges for de novo sequence assembly. As a pilot project for whole genome sequencing of the catfish genome, here we attempt to determine the proper sequence coverage, the proper software for assembly, and the parameters to use for the assembly of a BAC physical map contig spanning approximately one million base pairs. RESULTS: A combination of low-coverage 454 sequencing and Illumina sequencing appeared to provide effective assembly, as reflected by a high N50 value. Using 454 sequencing alone, a sequencing depth of 18 X was sufficient to obtain a good-quality assembly, whereas with Illumina sequencing alone a depth of 70 X appeared to be sufficient. Additional sequencing coverage beyond 18 X of 454 or 70 X of Illumina sequencing did not significantly improve the assembly. Considering the cost of sequencing, 2 X 454 sequencing, when coupled with 70 X Illumina sequencing, provided an assembly of reasonably good quality. Of the several software packages tested, Newbler with a seed length of 16 and ABySS with a K-value of 60 appeared to be appropriate for the assembly of 454 reads alone and Illumina paired-end reads alone, respectively. Using both 454 and Illumina paired-end reads, a hybrid assembly strategy using Newbler for the initial 454 sequence assembly and Velvet for the initial Illumina sequence assembly, followed by a second-step assembly using MIRA, provided the best assembly of the physical map contig, resulting in 193 contigs with an N50 value of 13,123 bp.
CONCLUSIONS: A hybrid sequencing strategy using a low sequencing depth of 454 and a high sequencing depth of Illumina provided a good-quality assembly with a high N50 value at relatively low cost. A combination of Newbler, Velvet, and MIRA can be used to assemble the 454 sequence reads and the Illumina reads effectively. The assembled sequence can serve as a resource for comparative genome analysis. Additional long reads from third-generation sequencing platforms are needed to sequence through repetitive genome regions, which should further enhance the sequence assembly.
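The abstract judges assemblies by N50 and states sequencing effort as depth (18 X, 70 X). The sketch below is not taken from the article; it only illustrates how those two quantities are commonly computed. The function names, the toy contig lengths, and the 400 bp mean read length are hypothetical. Only the roughly one-million-base-pair target size comes from the abstract.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def sequencing_depth(read_count, mean_read_length, target_length):
    """Average depth (X) = total sequenced bases / length of the target."""
    return read_count * mean_read_length / target_length


# Toy contig lengths (hypothetical, not data from the study).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # 8: the two longest contigs hold 16 of 32 bases

# Hypothetical read set: 45,000 reads of 400 bp over a ~1 Mb target.
print(sequencing_depth(45_000, 400, 1_000_000))  # 18.0
```

Under these assumptions, a reported N50 of 13,123 bp would mean that contigs of at least 13,123 bp hold at least half of the assembled bases.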
format Online
Article
Text
id pubmed-3266365
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3266365 2012-01-26 A pilot study for channel catfish whole genome sequencing and de novo assembly Jiang, Yanliang Lu, Jianguo Peatman, Eric Kucuktas, Huseyin Liu, Shikai Wang, Shaolin Sun, Fanyue Liu, Zhanjiang BMC Genomics Research Article BioMed Central 2011-12-22 /pmc/articles/PMC3266365/ /pubmed/22192763 http://dx.doi.org/10.1186/1471-2164-12-629 Text en Copyright ©2011 Jiang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A pilot study for channel catfish whole genome sequencing and de novo assembly
topic Research Article