
Illumina Synthetic Long Read Sequencing Allows Recovery of Missing Sequences even in the “Finished” C. elegans Genome

Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual le...


Bibliographic Details
Main authors: Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4650653/
https://www.ncbi.nlm.nih.gov/pubmed/26039588
http://dx.doi.org/10.1038/srep10814
author Li, Runsheng
Hsieh, Chia-Ling
Young, Amanda
Zhang, Zhihong
Ren, Xiaoliang
Zhao, Zhongying
collection PubMed
description Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, i.e., predominantly around 10 Kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiencies of the long reads in these aspects using the isogenic C. elegans genome, which has no gaps. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic region. Second, the reads are able to reliably detect missing but not extra sequences in the C. elegans genome. Third, the reads of smaller size are more capable of recovering repetitive sequences than those of bigger size. Fourth, at least 40 Kbp of missing genomic sequence is recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 Kbp can be achieved with 24× reads but with substantial mis-assembly errors, highlighting a need for novel assembly algorithms for the long reads.
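The N50 contig size quoted in the abstract is a standard assembly statistic: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembled length. A minimal sketch of that calculation follows. The function name `n50` and the contig lengths in `toy_contigs` are hypothetical illustrations, not data or code from the article.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, chosen only to illustrate the calculation.
toy_contigs = [120_000, 90_000, 86_000, 40_000, 20_000, 10_000]
print(n50(toy_contigs))
```

Because N50 depends only on contig lengths, a high value says nothing about correctness, which is consistent with the abstract reporting a large N50 together with substantial mis-assembly errors.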
format Online
Article
Text
id pubmed-4650653
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4650653 2015-11-24 Sci Rep Article Nature Publishing Group 2015-06-03 /pmc/articles/PMC4650653/ /pubmed/26039588 http://dx.doi.org/10.1038/srep10814 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Illumina Synthetic Long Read Sequencing Allows Recovery of Missing Sequences even in the “Finished” C. elegans Genome
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4650653/
https://www.ncbi.nlm.nih.gov/pubmed/26039588
http://dx.doi.org/10.1038/srep10814