Linked read technology for assembling large complex and polyploid genomes

BACKGROUND: Short read DNA sequencing technologies have revolutionized genome assembly by providing high accuracy and throughput data at low cost. But it remains challenging to assemble short read data, particularly for large, complex and polyploid genomes. The linked read strategy has the potential...

Bibliographic Details
Main authors: Ott, Alina; Schnable, James C.; Yeh, Cheng-Ting; Wu, Linjiang; Liu, Chao; Hu, Heng-Cheng; Dalgard, Clifton L.; Sarkar, Soumik; Schnable, Patrick S.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6122573/
https://www.ncbi.nlm.nih.gov/pubmed/30180802
http://dx.doi.org/10.1186/s12864-018-5040-z
_version_ 1783352682428235776
author Ott, Alina
Schnable, James C.
Yeh, Cheng-Ting
Wu, Linjiang
Liu, Chao
Hu, Heng-Cheng
Dalgard, Clifton L.
Sarkar, Soumik
Schnable, Patrick S.
author_sort Ott, Alina
collection PubMed
description BACKGROUND: Short read DNA sequencing technologies have revolutionized genome assembly by providing high-accuracy, high-throughput data at low cost. But it remains challenging to assemble short read data, particularly for large, complex and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. However, the majority of studies to date that have employed linked reads were focused on human haplotype phasing and genome assembly. RESULTS: Here we describe a de novo maize B73 genome assembly generated via linked read technology which contains ~172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with >76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology was able to successfully resolve the two subgenomes of the recent allopolyploid, proso millet (Panicum miliaceum). Our assembly covers ~83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb. CONCLUSIONS: Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that, if implemented, have the potential to further improve linked read assemblies, particularly for repetitive genomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-5040-z) contains supplementary material, which is available to authorized users.
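The abstract reports scaffold N50 values (89 kb for maize B73, 912 kb for proso millet). As a reading aid, here is a minimal sketch of how N50 is conventionally computed from a list of scaffold lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are hypothetical and are not taken from the article or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition (an assumption here; the article does not
    spell out its exact computation): the length L such that
    sequences of length >= L contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb, for illustration only.
scaffolds_kb = [912, 500, 300, 200, 88]
print(n50(scaffolds_kb))  # total is 2000, so the 500 kb scaffold crosses 1000
```

Note that this N50 is relative to the assembled length; a value computed against an estimated genome size (often called NG50) can differ when the assembly covers only part of the genome.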
format Online
Article
Text
id pubmed-6122573
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6122573 2018-09-05 Linked read technology for assembling large complex and polyploid genomes Ott, Alina; Schnable, James C.; Yeh, Cheng-Ting; Wu, Linjiang; Liu, Chao; Hu, Heng-Cheng; Dalgard, Clifton L.; Sarkar, Soumik; Schnable, Patrick S. BMC Genomics Research Article BioMed Central 2018-09-04 /pmc/articles/PMC6122573/ /pubmed/30180802 http://dx.doi.org/10.1186/s12864-018-5040-z Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Linked read technology for assembling large complex and polyploid genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6122573/
https://www.ncbi.nlm.nih.gov/pubmed/30180802
http://dx.doi.org/10.1186/s12864-018-5040-z