
Assessment of human diploid genome assembly with 10x Linked-Reads data

BACKGROUND: Producing cost-effective haplotype-resolved personal genomes remains challenging. 10x Linked-Read sequencing, with its high base quality and long-range information, has been demonstrated to facilitate de novo assembly of human genomes and variant detection. In this study, we investigate...

Bibliographic Details
Main Authors: Zhang, Lu; Zhou, Xin; Weng, Ziming; Sidow, Arend
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6879002/
https://www.ncbi.nlm.nih.gov/pubmed/31769805
http://dx.doi.org/10.1093/gigascience/giz141
_version_ 1783473545108520960
author Zhang, Lu
Zhou, Xin
Weng, Ziming
Sidow, Arend
collection PubMed
description BACKGROUND: Producing cost-effective haplotype-resolved personal genomes remains challenging. 10x Linked-Read sequencing, with its high base quality and long-range information, has been demonstrated to facilitate de novo assembly of human genomes and variant detection. In this study, we investigate in depth how the parameter space of 10x library preparation and sequencing affects assembly quality, on the basis of both simulated and real libraries. RESULTS: We prepared and sequenced eight 10x libraries with a diverse set of parameters from standard cell lines NA12878 and NA24385 and performed whole-genome assembly on the data. We also developed the simulator LRTK-SIM to follow the workflow of 10x data generation and produce realistic simulated Linked-Read data sets. We found that assembly quality could be improved by increasing the total sequencing coverage (C) and keeping physical coverage of DNA fragments (C_F) or read coverage per fragment (C_R) within broad ranges. The optimal physical coverage was between 332× and 823× and assembly quality worsened if it increased to >1,000× for a given C. Long DNA fragments could significantly extend phase blocks but decreased contig contiguity. The optimal length-weighted fragment length (W [Formula: see text]) was ∼50–150 kb. When broadly optimal parameters were used for library preparation and sequencing, ∼80% of the genome was assembled in a diploid state. CONCLUSIONS: The Linked-Read libraries we generated and the parameter space we identified provide theoretical considerations and practical guidelines for personal genome assemblies based on 10x Linked-Read sequencing.
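The trade-off described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the record or from LRTK-SIM. It assumes that total coverage is the product of physical coverage and read coverage per fragment, and that the length-weighted fragment length is the length-weighted mean, sum(L^2) / sum(L). The function names, the 56× total coverage, and the fragment lengths are hypothetical. Only the 332× to 823× range comes from the abstract.

```python
# Illustrative sketch only; names and sample values are hypothetical.
# Assumption: total coverage C = physical coverage * read coverage per fragment.

OPTIMAL_PHYSICAL_COVERAGE = (332.0, 823.0)  # range reported in the abstract


def read_coverage_per_fragment(total_coverage, physical_coverage):
    """Read coverage per fragment implied by a given total and physical coverage."""
    if physical_coverage <= 0:
        raise ValueError("physical coverage must be positive")
    return total_coverage / physical_coverage


def in_optimal_range(physical_coverage):
    """True if the physical coverage lies inside the range reported in the abstract."""
    low, high = OPTIMAL_PHYSICAL_COVERAGE
    return low <= physical_coverage <= high


def length_weighted_mean(fragment_lengths):
    """Length-weighted mean fragment length, assumed to be sum(L^2) / sum(L)."""
    total = sum(fragment_lengths)
    if total <= 0:
        raise ValueError("fragment lengths must sum to a positive value")
    return sum(length * length for length in fragment_lengths) / total


if __name__ == "__main__":
    total_coverage = 56.0  # hypothetical total sequencing coverage
    for physical in (332.0, 823.0, 1000.0):
        per_fragment = read_coverage_per_fragment(total_coverage, physical)
        print(physical, round(per_fragment, 3), in_optimal_range(physical))

    # Hypothetical fragment lengths in kb.
    print(length_weighted_mean([20.0, 60.0, 120.0]))
```

Under these assumptions, raising physical coverage at a fixed total coverage necessarily lowers the read coverage each fragment receives, which is one way to read the abstract's remark that quality worsened above 1,000× for a given C.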
format Online
Article
Text
id pubmed-6879002
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6879002 2019-12-03 Assessment of human diploid genome assembly with 10x Linked-Reads data Zhang, Lu Zhou, Xin Weng, Ziming Sidow, Arend Gigascience Data Note Oxford University Press 2019-11-26 /pmc/articles/PMC6879002/ /pubmed/31769805 http://dx.doi.org/10.1093/gigascience/giz141 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assessment of human diploid genome assembly with 10x Linked-Reads data
topic Data Note