A reference human genome dataset of the BGISEQ-500 sequencer

Bibliographic Details
Main authors: Huang, Jie, Liang, Xinming, Xuan, Yuankai, Geng, Chunyu, Li, Yuxiang, Lu, Haorong, Qu, Shoufang, Mei, Xianglin, Chen, Hongbo, Yu, Ting, Sun, Nan, Rao, Junhua, Wang, Jiahao, Zhang, Wenwei, Chen, Ying, Liao, Sha, Jiang, Hui, Liu, Xin, Yang, Zhaopeng, Mu, Feng, Gao, Shangxian
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Data Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467036/
https://www.ncbi.nlm.nih.gov/pubmed/28379488
http://dx.doi.org/10.1093/gigascience/gix024
collection PubMed
description Background: BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Findings: Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated their accuracy, and compared it to that of variations identified from similar amounts of publicly available HiSeq2500 data. Conclusions: We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than for the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for the BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
format Online
Article
Text
id pubmed-5467036
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5467036 2017-06-19 A reference human genome dataset of the BGISEQ-500 sequencer GigaScience Data Note
Oxford University Press 2017-04-01 /pmc/articles/PMC5467036/ /pubmed/28379488 http://dx.doi.org/10.1093/gigascience/gix024 Text en © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A reference human genome dataset of the BGISEQ-500 sequencer
topic Data Note