
SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500

BACKGROUND: The sequencing platform BGISEQ-500 is based on DNBSEQ technology and provides high throughput with low costs. This sequencer has been widely used in various areas of scientific and clinical research. A better understanding of the sequencing process and performance of this system is essen...


Bibliographic Details
Main authors: Zhou, Yanqiu, Liu, Chen, Zhou, Rongfang, Lu, Anzhi, Huang, Biao, Liu, Liling, Chen, Ling, Luo, Bei, Huang, Jin, Tian, Zhijian
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857306/
https://www.ncbi.nlm.nih.gov/pubmed/31807141
http://dx.doi.org/10.1186/s13040-019-0209-9
_version_ 1783470742514434048
author Zhou, Yanqiu
Liu, Chen
Zhou, Rongfang
Lu, Anzhi
Huang, Biao
Liu, Liling
Chen, Ling
Luo, Bei
Huang, Jin
Tian, Zhijian
author_sort Zhou, Yanqiu
collection PubMed
description BACKGROUND: The BGISEQ-500 sequencing platform is based on DNBSEQ technology and provides high throughput at low cost. This sequencer has been widely used in various areas of scientific and clinical research. A better understanding of the sequencing process and performance of this system is essential for stabilizing the sequencing process, accurately interpreting sequencing results and efficiently solving sequencing problems. To address these concerns, a comprehensive database, SEQdata-BEACON, was constructed to accumulate run performance data from the BGISEQ-500. RESULTS: A total of 60 BGISEQ-500 instruments in the BGI-Wuhan lab were used to collect sequencing performance data. Lanes from paired-end 100 (PE100) sequencing runs using a 10 bp barcode were chosen, and each lane was assigned a unique entry number as its identification number (ID). From November 2018 to April 2019, 2236 entries were recorded in the database, covering 65 metrics on sample, yield, quality, machine state and supplies information. Using a correlation matrix, 52 numerical metrics were clustered into three groups representing yield-quality, machine state and sequencing calibration. The distributions of the metrics also revealed patterns and offered clues for further explanation or analysis of the sequencing process. Using data from all 200 cycles, a linear regression model simulated the final outputs well. Moreover, a prediction of the final yield could be provided as early as the 15th cycle, and the R² values of the 200th-cycle and 15th-cycle models were 0.97 and 0.81, respectively. The model was then run on test sets obtained in May 2019 to predict the yield, which resulted in an R² of 0.96. These results indicate that our simulation model was reliable and effective.
CONCLUSIONS: Data sources, statistical findings and application tools provide a constantly updated reference for BGISEQ-500 users to comprehensively understand DNBSEQ technology, solve sequencing problems and optimize run performance. These resources are available on our website http://seqBEACON.genomics.cn:443/home.html.
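The abstract above reports R² values for a linear regression that predicts final yield. As a minimal sketch of how such a figure is computed, the following Python fits a one-variable least-squares line and scores it with R². It is not the authors' model: the function names `fit_line` and `r_squared`, the single predictor, and the data points are all hypothetical and made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: an early-cycle metric (x) and the final yield (y).
# These numbers are invented and do not come from SEQdata-BEACON.
early_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
final_yield = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(early_metric, final_yield)
predicted = [intercept + slope * x for x in early_metric]
r2 = r_squared(final_yield, predicted)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r2:.3f}")
```

To score a fitted model on held-out runs, as the abstract describes for the May 2019 test sets, the same `r_squared` function would be applied to predictions for data that were not used in the fit.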
format Online
Article
Text
id pubmed-6857306
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6857306 2019-12-05 SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500 Zhou, Yanqiu Liu, Chen Zhou, Rongfang Lu, Anzhi Huang, Biao Liu, Liling Chen, Ling Luo, Bei Huang, Jin Tian, Zhijian BioData Min Research BioMed Central 2019-11-15 /pmc/articles/PMC6857306/ /pubmed/31807141 http://dx.doi.org/10.1186/s13040-019-0209-9 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500
topic Research