SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500
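Main authors: | Zhou, Yanqiu; Liu, Chen; Zhou, Rongfang; Lu, Anzhi; Huang, Biao; Liu, Liling; Chen, Ling; Luo, Bei; Huang, Jin; Tian, Zhijian |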
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857306/ https://www.ncbi.nlm.nih.gov/pubmed/31807141 http://dx.doi.org/10.1186/s13040-019-0209-9 |
author | Zhou, Yanqiu; Liu, Chen; Zhou, Rongfang; Lu, Anzhi; Huang, Biao; Liu, Liling; Chen, Ling; Luo, Bei; Huang, Jin; Tian, Zhijian |
collection | PubMed |
description | BACKGROUND: The BGISEQ-500 sequencing platform is based on DNBSEQ technology and provides high throughput at low cost. It has been widely used in scientific and clinical research. A better understanding of the sequencing process and performance of this system is essential for stabilizing the sequencing process, accurately interpreting sequencing results and efficiently solving sequencing problems. To address these needs, a comprehensive database, SEQdata-BEACON, was constructed to accumulate run performance data from the BGISEQ-500. RESULTS: Sequencing performance data were collected from 60 BGISEQ-500 instruments in the BGI-Wuhan lab. Lanes from paired-end 100 (PE100) runs with a 10 bp barcode were selected, and each lane was assigned a unique entry number as its identification number (ID). From November 2018 to April 2019, 2236 entries were recorded in the database, with 65 metrics covering sample, yield, quality, machine state and supplies information. Using a correlation matrix, 52 numerical metrics were clustered into three groups representing yield-quality, machine state and sequencing calibration. The distributions of the metrics also revealed patterns and provided clues for further explanation or analysis of the sequencing process. Using data from all 200 cycles, a linear regression model accurately simulated the final output. Moreover, the final yield could be predicted as early as the 15th cycle; the R² values of the 200-cycle and 15-cycle models were 0.97 and 0.81, respectively. When applied to test sets obtained in May 2019, the model predicted yield with an R² of 0.96. These results indicate that our simulation model is reliable and effective. 
CONCLUSIONS: Data sources, statistical findings and application tools provide a constantly updated reference for BGISEQ-500 users to comprehensively understand DNBSEQ technology, solve sequencing problems and optimize run performance. These resources are available on our website http://seqBEACON.genomics.cn:443/home.html. |
format | Online Article Text |
id | pubmed-6857306 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BioData Min |
published | 2019-11-15 |
rights | © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Research |
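The abstract scores its yield predictions with R², the coefficient of determination. As a minimal sketch only, assuming a single hypothetical predictor (an early-cycle yield estimate) in place of the authors' actual multi-metric model, an ordinary least-squares fit and its R² can be computed as follows. The function names and all numbers are illustrative, not taken from SEQdata-BEACON.

```python
# Minimal sketch: ordinary least-squares fit with ONE predictor, scored by R².
# The predictor (an early-cycle yield estimate) and all data are hypothetical;
# the SEQdata-BEACON model and its features may differ.


def fit_simple_linear(x, y):
    """Return (slope, intercept) minimizing squared error of y ~ slope*x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to each predictor value."""
    return [slope * xi + intercept for xi in x]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical training lanes: early-cycle yield estimate vs final yield.
    early = [100.0, 104.0, 98.0, 110.0, 95.0]
    final = [101.0, 105.5, 97.0, 112.0, 95.5]
    a, b = fit_simple_linear(early, final)
    fitted = predict(a, b, early)
    print(f"slope={a:.3f} intercept={b:.3f} R2={r_squared(final, fitted):.3f}")
```

To mimic the held-out evaluation described in the abstract, the line would be fitted on earlier lanes and `r_squared` computed on predictions for later lanes that were not used in fitting.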