Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics
BACKGROUND: Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. Here we identified parameters that...
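Main authors: | Liu, Yuanhang; Bhagwate, Aditya; Winham, Stacey J.; Stephens, Melissa T.; Harker, Brent W.; McDonough, Samantha J.; Stallings-Mann, Melody L.; Heinzen, Ethan P.; Vierkant, Robert A.; Hoskin, Tanya L.; Frost, Marlene H.; Carter, Jodi M.; Pfrender, Michael E.; Littlepage, Laurie; Radisky, Derek C.; Cunningham, Julie M.; Degnim, Amy C.; Wang, Chen |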
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9479231/ https://www.ncbi.nlm.nih.gov/pubmed/36114500 http://dx.doi.org/10.1186/s12920-022-01355-0 |
_version_ | 1784790741907668992 |
---|---|
author | Liu, Yuanhang Bhagwate, Aditya Winham, Stacey J. Stephens, Melissa T. Harker, Brent W. McDonough, Samantha J. Stallings-Mann, Melody L. Heinzen, Ethan P. Vierkant, Robert A. Hoskin, Tanya L. Frost, Marlene H. Carter, Jodi M. Pfrender, Michael E. Littlepage, Laurie Radisky, Derek C. Cunningham, Julie M. Degnim, Amy C. Wang, Chen |
author_facet | Liu, Yuanhang Bhagwate, Aditya Winham, Stacey J. Stephens, Melissa T. Harker, Brent W. McDonough, Samantha J. Stallings-Mann, Melody L. Heinzen, Ethan P. Vierkant, Robert A. Hoskin, Tanya L. Frost, Marlene H. Carter, Jodi M. Pfrender, Michael E. Littlepage, Laurie Radisky, Derek C. Cunningham, Julie M. Degnim, Amy C. Wang, Chen |
author_sort | Liu, Yuanhang |
collection | PubMed |
description | BACKGROUND: Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. Here we identified parameters that determine which FFPE samples have the potential for successful RNA extraction, library preparation, and generation of usable RNA-seq data. METHODS: We optimized library preparation protocols designed for use with FFPE samples using seven FFPE and fresh-frozen replicate pairs, and tested the optimized protocols using a study set of 130 FFPE biopsies from women with benign breast disease. Metrics from RNA extraction and preparation procedures were collected and compared with bioinformatics sequencing summary statistics. Finally, a decision tree model was built to learn the relationship between pre-sequencing lab metrics and QC pass/fail status as determined by bioinformatics metrics. RESULTS: Samples that failed bioinformatics QC tended to have a low median sample-wise correlation within the cohort (Spearman correlation < 0.75), a low number of reads mapped to gene regions (< 25 million), or a low number of detectable genes (fewer than 11,400 detected genes with TPM > 4). The median RNA concentration and pre-capture library Qubit values for QC-failed samples were 18.9 ng/ul and 2.08 ng/ul, respectively, which were significantly lower than those of QC-passed samples (40.8 ng/ul and 5.82 ng/ul). We built a decision tree model based on input RNA concentration and input library Qubit values and achieved an F score of 0.848 in predicting the QC status (pass/fail) of FFPE samples. CONCLUSIONS: We provide a bioinformatics quality control recommendation for FFPE samples from breast tissue by evaluating bioinformatic and sample metrics. Our results suggest a minimum concentration of 25 ng/ul FFPE-extracted RNA for library preparation and 1.7 ng/ul pre-capture library output to achieve adequate RNA-seq data for downstream bioinformatics analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-022-01355-0. |
format | Online Article Text |
id | pubmed-9479231 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9479231 2022-09-17 Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics Liu, Yuanhang Bhagwate, Aditya Winham, Stacey J. Stephens, Melissa T. Harker, Brent W. McDonough, Samantha J. Stallings-Mann, Melody L. Heinzen, Ethan P. Vierkant, Robert A. Hoskin, Tanya L. Frost, Marlene H. Carter, Jodi M. Pfrender, Michael E. Littlepage, Laurie Radisky, Derek C. Cunningham, Julie M. Degnim, Amy C. Wang, Chen BMC Med Genomics Research BACKGROUND: Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. Here we identified parameters that determine which FFPE samples have the potential for successful RNA extraction, library preparation, and generation of usable RNA-seq data. METHODS: We optimized library preparation protocols designed for use with FFPE samples using seven FFPE and fresh-frozen replicate pairs, and tested the optimized protocols using a study set of 130 FFPE biopsies from women with benign breast disease. Metrics from RNA extraction and preparation procedures were collected and compared with bioinformatics sequencing summary statistics. Finally, a decision tree model was built to learn the relationship between pre-sequencing lab metrics and QC pass/fail status as determined by bioinformatics metrics. RESULTS: Samples that failed bioinformatics QC tended to have a low median sample-wise correlation within the cohort (Spearman correlation < 0.75), a low number of reads mapped to gene regions (< 25 million), or a low number of detectable genes (fewer than 11,400 detected genes with TPM > 4). The median RNA concentration and pre-capture library Qubit values for QC-failed samples were 18.9 ng/ul and 2.08 ng/ul, respectively, which were significantly lower than those of QC-passed samples (40.8 ng/ul and 5.82 ng/ul). We built a decision tree model based on input RNA concentration and input library Qubit values and achieved an F score of 0.848 in predicting the QC status (pass/fail) of FFPE samples. CONCLUSIONS: We provide a bioinformatics quality control recommendation for FFPE samples from breast tissue by evaluating bioinformatic and sample metrics. Our results suggest a minimum concentration of 25 ng/ul FFPE-extracted RNA for library preparation and 1.7 ng/ul pre-capture library output to achieve adequate RNA-seq data for downstream bioinformatics analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-022-01355-0. BioMed Central 2022-09-16 /pmc/articles/PMC9479231/ /pubmed/36114500 http://dx.doi.org/10.1186/s12920-022-01355-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Liu, Yuanhang Bhagwate, Aditya Winham, Stacey J. Stephens, Melissa T. Harker, Brent W. McDonough, Samantha J. Stallings-Mann, Melody L. Heinzen, Ethan P. Vierkant, Robert A. Hoskin, Tanya L. Frost, Marlene H. Carter, Jodi M. Pfrender, Michael E. Littlepage, Laurie Radisky, Derek C. Cunningham, Julie M. Degnim, Amy C. Wang, Chen Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
title | Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
title_full | Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
title_fullStr | Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
title_full_unstemmed | Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
title_short | Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
title_sort | quality control recommendations for rnaseq using ffpe samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9479231/ https://www.ncbi.nlm.nih.gov/pubmed/36114500 http://dx.doi.org/10.1186/s12920-022-01355-0 |
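
The thresholds reported in the abstract can be sketched as two simple checks. The following is a minimal Python illustration, not the authors' fitted decision tree: the constant names, the `PostSeqMetrics` class, and both function names are hypothetical, and the choices to treat each cutoff as inclusive, to read the 11,400 figure as a lower bound on detected genes, and to fail a sample when any single metric falls below its cutoff are assumptions.

```python
from dataclasses import dataclass

# Cutoffs quoted in the abstract. How they are combined below is an assumption.
MIN_MEDIAN_SPEARMAN = 0.75          # median sample-wise Spearman correlation
MIN_GENE_MAPPED_READS = 25_000_000  # reads mapped to gene regions
MIN_DETECTED_GENES = 11_400         # genes with TPM > 4
MIN_RNA_CONC_NG_UL = 25.0           # FFPE-extracted RNA concentration (ng/ul)
MIN_LIBRARY_QUBIT_NG_UL = 1.7       # pre-capture library Qubit value (ng/ul)


@dataclass
class PostSeqMetrics:
    """Hypothetical container for post-sequencing bioinformatics metrics."""
    median_spearman: float
    gene_mapped_reads: int
    detected_genes: int  # number of genes with TPM > 4


def passes_bioinformatics_qc(m: PostSeqMetrics) -> bool:
    """Pass only if every post-sequencing metric meets its cutoff (assumed rule)."""
    return (
        m.median_spearman >= MIN_MEDIAN_SPEARMAN
        and m.gene_mapped_reads >= MIN_GENE_MAPPED_READS
        and m.detected_genes >= MIN_DETECTED_GENES
    )


def meets_lab_minimums(rna_conc_ng_ul: float, library_qubit_ng_ul: float) -> bool:
    """Pre-sequencing screen using the two suggested minimum concentrations."""
    return (
        rna_conc_ng_ul >= MIN_RNA_CONC_NG_UL
        and library_qubit_ng_ul >= MIN_LIBRARY_QUBIT_NG_UL
    )


if __name__ == "__main__":
    # Median lab values reported for QC-failed and QC-passed samples.
    print("failed-sample medians:", meets_lab_minimums(18.9, 2.08))
    print("passed-sample medians:", meets_lab_minimums(40.8, 5.82))
```

Under these assumptions, the median values reported for QC-failed samples (18.9 and 2.08 ng/ul) are flagged by the lab screen because of the RNA concentration alone, while the medians for passing samples (40.8 and 5.82 ng/ul) clear both cutoffs.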