
Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics

BACKGROUND: Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. Here we identified parameters that...


Bibliographic Details
Main authors: Liu, Yuanhang, Bhagwate, Aditya, Winham, Stacey J., Stephens, Melissa T., Harker, Brent W., McDonough, Samantha J., Stallings-Mann, Melody L., Heinzen, Ethan P., Vierkant, Robert A., Hoskin, Tanya L., Frost, Marlene H., Carter, Jodi M., Pfrender, Michael E., Littlepage, Laurie, Radisky, Derek C., Cunningham, Julie M., Degnim, Amy C., Wang, Chen
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9479231/
https://www.ncbi.nlm.nih.gov/pubmed/36114500
http://dx.doi.org/10.1186/s12920-022-01355-0
author Liu, Yuanhang
Bhagwate, Aditya
Winham, Stacey J.
Stephens, Melissa T.
Harker, Brent W.
McDonough, Samantha J.
Stallings-Mann, Melody L.
Heinzen, Ethan P.
Vierkant, Robert A.
Hoskin, Tanya L.
Frost, Marlene H.
Carter, Jodi M.
Pfrender, Michael E.
Littlepage, Laurie
Radisky, Derek C.
Cunningham, Julie M.
Degnim, Amy C.
Wang, Chen
author_sort Liu, Yuanhang
collection PubMed
description BACKGROUND: Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. Here we identified parameters that determine which FFPE samples have the potential for successful RNA extraction, library preparation, and generation of usable RNA-seq data.
METHODS: We optimized library preparation protocols designed for use with FFPE samples using seven FFPE and fresh-frozen replicate pairs, and tested the optimized protocols on a study set of 130 FFPE biopsies from women with benign breast disease. Metrics from RNA extraction and preparation procedures were collected and compared with bioinformatics sequencing summary statistics. Finally, a decision tree model was built to learn the relationship between pre-sequencing lab metrics and QC pass/fail status as determined by bioinformatics metrics.
RESULTS: Samples that failed bioinformatics QC tended to have a low median sample-wise correlation within the cohort (Spearman correlation < 0.75), a low number of reads mapped to gene regions (< 25 million), or a low number of detectable genes (< 11,400 genes detected with TPM > 4). The median RNA concentration and pre-capture library Qubit values for QC-failed samples were 18.9 ng/µL and 2.08 ng/µL, respectively, significantly lower than those of QC-passed samples (40.8 ng/µL and 5.82 ng/µL). We built a decision tree model based on input RNA concentration and input library Qubit values and achieved an F score of 0.848 in predicting the QC status (pass/fail) of FFPE samples.
CONCLUSIONS: We provide a bioinformatics quality control recommendation for FFPE samples from breast tissue by evaluating bioinformatic and sample metrics. Our results suggest a minimum concentration of 25 ng/µL FFPE-extracted RNA for library preparation and 1.7 ng/µL pre-capture library output to achieve adequate RNA-seq data for downstream bioinformatics analysis.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-022-01355-0.
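The thresholds quoted in the abstract can be written down as a small screening helper. The following is a minimal sketch, not the authors' pipeline or their fitted decision tree: the class and function names are hypothetical, combining the three post-sequencing criteria with a logical OR is an assumption based on the abstract's wording, the detected-gene cutoff is read as "fewer than 11,400", and the reported "F score" is assumed here to mean the usual F1 score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PostSeqMetrics:
    """Post-sequencing summary for one sample (hypothetical container)."""
    median_spearman: float    # median sample-wise Spearman correlation within the cohort
    gene_mapped_reads: float  # number of reads mapped to gene regions
    detected_genes: int       # number of genes with TPM > 4


def fails_bioinformatics_qc(m: PostSeqMetrics) -> bool:
    """Flag a sample that shows any of the three failure patterns in the abstract.

    Assumption: the criteria are combined with OR; the abstract only says
    failed samples "tended to" show them.
    """
    return (
        m.median_spearman < 0.75
        or m.gene_mapped_reads < 25e6
        or m.detected_genes < 11_400
    )


def meets_lab_minimums(rna_ng_per_ul: float, library_ng_per_ul: float) -> bool:
    """Apply the suggested pre-sequencing minimums (25 and 1.7 ng/uL).

    This is a plain two-threshold rule, not the decision tree from the paper.
    """
    return rna_ng_per_ul >= 25.0 and library_ng_per_ul >= 1.7


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Applied to the medians reported in the abstract, `meets_lab_minimums(40.8, 5.82)` holds for the QC-passed group, while `meets_lab_minimums(18.9, 2.08)` does not, because the RNA concentration falls below 25 ng/µL.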
format Online
Article
Text
id pubmed-9479231
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9479231 2022-09-17 BMC Med Genomics Research BioMed Central 2022-09-16 /pmc/articles/PMC9479231/ /pubmed/36114500 http://dx.doi.org/10.1186/s12920-022-01355-0 Text en © The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics
topic Research