Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data

Data quality control and preprocessing are often the first steps in processing next-generation sequencing (NGS) data of tumors. Not only can they help us evaluate the quality of sequencing data, but they can also help us obtain high-quality data for downstream data analysis. However, by comparing data an...

Bibliographic Details
Main authors: He, Binsheng; Zhu, Rongrong; Yang, Huandong; Lu, Qingqing; Wang, Weiwei; Song, Lei; Sun, Xue; Zhang, Guandong; Li, Shijun; Yang, Jialiang; Tian, Geng; Bing, Pingping; Lang, Jidong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409520/
https://www.ncbi.nlm.nih.gov/pubmed/32850708
http://dx.doi.org/10.3389/fbioe.2020.00817
_version_ 1783568079033204736
author He, Binsheng
Zhu, Rongrong
Yang, Huandong
Lu, Qingqing
Wang, Weiwei
Song, Lei
Sun, Xue
Zhang, Guandong
Li, Shijun
Yang, Jialiang
Tian, Geng
Bing, Pingping
Lang, Jidong
author_sort He, Binsheng
collection PubMed
description Data quality control and preprocessing are often the first steps in processing next-generation sequencing (NGS) data of tumors. They help us evaluate the quality of the sequencing data and obtain high-quality data for downstream analysis. However, by comparing the analysis results for data preprocessed with Cutadapt, FastP, and Trimmomatic and for raw sequencing data, we found that the frequency of mutation detection fluctuated and differed, and that human leukocyte antigen (HLA) typing directly yielded erroneous results. We believe that our research demonstrates the impact of data preprocessing steps on downstream analysis results. We hope that it can promote the development or optimization of better data preprocessing methods, so that downstream analysis can be more accurate.
format Online
Article
Text
id pubmed-7409520
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7409520 2020-08-25
Front Bioeng Biotechnol
Bioengineering and Biotechnology
Frontiers Media S.A. 2020-07-30
/pmc/articles/PMC7409520/
/pubmed/32850708
http://dx.doi.org/10.3389/fbioe.2020.00817
Text en
Copyright © 2020 He, Zhu, Yang, Lu, Wang, Song, Sun, Zhang, Li, Yang, Tian, Bing and Lang.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data
topic Bioengineering and Biotechnology