Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data
Data quality control and preprocessing are often the first step in processing next-generation sequencing (NGS) data of tumors. Not only can it help us evaluate the quality of sequencing data, but it can also help us obtain high-quality data for downstream data analysis. However, by comparing data an...
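Main authors: | He, Binsheng; Zhu, Rongrong; Yang, Huandong; Lu, Qingqing; Wang, Weiwei; Song, Lei; Sun, Xue; Zhang, Guandong; Li, Shijun; Yang, Jialiang; Tian, Geng; Bing, Pingping; Lang, Jidong |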
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Bioengineering and Biotechnology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409520/ https://www.ncbi.nlm.nih.gov/pubmed/32850708 http://dx.doi.org/10.3389/fbioe.2020.00817 |
_version_ | 1783568079033204736 |
---|---|
author | He, Binsheng Zhu, Rongrong Yang, Huandong Lu, Qingqing Wang, Weiwei Song, Lei Sun, Xue Zhang, Guandong Li, Shijun Yang, Jialiang Tian, Geng Bing, Pingping Lang, Jidong |
author_facet | He, Binsheng Zhu, Rongrong Yang, Huandong Lu, Qingqing Wang, Weiwei Song, Lei Sun, Xue Zhang, Guandong Li, Shijun Yang, Jialiang Tian, Geng Bing, Pingping Lang, Jidong |
author_sort | He, Binsheng |
collection | PubMed |
description | Data quality control and preprocessing are often the first step in processing next-generation sequencing (NGS) data of tumors. Not only can it help us evaluate the quality of sequencing data, but it can also help us obtain high-quality data for downstream data analysis. However, by comparing data analysis results of preprocessing with Cutadapt, FastP, Trimmomatic, and raw sequencing data, we found that the frequency of mutation detection had some fluctuations and differences, and human leukocyte antigen (HLA) typing directly resulted in erroneous results. We think that our research had demonstrated the impact of data preprocessing steps on downstream data analysis results. We hope that it can promote the development or optimization of better data preprocessing methods, so that downstream information analysis can be more accurate. |
format | Online Article Text |
id | pubmed-7409520 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-74095202020-08-25 Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data He, Binsheng Zhu, Rongrong Yang, Huandong Lu, Qingqing Wang, Weiwei Song, Lei Sun, Xue Zhang, Guandong Li, Shijun Yang, Jialiang Tian, Geng Bing, Pingping Lang, Jidong Front Bioeng Biotechnol Bioengineering and Biotechnology Data quality control and preprocessing are often the first step in processing next-generation sequencing (NGS) data of tumors. Not only can it help us evaluate the quality of sequencing data, but it can also help us obtain high-quality data for downstream data analysis. However, by comparing data analysis results of preprocessing with Cutadapt, FastP, Trimmomatic, and raw sequencing data, we found that the frequency of mutation detection had some fluctuations and differences, and human leukocyte antigen (HLA) typing directly resulted in erroneous results. We think that our research had demonstrated the impact of data preprocessing steps on downstream data analysis results. We hope that it can promote the development or optimization of better data preprocessing methods, so that downstream information analysis can be more accurate. Frontiers Media S.A. 2020-07-30 /pmc/articles/PMC7409520/ /pubmed/32850708 http://dx.doi.org/10.3389/fbioe.2020.00817 Text en Copyright © 2020 He, Zhu, Yang, Lu, Wang, Song, Sun, Zhang, Li, Yang, Tian, Bing and Lang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioengineering and Biotechnology He, Binsheng Zhu, Rongrong Yang, Huandong Lu, Qingqing Wang, Weiwei Song, Lei Sun, Xue Zhang, Guandong Li, Shijun Yang, Jialiang Tian, Geng Bing, Pingping Lang, Jidong Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data |
title | Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data |
title_full | Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data |
title_fullStr | Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data |
title_full_unstemmed | Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data |
title_short | Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data |
title_sort | assessing the impact of data preprocessing on analyzing next generation sequencing data |
topic | Bioengineering and Biotechnology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409520/ https://www.ncbi.nlm.nih.gov/pubmed/32850708 http://dx.doi.org/10.3389/fbioe.2020.00817 |