High-Throughput Identification of Adapters in Single-Read Sequencing Data
Sequencing datasets available in public repositories are already high in number, and their growth is exponential. Raw sequencing data files constitute a substantial portion of these data, and they need to be pre-processed for any downstream analyses. The removal of adapter sequences is the first ess...
Main Authors: | Mohideen, Asan M.S.H.; Johansen, Steinar D.; Babiak, Igor |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7356586/ https://www.ncbi.nlm.nih.gov/pubmed/32521604 http://dx.doi.org/10.3390/biom10060878 |
_version_ | 1783558524002893824 |
---|---|
author | Mohideen, Asan M.S.H. Johansen, Steinar D. Babiak, Igor |
collection | PubMed |
description | Sequencing datasets in public repositories are already numerous, and their number is growing exponentially. Raw sequencing data files constitute a substantial portion of these data, and they must be pre-processed before any downstream analysis. The removal of adapter sequences is the first essential step. Tools available for the automated detection of adapters in datasets from single-read sequencing protocols have certain limitations. To explore these datasets, one needs to retrieve the adapter sequences from the methods sections of the corresponding research articles. This can be time-consuming in metadata analyses. Moreover, not all research articles report the adapter sequences. We have developed adapt_find, a tool that automates the identification of adapter sequences in raw single-read sequencing datasets. We have verified adapt_find by testing it on a number of publicly available datasets. adapt_find provides a robust, reliable and high-throughput process across different sequencing technologies and various adapter designs. It does not need prior knowledge of the adapter sequences. We also produced two associated tools: random_mer, for the detection of random N bases at one or both termini of the reads, and fastqc_parser, for consolidating the results from FastQC outputs. Together, these form a valuable tool set for metadata analyses of multiple sequencing datasets. |
format | Online Article Text |
id | pubmed-7356586 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7356586 2020-07-22 High-Throughput Identification of Adapters in Single-Read Sequencing Data Mohideen, Asan M.S.H. Johansen, Steinar D. Babiak, Igor Biomolecules Article MDPI 2020-06-08 /pmc/articles/PMC7356586/ /pubmed/32521604 http://dx.doi.org/10.3390/biom10060878 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | High-Throughput Identification of Adapters in Single-Read Sequencing Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7356586/ https://www.ncbi.nlm.nih.gov/pubmed/32521604 http://dx.doi.org/10.3390/biom10060878 |