High-Throughput Identification of Adapters in Single-Read Sequencing Data

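Sequencing datasets available in public repositories are already numerous, and their number is growing exponentially. Raw sequencing data files constitute a substantial portion of these data, and they need to be pre-processed before any downstream analysis. The removal of adapter sequences is the first essential step. Tools available for the automated detection of adapters in datasets from single-read sequencing protocols have certain limitations. To explore these datasets, one needs to retrieve the adapter sequences from the methods sections of the corresponding research articles. This can be time-consuming in metadata analyses. Moreover, not all research articles report the adapter sequences. We have developed adapt_find, a tool that automates the identification of adapter sequences in raw single-read sequencing datasets. We have verified adapt_find by testing it on a number of publicly available datasets. adapt_find provides a robust, reliable and high-throughput process across different sequencing technologies and various adapter designs. It does not need prior knowledge of the adapter sequences. We also produced associated tools: random_mer, for the detection of random N bases on either one or both termini of the reads, and fastqc_parser, for consolidating the results from FASTQC outputs. Together, these form a valuable tool set for metadata analyses on multiple sequencing datasets.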

Bibliographic Details
Main Authors:	Mohideen, Asan M.S.H., Johansen, Steinar D., Babiak, Igor
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7356586/ https://www.ncbi.nlm.nih.gov/pubmed/32521604 http://dx.doi.org/10.3390/biom10060878

author	Mohideen, Asan M.S.H. Johansen, Steinar D. Babiak, Igor
collection	PubMed
description	Sequencing datasets available in public repositories are already numerous, and their number is growing exponentially. Raw sequencing data files constitute a substantial portion of these data, and they need to be pre-processed before any downstream analysis. The removal of adapter sequences is the first essential step. Tools available for the automated detection of adapters in datasets from single-read sequencing protocols have certain limitations. To explore these datasets, one needs to retrieve the adapter sequences from the methods sections of the corresponding research articles. This can be time-consuming in metadata analyses. Moreover, not all research articles report the adapter sequences. We have developed adapt_find, a tool that automates the identification of adapter sequences in raw single-read sequencing datasets. We have verified adapt_find by testing it on a number of publicly available datasets. adapt_find provides a robust, reliable and high-throughput process across different sequencing technologies and various adapter designs. It does not need prior knowledge of the adapter sequences. We also produced associated tools: random_mer, for the detection of random N bases on either one or both termini of the reads, and fastqc_parser, for consolidating the results from FASTQC outputs. Together, these form a valuable tool set for metadata analyses on multiple sequencing datasets.
format	Online Article Text
id	pubmed-7356586
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7356586 2020-07-22 Biomolecules Article MDPI 2020-06-08 /pmc/articles/PMC7356586/ /pubmed/32521604 http://dx.doi.org/10.3390/biom10060878 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	High-Throughput Identification of Adapters in Single-Read Sequencing Data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7356586/ https://www.ncbi.nlm.nih.gov/pubmed/32521604 http://dx.doi.org/10.3390/biom10060878
