Librarian: A quality control tool to analyse sequencing library compositions
Background: Robust analysis of DNA sequencing data needs to include a set of quality control steps to ensure that technical bias is kept to a minimum. A metric easily obtained is the frequency of each of the nucleobases for each position across all sequencing reads. Here, we explore the differences...
Main authors: | Vashishtha, Kartavya; Gaud, Caroline; Andrews, Simon; Krueger, Christel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2022 |
Subjects: | Software Tool Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9579741/ https://www.ncbi.nlm.nih.gov/pubmed/36300036 http://dx.doi.org/10.12688/f1000research.125325.1 |
_version_ | 1784812249770098688 |
---|---|
author | Vashishtha, Kartavya Gaud, Caroline Andrews, Simon Krueger, Christel |
author_facet | Vashishtha, Kartavya Gaud, Caroline Andrews, Simon Krueger, Christel |
author_sort | Vashishtha, Kartavya |
collection | PubMed |
description | Background: Robust analysis of DNA sequencing data needs to include a set of quality control steps to ensure that technical bias is kept to a minimum. A metric easily obtained is the frequency of each of the nucleobases for each position across all sequencing reads. Here, we explore the differences in nucleobase compositions of various library types produced by standard experimental methodologies. Methods: We obtained the compositions of nearly 3000 publicly available datasets and subjected them to Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction for a two-dimensional representation of their composition characteristics. Results: We find that most library types result in a specific composition profile. We use this to give an estimate of how strongly the composition of a test library resembles the profiles of previously published libraries, and how likely the test sample is to be of a particular type. We introduce Librarian, a user-friendly web application and command line tool which enables checking base compositions of test libraries against known library types. Conclusions: Library preparation methods strongly influence the per position nucleobase content. By comparing test libraries to a database of previously published library types we can make predictions regarding the library preparation method. Librarian is a user-friendly tool to access this information for quality assurance purposes as discrepancies can flag potential irregularities very early on. |
format | Online Article Text |
id | pubmed-9579741 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-9579741 2022-10-25 Librarian: A quality control tool to analyse sequencing library compositions Vashishtha, Kartavya Gaud, Caroline Andrews, Simon Krueger, Christel F1000Res Software Tool Article Background: Robust analysis of DNA sequencing data needs to include a set of quality control steps to ensure that technical bias is kept to a minimum. A metric easily obtained is the frequency of each of the nucleobases for each position across all sequencing reads. Here, we explore the differences in nucleobase compositions of various library types produced by standard experimental methodologies. Methods: We obtained the compositions of nearly 3000 publicly available datasets and subjected them to Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction for a two-dimensional representation of their composition characteristics. Results: We find that most library types result in a specific composition profile. We use this to give an estimate of how strongly the composition of a test library resembles the profiles of previously published libraries, and how likely the test sample is to be of a particular type. We introduce Librarian, a user-friendly web application and command line tool which enables checking base compositions of test libraries against known library types. Conclusions: Library preparation methods strongly influence the per position nucleobase content. By comparing test libraries to a database of previously published library types we can make predictions regarding the library preparation method. Librarian is a user-friendly tool to access this information for quality assurance purposes as discrepancies can flag potential irregularities very early on. F1000 Research Limited 2022-09-29 /pmc/articles/PMC9579741/ /pubmed/36300036 http://dx.doi.org/10.12688/f1000research.125325.1 Text en Copyright: © 2022 Vashishtha K et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Tool Article Vashishtha, Kartavya Gaud, Caroline Andrews, Simon Krueger, Christel Librarian: A quality control tool to analyse sequencing library compositions |
title | Librarian: A quality control tool to analyse sequencing library compositions |
title_full | Librarian: A quality control tool to analyse sequencing library compositions |
title_fullStr | Librarian: A quality control tool to analyse sequencing library compositions |
title_full_unstemmed | Librarian: A quality control tool to analyse sequencing library compositions |
title_short | Librarian: A quality control tool to analyse sequencing library compositions |
title_sort | librarian: a quality control tool to analyse sequencing library compositions |
topic | Software Tool Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9579741/ https://www.ncbi.nlm.nih.gov/pubmed/36300036 http://dx.doi.org/10.12688/f1000research.125325.1 |
work_keys_str_mv | AT vashishthakartavya librarianaqualitycontroltooltoanalysesequencinglibrarycompositions AT gaudcaroline librarianaqualitycontroltooltoanalysesequencinglibrarycompositions AT andrewssimon librarianaqualitycontroltooltoanalysesequencinglibrarycompositions AT kruegerchristel librarianaqualitycontroltooltoanalysesequencinglibrarycompositions |
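
The abstract's central metric, the frequency of each nucleobase at each read position, can be illustrated with a minimal Python sketch. This is not Librarian's implementation: the function name `per_position_composition` and the toy reads are assumptions made only for illustration, and a real workflow would parse reads from FASTQ files rather than from a hard-coded list.

```python
from collections import Counter

BASES = "ACGT"


def per_position_composition(reads):
    """Return, for each read position, the fraction of A, C, G and T.

    Hypothetical helper for illustration; symbols other than A/C/G/T
    (for example N) are excluded from the per-position totals.
    """
    counts = []  # one Counter per read position
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    profile = []
    for counter in counts:
        total = sum(counter[b] for b in BASES)
        profile.append(
            {b: (counter[b] / total if total else 0.0) for b in BASES}
        )
    return profile


# Toy input (assumed data, not taken from the article).
reads = ["ACGT", "ACGA", "TCGA", "AC"]
profile = per_position_composition(reads)

for pos, freqs in enumerate(profile, start=1):
    print(pos, " ".join(f"{b}={freqs[b]:.2f}" for b in BASES))
```

Each entry of `profile` is one position's composition. Flattening such profiles into a fixed-length vector per library is one plausible way to prepare input for a dimensionality reduction step such as the UMAP projection the abstract mentions, though the record does not describe the authors' exact procedure.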