
Investigation into the annotation of protocol sequencing steps in the sequence read archive

BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA...


Bibliographic Details
Main authors: Alnasir, Jamie, Shanahan, Hugh P
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4425880/
https://www.ncbi.nlm.nih.gov/pubmed/25960871
http://dx.doi.org/10.1186/s13742-015-0064-7
author Alnasir, Jamie
Shanahan, Hugh P
collection PubMed
description BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined. RESULTS: We examined the experimental metadata of the public repository Sequence Read Archive (SRA) in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords commonly occurring in key preparatory protocol steps partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records (fragmentation, ligation and enrichment, respectively), had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records). CONCLUSIONS: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0064-7) contains supplementary material, which is available to authorized users.
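The keyword search described in the abstract can be illustrated with a minimal sketch. It assumes a table named `experiment` with a free-text column `library_construction_protocol`, chosen to resemble the SRAdb SQLite schema; the keyword stems, the demo rows and the helper names are hypothetical and are not the authors' actual lists or code. The sketch counts records only and does not attempt to reproduce the partitioning over studies reported in the abstract.

```python
import sqlite3

# Hypothetical keyword stems per protocol step (not the authors' actual lists).
KEYWORDS = {
    "fragmentation": ["fragment", "sonicat", "shear", "nebuliz"],
    "ligation": ["ligat", "adapter", "adaptor"],
    "enrichment": ["enrich", "amplif", "pcr"],
}

# Assumed table and column names, chosen to resemble the SRAdb schema.
TABLE = "experiment"
COLUMN = "library_construction_protocol"


def step_clause(keywords):
    """Return an OR-of-LIKE SQL clause and its bound parameters for one step."""
    clause = " OR ".join(f"{COLUMN} LIKE ?" for _ in keywords)
    return f"({clause})", [f"%{kw}%" for kw in keywords]


def annotation_rates(conn):
    """Percentage of records mentioning each step, and all three together."""
    total = conn.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]
    if total == 0:
        return {}
    rates, clauses, all_params = {}, [], []
    for step, keywords in KEYWORDS.items():
        clause, params = step_clause(keywords)
        query = f"SELECT COUNT(*) FROM {TABLE} WHERE {clause}"
        hits = conn.execute(query, params).fetchone()[0]
        rates[step] = 100.0 * hits / total
        clauses.append(clause)
        all_params.extend(params)
    # A record counts towards "all_three" only if every step matches.
    combined = " AND ".join(clauses)
    query = f"SELECT COUNT(*) FROM {TABLE} WHERE {combined}"
    hits = conn.execute(query, all_params).fetchone()[0]
    rates["all_three"] = 100.0 * hits / total
    return rates


def build_demo_db():
    """Create an in-memory database with a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} "
        f"(experiment_accession TEXT, study_accession TEXT, {COLUMN} TEXT)"
    )
    rows = [
        ("X1", "S1", "DNA was fragmented by sonication, adapters ligated, then PCR enriched"),
        ("X2", "S1", "Sheared with Covaris"),
        ("X3", "S2", None),  # unannotated record; NULL never matches LIKE
        ("X4", "S2", "Standard Illumina protocol"),
    ]
    conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for step, pct in annotation_rates(build_demo_db()).items():
        print(f"{step}: {pct:.2f}%")
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so the stems match regardless of capitalisation. Pointing the same function at a real SRAdb file would only require replacing `build_demo_db()` with `sqlite3.connect(path)`, provided the assumed table and column names hold for that file.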
format Online
Article
Text
id pubmed-4425880
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4425880 2015-05-10 Investigation into the annotation of protocol sequencing steps in the sequence read archive Alnasir, Jamie Shanahan, Hugh P Gigascience Research BioMed Central 2015-05-09 /pmc/articles/PMC4425880/ /pubmed/25960871 http://dx.doi.org/10.1186/s13742-015-0064-7 Text en © Alnasir and Shanahan; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Investigation into the annotation of protocol sequencing steps in the sequence read archive
topic Research