Investigation into the annotation of protocol sequencing steps in the sequence read archive
BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA...
Main authors: | Alnasir, Jamie; Shanahan, Hugh P |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4425880/ https://www.ncbi.nlm.nih.gov/pubmed/25960871 http://dx.doi.org/10.1186/s13742-015-0064-7 |
_version_ | 1782370535641972736 |
---|---|
author | Alnasir, Jamie Shanahan, Hugh P |
author_facet | Alnasir, Jamie Shanahan, Hugh P |
author_sort | Alnasir, Jamie |
collection | PubMed |
description | BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined. RESULTS: We examined the experimental metadata of the public repository Sequence Read Archive (SRA) in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords commonly occurring in key preparatory protocol steps partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records (fragmentation, ligation and enrichment, respectively), had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records). CONCLUSIONS: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0064-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4425880 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4425880 2015-05-10 Investigation into the annotation of protocol sequencing steps in the sequence read archive Alnasir, Jamie Shanahan, Hugh P Gigascience Research BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined. RESULTS: We examined the experimental metadata of the public repository Sequence Read Archive (SRA) in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords commonly occurring in key preparatory protocol steps partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records (fragmentation, ligation and enrichment, respectively), had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records). CONCLUSIONS: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0064-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-05-09 /pmc/articles/PMC4425880/ /pubmed/25960871 http://dx.doi.org/10.1186/s13742-015-0064-7 Text en © Alnasir and Shanahan; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Alnasir, Jamie Shanahan, Hugh P Investigation into the annotation of protocol sequencing steps in the sequence read archive |
title | Investigation into the annotation of protocol sequencing steps in the sequence read archive |
title_full | Investigation into the annotation of protocol sequencing steps in the sequence read archive |
title_fullStr | Investigation into the annotation of protocol sequencing steps in the sequence read archive |
title_full_unstemmed | Investigation into the annotation of protocol sequencing steps in the sequence read archive |
title_short | Investigation into the annotation of protocol sequencing steps in the sequence read archive |
title_sort | investigation into the annotation of protocol sequencing steps in the sequence read archive |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4425880/ https://www.ncbi.nlm.nih.gov/pubmed/25960871 http://dx.doi.org/10.1186/s13742-015-0064-7 |
work_keys_str_mv | AT alnasirjamie investigationintotheannotationofprotocolsequencingstepsinthesequencereadarchive AT shanahanhughp investigationintotheannotationofprotocolsequencingstepsinthesequencereadarchive |
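
The abstract in this record describes SQL keyword searches over the SRAdb SQLite database, counted both per record and per study. The following is a minimal sketch of that kind of query, not the authors' actual code. It assumes an `experiment` table with `study_accession` and `library_construction_protocol` columns, and the keyword lists are hypothetical, since the record does not give the keywords used. A small in-memory table stands in for the real database.

```python
import sqlite3

# Hypothetical keyword stems for each protocol step (assumption: the
# keywords actually used in the article are not listed in this record).
STEP_KEYWORDS = {
    "fragmentation": ["fragment", "shear", "sonicat"],
    "ligation": ["ligat", "adapter"],
    "enrichment": ["enrich", "pcr", "amplif"],
}


def build_demo_db():
    """Create a toy in-memory table; the schema is an assumed stand-in for SRAdb."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment ("
        "experiment_accession TEXT, "
        "study_accession TEXT, "
        "library_construction_protocol TEXT)"
    )
    rows = [
        ("SRX1", "SRP1", "DNA was fragmented by sonication, adapters ligated, then PCR enriched."),
        ("SRX2", "SRP1", None),  # no protocol annotation at all
        ("SRX3", "SRP2", "Libraries were amplified by PCR."),
        ("SRX4", "SRP3", "Standard Illumina protocol."),
    ]
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)", rows)
    return conn


def _keyword_clause(keywords):
    """Build a case-insensitive OR-of-LIKE clause and its bound parameters."""
    clause = " OR ".join(
        "LOWER(library_construction_protocol) LIKE ?" for _ in keywords
    )
    params = [f"%{k.lower()}%" for k in keywords]
    return clause, params


def percent_records_annotated(conn, keywords):
    """Percentage of experiment records mentioning at least one keyword."""
    clause, params = _keyword_clause(keywords)
    total = conn.execute("SELECT COUNT(*) FROM experiment").fetchone()[0]
    hits = conn.execute(
        f"SELECT COUNT(*) FROM experiment WHERE {clause}", params
    ).fetchone()[0]
    return 100.0 * hits / total if total else 0.0


def percent_studies_annotated(conn, keywords):
    """Percentage of studies with at least one record mentioning a keyword."""
    clause, params = _keyword_clause(keywords)
    total = conn.execute(
        "SELECT COUNT(DISTINCT study_accession) FROM experiment"
    ).fetchone()[0]
    hits = conn.execute(
        f"SELECT COUNT(DISTINCT study_accession) FROM experiment WHERE {clause}",
        params,
    ).fetchone()[0]
    return 100.0 * hits / total if total else 0.0


if __name__ == "__main__":
    conn = build_demo_db()
    for step, words in STEP_KEYWORDS.items():
        print(
            step,
            round(percent_records_annotated(conn, words), 2),
            round(percent_studies_annotated(conn, words), 2),
        )
```

Records with no protocol text are never matched, because `LIKE` against `NULL` does not evaluate to true. To try this against a real SRAdb file, the in-memory connection would be replaced with a connection to that file, after checking that the assumed table and column names match its schema.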