Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

BACKGROUND: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly,...

Bibliographic Details
Main Authors: Parker, Nicolas J, Parker, Andrew G
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374781/
https://www.ncbi.nlm.nih.gov/pubmed/18423012
http://dx.doi.org/10.1186/1751-0473-3-5
_version_ 1782154525630529536
author Parker, Nicolas J
Parker, Andrew G
author_facet Parker, Nicolas J
Parker, Andrew G
author_sort Parker, Nicolas J
collection PubMed
description BACKGROUND: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches. METHODS: A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads. RESULTS: Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. CONCLUSION: The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
format Text
id pubmed-2374781
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2374781 2008-05-09 Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data Parker, Nicolas J Parker, Andrew G Source Code Biol Med Methodology BioMed Central 2008-04-18 /pmc/articles/PMC2374781/ /pubmed/18423012 http://dx.doi.org/10.1186/1751-0473-3-5 Text en Copyright © 2008 Parker and Parker; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology
Parker, Nicolas J
Parker, Andrew G
Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
title Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
title_full Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
title_fullStr Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
title_full_unstemmed Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
title_short Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
title_sort simple tools for assembling and searching high-density picolitre pyrophosphate sequence data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374781/
https://www.ncbi.nlm.nih.gov/pubmed/18423012
http://dx.doi.org/10.1186/1751-0473-3-5
work_keys_str_mv AT parkernicolasj simpletoolsforassemblingandsearchinghighdensitypicolitrepyrophosphatesequencedata
AT parkerandrewg simpletoolsforassemblingandsearchinghighdensitypicolitrepyrophosphatesequencedata