
OrfM: a fast open reading frame predictor for metagenomic data

Bibliographic Details
Main authors:	Woodcroft, Ben J.; Boyd, Joel A.; Tyson, Gene W.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Applications Notes
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013905/ https://www.ncbi.nlm.nih.gov/pubmed/27153669 http://dx.doi.org/10.1093/bioinformatics/btw241

author	Woodcroft, Ben J.; Boyd, Joel A.; Tyson, Gene W.
collection	PubMed
description	Summary: Finding and translating stretches of DNA lacking stop codons is a task common in the analysis of sequence data. However, the computational tools for finding open reading frames are sufficiently slow that they are becoming a bottleneck as the volume of sequence data grows. This computational bottleneck is especially problematic in metagenomics when searching unassembled reads, or screening assembled contigs for genes of interest. Here, we present OrfM, a tool to rapidly identify open reading frames (ORFs) in sequence data by applying the Aho–Corasick algorithm to find regions uninterrupted by stop codons. Benchmarking revealed that OrfM finds the same ORFs as similar tools (‘GetOrf’ and ‘Translate’) but is four to five times faster. While OrfM is sequencing platform-agnostic, it is best suited to large, high-quality datasets such as those produced by Illumina sequencers. Availability and Implementation: Source code and binaries are freely available for download at http://github.com/wwood/OrfM or through GNU Guix under the LGPL 3+ license. OrfM is implemented in C and supported on GNU/Linux and OSX. Contacts: b.woodcroft@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
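The description says OrfM looks for regions uninterrupted by stop codons. As a minimal sketch of what that means, assuming the standard genetic code (stops TAA, TAG, TGA), the Python below scans all six reading frames and reports maximal stop-free stretches. It is not OrfM's C implementation and does not use Aho–Corasick: with only three fixed-length patterns it simply checks each codon against a set. The function names `stop_free_regions` and `reverse_complement` and the `min_codons` threshold are hypothetical, chosen for illustration only.

```python
# Illustrative sketch only (not OrfM's code): find maximal stretches of codons
# that contain no stop codon, in all six reading frames of a DNA sequence.
# A per-codon set lookup stands in for the Aho-Corasick matching OrfM uses.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_regions(seq: str, min_codons: int = 3):
    """Yield (strand, frame, start, end) for each stop-free stretch.

    Coordinates are 0-based and half-open on the strand being scanned.
    min_codons is a hypothetical minimum length, in codons.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    # a stop codon ends the current stretch
                    if (pos - run_start) // 3 >= min_codons:
                        yield (strand, frame, run_start, pos)
                    run_start = pos + 3
                pos += 3
            # a stretch that runs off the end of the sequence has no stop
            if (pos - run_start) // 3 >= min_codons:
                yield (strand, frame, run_start, pos)
```

For example, `list(stop_free_regions("ATGAAATTTGGGTAAC"))` includes `("+", 0, 0, 12)`: four codons on the forward strand, frame 0, ending just before the TAA stop. Stretches that reach the end of a read are also reported, which matters for unassembled short reads where an ORF is often truncated.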
format	Online Article Text
id	pubmed-5013905
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-50139052016-09-12 OrfM: a fast open reading frame predictor for metagenomic data Woodcroft, Ben J. Boyd, Joel A. Tyson, Gene W. Bioinformatics Applications Notes Summary: Finding and translating stretches of DNA lacking stop codons is a task common in the analysis of sequence data. However, the computational tools for finding open reading frames are sufficiently slow that they are becoming a bottleneck as the volume of sequence data grows. This computational bottleneck is especially problematic in metagenomics when searching unassembled reads, or screening assembled contigs for genes of interest. Here, we present OrfM, a tool to rapidly identify open reading frames (ORFs) in sequence data by applying the Aho–Corasick algorithm to find regions uninterrupted by stop codons. Benchmarking revealed that OrfM finds identical ORFs to similar tools (‘GetOrf’ and ‘Translate’) but is four-five times faster. While OrfM is sequencing platform-agnostic, it is best suited to large, high quality datasets such as those produced by Illumina sequencers. Availability and Implementation: Source code and binaries are freely available for download at http://github.com/wwood/OrfM or through GNU Guix under the LGPL 3+ license. OrfM is implemented in C and supported on GNU/Linux and OSX. Contacts: b.woodcroft@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-09-01 2016-05-03 /pmc/articles/PMC5013905/ /pubmed/27153669 http://dx.doi.org/10.1093/bioinformatics/btw241 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	OrfM: a fast open reading frame predictor for metagenomic data
topic	Applications Notes
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013905/ https://www.ncbi.nlm.nih.gov/pubmed/27153669 http://dx.doi.org/10.1093/bioinformatics/btw241

Similar Items