
AnnotationBustR: an R package to extract subsequences from GenBank annotations

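BACKGROUND: DNA sequences are pivotal for a wide array of research in biology. Large sequence databases, like GenBank, provide an amazing resource to utilize DNA sequences for large scale analyses. However, many sequence records on GenBank contain more than one gene or are portions of genomes. Inconsistencies in the way genes are annotated and the numerous synonyms a single gene may be listed under provide major challenges for extracting large numbers of subsequences for comparative analysis across taxa. At present, there is no easy way to extract portions from many GenBank accessions based on annotations where gene names may vary extensively.

RESULTS: The R package AnnotationBustR allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system given search terms of gene synonyms and accession numbers. AnnotationBustR extracts subsequences of interest and then writes them to a FASTA file for users to employ in their research endeavors.

CONCLUSION: FASTA files of extracted subsequences and accession tables generated by AnnotationBustR allow users to quickly find and extract subsequences from GenBank accessions. These sequences can then be incorporated in various analyses, like the construction of phylogenies to test a wide range of ecological and evolutionary hypotheses.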
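The core idea in the abstract (match annotated features against a list of gene-name synonyms, cut out the matching subsequence, and write it to FASTA) can be illustrated with a short Python sketch. This is not AnnotationBustR's code or API: it skips ACNUC retrieval entirely and works on in-memory data, and the `Feature` class, the function names, and the accession `XX000001` are all hypothetical. Coordinates are assumed to be 1-based and inclusive, as in GenBank feature tables, and complement-strand features are reverse-complemented.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for one annotated GenBank feature.
@dataclass
class Feature:
    name: str        # gene/product name as written in the annotation
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: int = 1  # 1 = forward, -1 = complement


COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_by_synonyms(accession, sequence, features, locus, synonyms):
    """Return (header, subsequence) pairs for features whose name matches
    any synonym (case-insensitive); `locus` is the label used in headers."""
    wanted = {s.casefold() for s in synonyms}
    hits = []
    for feat in features:
        if feat.name.casefold() in wanted:
            sub = sequence[feat.start - 1:feat.end]  # 1-based -> Python slice
            if feat.strand == -1:
                sub = reverse_complement(sub)
            hits.append((f"{accession}_{locus}", sub))
    return hits


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping at `width`."""
    out = []
    for header, seq in records:
        out.append(f">{header}\n")
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width] + "\n")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical record: the same gene may be annotated as "cox1" here
    # and "COI" elsewhere, so the search uses a synonym list.
    seq = "ATGCCCGGGTTTAAACCC"
    features = [Feature("cox1", 1, 6), Feature("CYTB", 7, 12, strand=-1)]
    records = extract_by_synonyms("XX000001", seq, features, "COI", ["COI", "COX1"])
    print(to_fasta(records), end="")
```

Case-insensitive comparison alone cannot reconcile names such as "COX1" and "cytochrome c oxidase subunit I", which is why the sketch, like the approach described in the abstract, relies on an explicit list of synonyms per locus.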


Bibliographic Details
Main authors:	Borstein, Samuel R.; O’Meara, Brian C.
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2018
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034590/ https://www.ncbi.nlm.nih.gov/pubmed/30002984 http://dx.doi.org/10.7717/peerj.5179

author	Borstein, Samuel R.; O’Meara, Brian C.
collection	PubMed
description	BACKGROUND: DNA sequences are pivotal for a wide array of research in biology. Large sequence databases, like GenBank, provide an amazing resource to utilize DNA sequences for large scale analyses. However, many sequence records on GenBank contain more than one gene or are portions of genomes. Inconsistencies in the way genes are annotated and the numerous synonyms a single gene may be listed under provide major challenges for extracting large numbers of subsequences for comparative analysis across taxa. At present, there is no easy way to extract portions from many GenBank accessions based on annotations where gene names may vary extensively. RESULTS: The R package AnnotationBustR allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system given search terms of gene synonyms and accession numbers. AnnotationBustR extracts subsequences of interest and then writes them to a FASTA file for users to employ in their research endeavors. CONCLUSION: FASTA files of extracted subsequences and accession tables generated by AnnotationBustR allow users to quickly find and extract subsequences from GenBank accessions. These sequences can then be incorporated in various analyses, like the construction of phylogenies to test a wide range of ecological and evolutionary hypotheses.
format	Online Article Text
id	pubmed-6034590
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
published	PeerJ Inc. 2018-07-03
rights	© 2018 Borstein and O’Meara. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	AnnotationBustR: an R package to extract subsequences from GenBank annotations
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034590/ https://www.ncbi.nlm.nih.gov/pubmed/30002984 http://dx.doi.org/10.7717/peerj.5179
