TAPDANCE: An automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data

Bibliographic Details
Main Authors: Sarver, Aaron L; Erdman, Jesse; Starr, Tim; Largaespada, David A; Silverstein, Kevin A T
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3461456/
https://www.ncbi.nlm.nih.gov/pubmed/22748055
http://dx.doi.org/10.1186/1471-2105-13-154
author Sarver, Aaron L
Erdman, Jesse
Starr, Tim
Largaespada, David A
Silverstein, Kevin A T
collection PubMed
description BACKGROUND: Next-generation sequencing approaches applied to the analysis of transposon insertion junction fragments generated in high-throughput forward genetic screens have created the need for clear informatics and statistical approaches to handle the massive amount of data now being generated. Previous approaches used to 1) map junction fragments within the genome and 2) identify Common Insertion Sites (CISs) within the genome are not practical given the volume of data produced by current sequencing technologies. They also required significant manual annotation.
RESULTS: We describe the Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the Bowtie short read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and used to keep track of sequences, mappings and common insertion sites. Associations between phenotypes and CISs are also identified using Fisher’s exact test with multiple testing correction. In a case study using previously published data, we show that the TAPDANCE software identifies CISs as previously described, prioritizes them by p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data.
CONCLUSIONS: The TAPDANCE process is fully automated, performs similarly to previous labor-intensive approaches, provides consistent results across a wide range of sequence sampling depths, can handle extremely large datasets, enables meaningful comparison across datasets and enables large-scale meta-analyses of junction fragment data. The TAPDANCE software will greatly enhance our ability to analyze these datasets in order to increase our understanding of the genetic basis of cancers.
format Online
Article
Text
id pubmed-3461456
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3461456 2012-10-02 TAPDANCE: An automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data Sarver, Aaron L Erdman, Jesse Starr, Tim Largaespada, David A Silverstein, Kevin A T BMC Bioinformatics Software BioMed Central 2012-06-29 /pmc/articles/PMC3461456/ /pubmed/22748055 http://dx.doi.org/10.1186/1471-2105-13-154 Text en Copyright ©2012 Sarver et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title TAPDANCE: An automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3461456/
https://www.ncbi.nlm.nih.gov/pubmed/22748055
http://dx.doi.org/10.1186/1471-2105-13-154
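
The Poisson enrichment step named in the description can be sketched as follows. This is a minimal illustration, not the TAPDANCE implementation: it assumes that, under the null hypothesis, insertions fall uniformly across the genome, so the expected count in a window is proportional to the window length. The function names, window size, and genome size are hypothetical, and the record does not describe the "novel methods of counting insertions", so they are not reproduced here.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    # Accumulate P(X = i) for i = 0 .. k-1, deriving each term from the last.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_enrichment_pvalue(observed: int, total_insertions: int,
                             window_bp: int, genome_bp: int) -> float:
    """P-value for seeing at least `observed` insertions in one window.

    Assumption (hypothetical null model): insertions are spread uniformly,
    so the expected count is total_insertions * window_bp / genome_bp.
    """
    expected = total_insertions * window_bp / genome_bp
    return poisson_upper_tail(observed, expected)


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 insertions, a 10 kb window,
    # and a 10 Mb genome give an expected count of 1 per window.
    p = window_enrichment_pvalue(observed=5, total_insertions=1000,
                                 window_bp=10_000, genome_bp=10_000_000)
    print(f"p-value for 5 insertions in the window: {p:.5f}")
```

Windows could then be ranked by this p-value, smallest first. Subtracting the cumulative sum from 1 loses precision for very small p-values, and `math.exp(-lam)` underflows for very large expected counts; in those cases a library survival function such as `scipy.stats.poisson.sf(k - 1, lam)` is likely the better choice.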