A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding the basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, an...

Bibliographic Details
Main Authors: Cantacessi, Cinzia; Jex, Aaron R.; Hall, Ross S.; Young, Neil D.; Campbell, Bronwyn E.; Joachim, Anja; Nolan, Matthew J.; Abubucker, Sahar; Sternberg, Paul W.; Ranganathan, Shoba; Mitreva, Makedonka; Gasser, Robin B.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943614/
https://www.ncbi.nlm.nih.gov/pubmed/20682560
http://dx.doi.org/10.1093/nar/gkq667
author Cantacessi, Cinzia
Jex, Aaron R.
Hall, Ross S.
Young, Neil D.
Campbell, Bronwyn E.
Joachim, Anja
Nolan, Matthew J.
Abubucker, Sahar
Sternberg, Paul W.
Ranganathan, Shoba
Mitreva, Makedonka
Gasser, Robin B.
collection PubMed
description Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding the basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, also to researchers with a limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.
format Text
id pubmed-2943614
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2943614 2010-09-22 Nucleic Acids Res Methods Online Oxford University Press 2010-09 2010-08-03 /pmc/articles/PMC2943614/ /pubmed/20682560 http://dx.doi.org/10.1093/nar/gkq667 Text en
© The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943614/
https://www.ncbi.nlm.nih.gov/pubmed/20682560
http://dx.doi.org/10.1093/nar/gkq667