GAAP: A Genome Assembly + Annotation Pipeline
Genomic analysis begins with de novo assembly of short-read fragments in order to reconstruct full-length base sequences without exploiting a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these ge...
Main Authors: | Kong, Jinhwa; Huh, Sun; Won, Jung-Im; Yoon, Jeehee; Kim, Baeksop; Kim, Kiyong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617929/ https://www.ncbi.nlm.nih.gov/pubmed/31346518 http://dx.doi.org/10.1155/2019/4767354 |
_version_ | 1783433804545785856 |
---|---|
author | Kong, Jinhwa Huh, Sun Won, Jung-Im Yoon, Jeehee Kim, Baeksop Kim, Kiyong |
collection | PubMed |
description | Genomic analysis begins with de novo assembly of short-read fragments in order to reconstruct full-length base sequences without exploiting a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools have been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their objects of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter setting methods, and input formats; thus, it remains difficult for individual researchers to select, utilize, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method. |
format | Online Article Text |
id | pubmed-6617929 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-6617929 2019-07-25 GAAP: A Genome Assembly + Annotation Pipeline Kong, Jinhwa Huh, Sun Won, Jung-Im Yoon, Jeehee Kim, Baeksop Kim, Kiyong Biomed Res Int Research Article Hindawi 2019-06-26 /pmc/articles/PMC6617929/ /pubmed/31346518 http://dx.doi.org/10.1155/2019/4767354 Text en Copyright © 2019 Jinhwa Kong et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | GAAP: A Genome Assembly + Annotation Pipeline |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617929/ https://www.ncbi.nlm.nih.gov/pubmed/31346518 http://dx.doi.org/10.1155/2019/4767354 |