
GAAP: A Genome Assembly + Annotation Pipeline


Bibliographic Details
Main authors: Kong, Jinhwa, Huh, Sun, Won, Jung-Im, Yoon, Jeehee, Kim, Baeksop, Kim, Kiyong
Format: Online Article Text
Language: English
Published: Hindawi 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617929/
https://www.ncbi.nlm.nih.gov/pubmed/31346518
http://dx.doi.org/10.1155/2019/4767354
collection PubMed
description Genomic analysis begins with de novo assembly of short-read fragments to reconstruct full-length base sequences without using a reference genome sequence. Then, in the annotation step, gene locations are identified within the assembled sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools has been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their organisms of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter-setting methods, and input formats; thus, it remains difficult for individual researchers to select, use, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.
id pubmed-6617929
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Biomed Res Int (Research Article), Hindawi, published 2019-06-26
License: Copyright © 2019 Jinhwa Kong et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article