
Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies

High-throughput sequencing provides a fast and cost-effective means to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination by non-target organisms requires advanced bioinformatics approaches and practices. Here, we...


Bibliographic Details
Main authors: Delmont, Tom O., Eren, A. Murat
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2016
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4824900/
https://www.ncbi.nlm.nih.gov/pubmed/27069789
http://dx.doi.org/10.7717/peerj.1839
author Delmont, Tom O.
Eren, A. Murat
author_facet Delmont, Tom O.
Eren, A. Murat
author_sort Delmont, Tom O.
collection PubMed
description High-throughput sequencing provides a fast and cost-effective means to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination by non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds, we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and that most of these contaminants differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today’s microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes.
format Online
Article
Text
id pubmed-4824900
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4824900 2016-04-11 Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies Delmont, Tom O. Eren, A. Murat PeerJ Bioinformatics PeerJ Inc. 2016-03-29 /pmc/articles/PMC4824900/ /pubmed/27069789 http://dx.doi.org/10.7717/peerj.1839 Text en ©2016 Delmont and Eren http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Delmont, Tom O.
Eren, A. Murat
Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies
title Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4824900/
https://www.ncbi.nlm.nih.gov/pubmed/27069789
http://dx.doi.org/10.7717/peerj.1839