Additive methods for genomic signatures

BACKGROUND: Studies exploring the potential of Chaos Game Representations (CGR) of genomic sequences to act as “genomic signatures” (to be species- and genome-specific) showed that CGR patterns of nuclear and organellar DNA sequences of the same organism can be very different. While the hypothesis t...

Bibliographic Details
Main authors: Karamichalis, Rallis; Kari, Lila; Konstantinidis, Stavros; Kopecki, Steffen; Solis-Reyes, Stephen
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4994249/
https://www.ncbi.nlm.nih.gov/pubmed/27549194
http://dx.doi.org/10.1186/s12859-016-1157-8
_version_ 1782449289473032192
author Karamichalis, Rallis
Kari, Lila
Konstantinidis, Stavros
Kopecki, Steffen
Solis-Reyes, Stephen
author_facet Karamichalis, Rallis
Kari, Lila
Konstantinidis, Stavros
Kopecki, Steffen
Solis-Reyes, Stephen
author_sort Karamichalis, Rallis
collection PubMed
description BACKGROUND: Studies exploring the potential of Chaos Game Representations (CGR) of genomic sequences to act as “genomic signatures” (to be species- and genome-specific) showed that CGR patterns of nuclear and organellar DNA sequences of the same organism can be very different. While the hypothesis that CGRs of mitochondrial DNA sequences can act as genomic signatures was validated for a snapshot of all sequenced mitochondrial genomes available in the NCBI GenBank sequence database, to our knowledge no such extensive analysis of CGRs of nuclear DNA sequences exists to date.
RESULTS: We analyzed an extensive dataset, totalling 1.45 gigabase pairs, of nuclear/nucleoid genomic sequences (nDNA) from 42 different organisms, spanning all major kingdoms of life. Our computational experiments indicate that CGR signatures of nDNA of two different origins cannot always be differentiated, especially if they originate from closely related species such as H. sapiens and P. troglodytes or E. coli and E. fergusonii. To address this issue, we propose the general concept of an additive DNA signature of a set (collection) of DNA sequences. One particular instance, the composite DNA signature, combines information from nDNA fragments and organellar (mitochondrial, chloroplast, or plasmid) genomes. We demonstrate that, in this dataset, composite DNA signatures originating from two different organisms can be differentiated in all cases, including those where the use of CGR signatures of nDNA failed or was inconclusive. Another instance, the assembled DNA signature, combines information from many short DNA subfragments (e.g., 100 base pairs) of a given DNA fragment to produce its signature. We show that an assembled DNA signature has the same distinguishing power as a conventionally computed CGR signature, while using shorter contiguous sequences and potentially less sequence information.
CONCLUSIONS: Our results suggest that, while CGR signatures of nDNA cannot always play the role of genomic signatures, composite and assembled DNA signatures (separately or in combination) could potentially be used instead. Such additive signatures could be used, e.g., with raw unassembled next-generation sequencing (NGS) read data, when high-quality sequencing data is not available, or to complement information obtained by other methods of species identification or classification.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1157-8) contains supplementary material, which is available to authorized users.
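The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the construction details, so the following are assumptions made only for illustration: the corner assignment in `CORNERS` (a common CGR convention), the use of k-mer counts as the signature, and element-wise summation of counts as the "additive" step. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed corner layout (a common CGR convention, not stated in the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre of the unit square; for each base,
    move halfway towards that base's corner and record the new point."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # simplification: ignore ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_counts(seq, k):
    """Count the k-mers of seq that consist only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def additive_signature(seqs, k):
    """Assumed additive step: sum the k-mer counts of every sequence in seqs."""
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    return total


def normalized(counts, k):
    """Relative frequency for each of the 4**k possible k-mers."""
    n = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {km: (counts[km] / n if n else 0.0) for km in kmers}


def subfragments(seq, length):
    """Split seq into consecutive, non-overlapping pieces of the given length."""
    return [seq[i:i + length] for i in range(0, len(seq), length)]


if __name__ == "__main__":
    fragment = "ACGTACGT"
    # "Assembled"-style signature: sum over short subfragments of one fragment.
    assembled = additive_signature(subfragments(fragment, 4), k=2)
    print(sorted(assembled.items()))
    print(cgr_points("AC"))
```

In this sketch, a composite-style signature would be `additive_signature([ndna_fragment, organellar_genome], k)`, with both arguments being placeholder sequences. Summing over non-overlapping subfragments drops the k-mers that span subfragment boundaries, so the assembled counts only approximate those of the whole fragment.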
format Online
Article
Text
id pubmed-4994249
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4994249 2016-08-31 Additive methods for genomic signatures Karamichalis, Rallis Kari, Lila Konstantinidis, Stavros Kopecki, Steffen Solis-Reyes, Stephen BMC Bioinformatics Research Article BioMed Central 2016-08-22 /pmc/articles/PMC4994249/ /pubmed/27549194 http://dx.doi.org/10.1186/s12859-016-1157-8 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Karamichalis, Rallis
Kari, Lila
Konstantinidis, Stavros
Kopecki, Steffen
Solis-Reyes, Stephen
Additive methods for genomic signatures
title Additive methods for genomic signatures
title_full Additive methods for genomic signatures
title_fullStr Additive methods for genomic signatures
title_full_unstemmed Additive methods for genomic signatures
title_short Additive methods for genomic signatures
title_sort additive methods for genomic signatures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4994249/
https://www.ncbi.nlm.nih.gov/pubmed/27549194
http://dx.doi.org/10.1186/s12859-016-1157-8
work_keys_str_mv AT karamichalisrallis additivemethodsforgenomicsignatures
AT karilila additivemethodsforgenomicsignatures
AT konstantinidisstavros additivemethodsforgenomicsignatures
AT kopeckisteffen additivemethodsforgenomicsignatures
AT solisreyesstephen additivemethodsforgenomicsignatures