Mapping the Space of Genomic Signatures

We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of D...


Bibliographic Details
Main authors: Kari, Lila; Hill, Kathleen A.; Sayem, Abu S.; Karamichalis, Rallis; Bryans, Nathaniel; Davis, Katelyn; Dattani, Nikesh S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4441465/
https://www.ncbi.nlm.nih.gov/pubmed/26000734
http://dx.doi.org/10.1371/journal.pone.0119815
author Kari, Lila
Hill, Kathleen A.
Sayem, Abu S.
Karamichalis, Rallis
Bryans, Nathaniel
Davis, Katelyn
Dattani, Nikesh S.
collection PubMed
description We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly identifies the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and finds that the sequence most different from it in this dataset belongs to a cucumber.
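
As an illustration of the Chaos Game Representation (CGR) named in the description above, here is a minimal Python sketch. It is not the authors' implementation: the corner assignment (A, C, G, T on the corners of the unit square) is one common convention assumed here, and the names `cgr_points` and `cgr_grid` are hypothetical. Counting points on a 2^k x 2^k grid gives a picture whose cells correspond to oligomers of length k; the abstract's k = 9 would give a 512 x 512 grid. The DSSIM distance and the MDS step are not shown.

```python
# Assumed corner convention; other CGR layouts exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Yield successive CGR points for seq, skipping symbols other than A, C, G, T."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # e.g. ambiguity codes such as N
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        yield x, y


def cgr_grid(seq, k):
    """Count CGR points falling into each cell of a 2**k x 2**k grid."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        # Clamp so a coordinate that rounds up to 1.0 stays inside the grid.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in cgr_grid("ACGTTGCAACGT", 2):
        print(row)
```

Two such grids, computed for two sequences with the same k, have the same shape regardless of sequence length, which is what allows an image distance to compare sequences without aligning them.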
format Online
Article
Text
id pubmed-4441465
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4441465 2015-05-28 Mapping the Space of Genomic Signatures Kari, Lila Hill, Kathleen A. Sayem, Abu S. Karamichalis, Rallis Bryans, Nathaniel Davis, Katelyn Dattani, Nikesh S. PLoS One Research Article Public Library of Science 2015-05-22 /pmc/articles/PMC4441465/ /pubmed/26000734 http://dx.doi.org/10.1371/journal.pone.0119815 Text en © 2015 Kari et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Mapping the Space of Genomic Signatures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4441465/
https://www.ncbi.nlm.nih.gov/pubmed/26000734
http://dx.doi.org/10.1371/journal.pone.0119815