DNA Sequences at a Glance
Main authors: | Pinho, Armando J.; Garcia, Sara P.; Pratas, Diogo; Ferreira, Paulo J. S. G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836782/ https://www.ncbi.nlm.nih.gov/pubmed/24278218 http://dx.doi.org/10.1371/journal.pone.0079922 |
Field | Value |
---|---|
author | Pinho, Armando J.; Garcia, Sara P.; Pratas, Diogo; Ferreira, Paulo J. S. G. |
collection | PubMed |
description | Data summarization and triage is one of the current top challenges in visual analytics. The goal is to let users visually inspect large data sets and examine or request data with particular characteristics. The need for summarization and visual analytics is also felt when dealing with digital representations of DNA sequences. Genomic data sets are growing rapidly, making their analysis increasingly difficult and raising the need for new, scalable tools. For example, being able to look at very large DNA sequences while immediately identifying potentially interesting regions would provide the biologist with a flexible exploratory and analytical tool. In this paper we present a new concept, the “information profile”, which provides a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. The computation of the information profiles is computationally tractable: we show that it can be done in time proportional to the length of the sequence. We also describe a tool to compute the information profiles of a given DNA sequence, and use the genome of the fission yeast Schizosaccharomyces pombe strain 972 h− and five human chromosome 22 sequences for illustration. We show that information profiles are useful for detecting large-scale genomic regularities by visual inspection. Several discovery strategies are possible, including the standalone analysis of single sequences, the comparative analysis of sequences from individuals of the same species, and the comparative analysis of sequences from different organisms. The comparison scale can be varied, allowing users to zoom in on specific details or obtain a broad overview of a long segment. Software applications have been made available for non-commercial use at http://bioinformatics.ua.pt/software/dna-at-glance. |
format | Online Article Text |
id | pubmed-3836782 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
online publication date | 2013-11-21 |
rights | © 2013 Pinho et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | DNA Sequences at a Glance |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836782/ https://www.ncbi.nlm.nih.gov/pubmed/24278218 http://dx.doi.org/10.1371/journal.pone.0079922 |