Identifying elemental genomic track types and representing them uniformly
BACKGROUND: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. T...
Main authors: | Gundersen, Sveinung; Kalaš, Matúš; Abul, Osman; Frigessi, Arnoldo; Hovig, Eivind; Sandve, Geir Kjetil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315820/ https://www.ncbi.nlm.nih.gov/pubmed/22208806 http://dx.doi.org/10.1186/1471-2105-12-494 |
_version_ | 1782228294618316800 |
---|---|
author | Gundersen, Sveinung; Kalaš, Matúš; Abul, Osman; Frigessi, Arnoldo; Hovig, Eivind; Sandve, Geir Kjetil |
author_sort | Gundersen, Sveinung |
collection | PubMed |
description | BACKGROUND: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. RESULTS: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. CONCLUSIONS: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience. |
format | Online Article Text |
id | pubmed-3315820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3315820 2012-04-04 Identifying elemental genomic track types and representing them uniformly Gundersen, Sveinung Kalaš, Matúš Abul, Osman Frigessi, Arnoldo Hovig, Eivind Sandve, Geir Kjetil BMC Bioinformatics Methodology Article BACKGROUND: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. RESULTS: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. CONCLUSIONS: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience. BioMed Central 2011-12-30 /pmc/articles/PMC3315820/ /pubmed/22208806 http://dx.doi.org/10.1186/1471-2105-12-494 Text en Copyright ©2011 Gundersen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Identifying elemental genomic track types and representing them uniformly |
topic | Methodology Article |
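
The abstract names four core informational properties of tracks (gaps, lengths, values and interconnections) and states that fifteen generic track types follow from them. One way to see how four properties could give fifteen types is sketched below. It rests on an assumption that the abstract itself does not spell out: each property is treated as simply present or absent, and the combination with no properties at all is dropped as carrying no information. The function name `property_combinations` is hypothetical, and the sketch does not attempt to name the individual track types.

```python
from itertools import product

# The four informational properties named in the abstract.
PROPERTIES = ("gaps", "lengths", "values", "interconnections")


def property_combinations():
    """Return every non-empty subset of PROPERTIES as a tuple of names.

    Assumption (not stated in the abstract): each property is either
    present or absent, and the combination with no properties is
    excluded because it would describe a track with no information.
    """
    combos = []
    for flags in product((False, True), repeat=len(PROPERTIES)):
        chosen = tuple(name for name, on in zip(PROPERTIES, flags) if on)
        if chosen:  # skip the empty combination
            combos.append(chosen)
    return combos


combos = property_combinations()
print(len(combos))  # 2**4 - 1 = 15
```

Under this reading the count matches the fifteen track types reported in the record; the mapping from each combination to a named track type is defined in the article itself.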