
SNPFile – A software library and file format for large scale association mapping and population genetics studies

BACKGROUND: High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals in hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to efficiently store and manipulat...


Bibliographic Details
Main Authors: Nielsen, Jesper, Mailund, Thomas
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2633306/
https://www.ncbi.nlm.nih.gov/pubmed/19063732
http://dx.doi.org/10.1186/1471-2105-9-526
_version_ 1782164099178692608
author Nielsen, Jesper
Mailund, Thomas
author_facet Nielsen, Jesper
Mailund, Thomas
author_sort Nielsen, Jesper
collection PubMed
description BACKGROUND: High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals in hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to efficiently store and manipulate the data. While spreadsheets and flat text files were adequate solutions earlier, the increased data size mandates more efficient solutions. RESULTS: We describe a new binary file format for SNP data, together with a software library for file manipulation. The file format stores genotype data together with any kind of additional data, using a flexible serialisation mechanism. The format is designed to be IO efficient for the access patterns of most multi-locus analysis methods. CONCLUSION: The new file format has been very useful for our own studies, where it has significantly reduced the informatics burden in keeping track of various secondary data, and where the memory and IO efficiency have greatly simplified analysis runs. A main limitation of the file format is that it is only supported by the very limited set of analysis tools developed in our own lab. This is somewhat alleviated by a scripting interface that makes it easy to write converters to and from the format.
format Text
id pubmed-2633306
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2633306 2009-01-31 SNPFile – A software library and file format for large scale association mapping and population genetics studies Nielsen, Jesper Mailund, Thomas BMC Bioinformatics Software BACKGROUND: High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals in hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to efficiently store and manipulate the data. While spreadsheets and flat text files were adequate solutions earlier, the increased data size mandates more efficient solutions. RESULTS: We describe a new binary file format for SNP data, together with a software library for file manipulation. The file format stores genotype data together with any kind of additional data, using a flexible serialisation mechanism. The format is designed to be IO efficient for the access patterns of most multi-locus analysis methods. CONCLUSION: The new file format has been very useful for our own studies, where it has significantly reduced the informatics burden in keeping track of various secondary data, and where the memory and IO efficiency have greatly simplified analysis runs. A main limitation of the file format is that it is only supported by the very limited set of analysis tools developed in our own lab. This is somewhat alleviated by a scripting interface that makes it easy to write converters to and from the format. BioMed Central 2008-12-08 /pmc/articles/PMC2633306/ /pubmed/19063732 http://dx.doi.org/10.1186/1471-2105-9-526 Text en Copyright © 2008 Nielsen and Mailund; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Nielsen, Jesper
Mailund, Thomas
SNPFile – A software library and file format for large scale association mapping and population genetics studies
title SNPFile – A software library and file format for large scale association mapping and population genetics studies
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2633306/
https://www.ncbi.nlm.nih.gov/pubmed/19063732
http://dx.doi.org/10.1186/1471-2105-9-526