kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity

Bibliographic Details
Main authors: Murray, Kevin D., Webers, Christfried, Ong, Cheng Soon, Borevitz, Justin, Warthmann, Norman
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5600398/
https://www.ncbi.nlm.nih.gov/pubmed/28873405
http://dx.doi.org/10.1371/journal.pcbi.1005727
author Murray, Kevin D.
Webers, Christfried
Ong, Cheng Soon
Borevitz, Justin
Warthmann, Norman
collection PubMed
description Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals (or “samples”) in an unbiased manner, preferably de novo. Rapid estimation of genetic relatedness directly from sequencing data has the potential to overcome reference genome bias, and to verify that individuals belong to the correct genetic lineage before conclusions are drawn using mislabelled, or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly-, and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes and applications include establishing sample identity and detecting mix-up, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
format Online
Article
Text
id pubmed-5600398
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5600398 2017-09-22 PLoS Comput Biol Research Article
Public Library of Science 2017-09-05 /pmc/articles/PMC5600398/ /pubmed/28873405 http://dx.doi.org/10.1371/journal.pcbi.1005727 Text en © 2017 Murray et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity
topic Research Article