
Consensify: A Method for Generating Pseudohaploid Genome Sequences from Palaeogenomic Datasets with Reduced Error Rates

A standard practice in palaeogenome analysis is the conversion of mapped short read data into pseudohaploid sequences, frequently by selecting a single high-quality nucleotide at random from the stack of mapped reads. This controls for biases due to differential sequencing coverage, but it does not...


Bibliographic Details
Main authors: Barlow, Axel; Hartmann, Stefanie; Gonzalez, Javier; Hofreiter, Michael; Paijmans, Johanna L. A.
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017230/
https://www.ncbi.nlm.nih.gov/pubmed/31906474
http://dx.doi.org/10.3390/genes11010050
author Barlow, Axel
Hartmann, Stefanie
Gonzalez, Javier
Hofreiter, Michael
Paijmans, Johanna L. A.
collection PubMed
description A standard practice in palaeogenome analysis is the conversion of mapped short read data into pseudohaploid sequences, frequently by selecting a single high-quality nucleotide at random from the stack of mapped reads. This controls for biases due to differential sequencing coverage, but it does not control for differential rates and types of sequencing error, which are frequently large and variable in datasets obtained from ancient samples. These errors have the potential to distort phylogenetic and population clustering analyses, and to mislead tests of admixture using D statistics. We introduce Consensify, a method for generating pseudohaploid sequences, which controls for biases resulting from differential sequencing coverage while greatly reducing error rates. The error correction is derived directly from the data itself, without the requirement for additional genomic resources or simplifying assumptions such as contemporaneous sampling. For phylogenetic and population clustering analysis, we find that Consensify is less affected by artefacts than methods based on single read sampling. For D statistics, Consensify is more resistant to false positives and appears to be less affected by biases resulting from different laboratory protocols than other frequently used methods. Although Consensify was developed with palaeogenomic data in mind, it is applicable to any low to medium coverage short read dataset. We predict that Consensify will be a useful tool for future studies of palaeogenomes.
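The single-read sampling that the abstract calls standard practice can be illustrated with a short sketch. This is not the Consensify algorithm and not code from the article: the function names, the representation of a site as a list of (base, quality) pairs, and the quality threshold of 30 are all assumptions made for illustration.

```python
import random

VALID_BASES = {"A", "C", "G", "T"}


def pseudohaploid_call(pileup, min_qual=30, rng=None):
    """Pick one base at random from the reads covering a single site.

    pileup   -- list of (base, base_quality) tuples, one per mapped read
                (hypothetical input format, chosen for this sketch)
    min_qual -- assumed quality cutoff; bases below it are ignored
    Returns "N" when no base passes the filter.
    """
    if rng is None:
        rng = random.Random()
    candidates = [
        base for base, qual in pileup
        if qual >= min_qual and base in VALID_BASES
    ]
    if not candidates:
        return "N"
    # One read is drawn regardless of depth, so coverage does not affect
    # the call, but a sequencing error in the drawn read is kept as-is.
    return rng.choice(candidates)


def pseudohaploid_sequence(pileups, min_qual=30, seed=None):
    """Apply single-read sampling to every site and join the calls."""
    rng = random.Random(seed)
    return "".join(pseudohaploid_call(p, min_qual, rng) for p in pileups)


if __name__ == "__main__":
    sites = [
        [("A", 40), ("A", 35)],   # two concordant high-quality reads
        [("C", 10)],              # only a low-quality read: masked
        [("G", 40), ("T", 5)],    # the low-quality T is filtered out
        [],                       # no coverage
    ]
    print(pseudohaploid_sequence(sites, seed=1))
```

Because exactly one read is drawn per site, the call is independent of sequencing depth, which is the coverage control the abstract mentions. The same property means that any error present in the drawn read goes straight into the sequence, which is the weakness the abstract says Consensify addresses.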
format Online
Article
Text
id pubmed-7017230
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7017230 2020-02-28 Genes (Basel) Article MDPI 2020-01-02 /pmc/articles/PMC7017230/ /pubmed/31906474 http://dx.doi.org/10.3390/genes11010050 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Consensify: A Method for Generating Pseudohaploid Genome Sequences from Palaeogenomic Datasets with Reduced Error Rates
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017230/
https://www.ncbi.nlm.nih.gov/pubmed/31906474
http://dx.doi.org/10.3390/genes11010050