ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, includ...

Bibliographic Details
Main authors: Margarido, Gabriel R. A.; Heckerman, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4400156/
https://www.ncbi.nlm.nih.gov/pubmed/25880203
http://dx.doi.org/10.1371/journal.pcbi.1004229
author Margarido, Gabriel R. A.
Heckerman, David
collection PubMed
description As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and also insights about how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.
format Online
Article
Text
id pubmed-4400156
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4400156 2015-04-21 ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data Margarido, Gabriel R. A. Heckerman, David PLoS Comput Biol Research Article
Public Library of Science 2015-04-16 /pmc/articles/PMC4400156/ /pubmed/25880203 http://dx.doi.org/10.1371/journal.pcbi.1004229 Text en © 2015 Margarido, Heckerman http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data
topic Research Article