
A shot in the genome: how accurately do shotgun 454 sequences represent a genome?

BACKGROUND: Next generation sequencing (NGS) provides a valuable method to quickly obtain sequence information from non-model organisms at a genomic scale. In principle, if sequencing is not targeted for a genomic region or sequence type (e.g. coding region, microsatellites) NGS reads can be used as...


Bibliographic Details
Main authors: Meglécz, Emese; Pech, Nicolas; Gilles, André; Martin, Jean-François; Gardner, Michael G
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Short Report
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444912/
https://www.ncbi.nlm.nih.gov/pubmed/22640415
http://dx.doi.org/10.1186/1756-0500-5-259
_version_ 1782243737885212672
author Meglécz, Emese
Pech, Nicolas
Gilles, André
Martin, Jean-François
Gardner, Michael G
author_sort Meglécz, Emese
collection PubMed
description BACKGROUND: Next generation sequencing (NGS) provides a valuable method to quickly obtain sequence information from non-model organisms at a genomic scale. In principle, if sequencing is not targeted for a genomic region or sequence type (e.g. coding region, microsatellites), NGS reads can be used as a genome snapshot and provide information on the different types of sequences in the genome. However, no study has ascertained whether a typical low-coverage 454 dataset (1/4 to 1/8 of a PicoTiter plate, generally giving less than 0.1x coverage) represents all parts of a genome equally. FINDINGS: Partial genome shotgun sequencing of total DNA (without enrichment) on a 454 NGS platform was used to obtain reads of Apis mellifera (454 reads hereafter). These 454 reads were compared to the assembled chromosomes of this species in three respects: (i) dimer and trimer compositions, (ii) the distribution of mapped 454 sequences along the chromosomes and (iii) the numbers of different classes of microsatellites. Highly significant chi-square tests for all three types of analyses indicated that the 454 data is not a perfect random sample of the genome. Only the number of 454 reads mapped to each of the 16 chromosomes and the number of microsatellites pooled by motif (repeat unit) length were not significantly different from the expected values. However, a very strong correlation (correlation coefficients greater than 0.97) was observed between most of the 454 variables (the number of different dimers and trimers, the number of 454 reads mapped to each one-Mb chromosome fragment, the number of 454 reads mapped to each chromosome, the number of microsatellites of each class) and their corresponding genomic variables. CONCLUSIONS: The results of the chi-square tests suggest that 454 shotgun reads cannot be regarded as a perfect representation of the genome, especially if the comparison is done on a finer scale (e.g. chromosome fragments instead of whole chromosomes). However, the high correlation between the 454 and genome variables tested indicates that a high proportion of the variability of the 454 variables is explained by their genomic counterparts. Therefore, we conclude that using 454 data to obtain information on the genome is biologically meaningful.
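The description contrasts two ways of comparing read-derived counts with genome-derived counts: a chi-square goodness-of-fit test, which is sensitive to any departure from exact proportionality, and a correlation coefficient, which measures how closely the two sets of counts track each other. The following is a minimal sketch of that contrast for dimer counts, not the authors' pipeline. The function names (`kmer_counts`, `chi_square`, `pearson_r`) and the toy sequences are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seqs, k=2):
    """Count overlapping k-mers made only of A/C/G/T across all sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in "ACGT" for base in kmer):
                counts[kmer] += 1
    return counts


def chi_square(observed, reference):
    """Pearson chi-square statistic and degrees of freedom.

    Expected counts are the reference proportions scaled to the observed
    total. K-mers absent from the reference are ignored (an assumption of
    this sketch). Both arguments are expected to be Counter objects.
    """
    kmers = [m for m in sorted(reference) if reference[m] > 0]
    n_obs = sum(observed[m] for m in kmers)
    n_ref = sum(reference[m] for m in kmers)
    stat = 0.0
    for m in kmers:
        expected = n_obs * reference[m] / n_ref
        stat += (observed[m] - expected) ** 2 / expected
    return stat, len(kmers) - 1


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): a repetitive "genome" and a few short "reads".
genome = "ACGTACGTTTGACCAGGA" * 50
reads = [genome[i:i + 40] for i in range(0, len(genome) - 40, 97)]

ref = kmer_counts([genome])
obs = kmer_counts(reads)
stat, dof = chi_square(obs, ref)
dimers = sorted(ref)
r = pearson_r([obs[m] for m in dimers], [ref[m] for m in dimers])
print(f"chi-square = {stat:.2f} (df = {dof}), Pearson r = {r:.3f}")
```

Turning the statistic into a p-value needs the chi-square distribution, which the standard library does not provide. With large counts, even small deviations from the expected proportions can yield a significant chi-square while the correlation stays high, which is the pattern the description reports.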
format Online
Article
Text
id pubmed-3444912
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3444912 2012-09-19 A shot in the genome: how accurately do shotgun 454 sequences represent a genome? Meglécz, Emese Pech, Nicolas Gilles, André Martin, Jean-François Gardner, Michael G BMC Res Notes Short Report BioMed Central 2012-05-28 /pmc/articles/PMC3444912/ /pubmed/22640415 http://dx.doi.org/10.1186/1756-0500-5-259 Text en Copyright ©2012 Meglecz et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A shot in the genome: how accurately do shotgun 454 sequences represent a genome?
topic Short Report