
Sequence Profiling of the Saccharomyces cerevisiae Genome Permits Deconvolution of Unique and Multialigned Reads for Variant Detection

Advances in high-throughput sequencing (HTS) technologies have accelerated our knowledge of genomes in hundreds of organisms, but the presence of repetitions found in every genome raises challenges to unambiguously map short reads. In particular, short polymorphic reads that are multialigned hinder...

Bibliographic Details
Main authors: Jubin, Claire; Serero, Alexandre; Loeillet, Sophie; Barillot, Emmanuel; Nicolas, Alain
Format: Online Article Text
Language: English
Published: Genetics Society of America 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059241/
https://www.ncbi.nlm.nih.gov/pubmed/24558267
http://dx.doi.org/10.1534/g3.113.009464
author Jubin, Claire
Serero, Alexandre
Loeillet, Sophie
Barillot, Emmanuel
Nicolas, Alain
collection PubMed
description Advances in high-throughput sequencing (HTS) technologies have accelerated our knowledge of genomes in hundreds of organisms, but the presence of repetitions found in every genome raises challenges to unambiguously map short reads. In particular, short polymorphic reads that are multialigned hinder our capacity to detect mutations. Here, we present two complementary bioinformatics strategies to perform more robust analyses of genome content and sequencing data, validated by use of the Saccharomyces cerevisiae fully sequenced genome. First, we created an annotated HTS profile for the reference genome, based on the production of virtual HTS reads. Using variable read lengths and different numbers of mismatches, we found that 35-nt reads, with a maximum of 6 mismatches, assign 89.5% of the genome to unique (U) regions. Longer reads of 50−100 nt provided little additional benefit to the extent of the U regions. Second, to analyze the remaining multialigned (M) regions, we identified the intragenomic single-nucleotide variants and thus defined the unique (M(U)) and multialigned (M(M)) subregions, as exemplified for the polymorphic copies of the six flocculation genes and the 50 Ty retrotransposons. As a resource, the coordinates of the U and M regions of the yeast genome have been added to the Saccharomyces Genome Database (www.yeastgenome.org). The benefit of this advanced method of genome annotation was confirmed by our ability to identify acquired single-nucleotide polymorphisms in the U and M regions of an experimentally sequenced variant wild-type yeast strain.
format Online
Article
Text
id pubmed-4059241
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-4059241 2014-06-16 Sequence Profiling of the Saccharomyces cerevisiae Genome Permits Deconvolution of Unique and Multialigned Reads for Variant Detection Jubin, Claire Serero, Alexandre Loeillet, Sophie Barillot, Emmanuel Nicolas, Alain G3 (Bethesda) Investigations
Genetics Society of America 2014-02-20 /pmc/articles/PMC4059241/ /pubmed/24558267 http://dx.doi.org/10.1534/g3.113.009464 Text en Copyright © 2014 Jubin et al. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sequence Profiling of the Saccharomyces cerevisiae Genome Permits Deconvolution of Unique and Multialigned Reads for Variant Detection
topic Investigations