Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

BACKGROUND: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an a...

Bibliographic Details
Main authors: You, Frank M, Huo, Naxin, Deal, Karin R, Gu, Yong Q, Luo, Ming-Cheng, McGuire, Patrick E, Dvorak, Jan, Anderson, Olin D
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041743/
https://www.ncbi.nlm.nih.gov/pubmed/21266061
http://dx.doi.org/10.1186/1471-2164-12-59
author You, Frank M
Huo, Naxin
Deal, Karin R
Gu, Yong Q
Luo, Ming-Cheng
McGuire, Patrick E
Dvorak, Jan
Anderson, Olin D
collection PubMed
description BACKGROUND: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. RESULTS: An annotation-based, genome-wide SNP discovery pipeline that uses NGS data for large and complex genomes without a reference genome sequence is reported. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, whose 4.02 Gb genome is 90% repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. 
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. CONCLUSION: An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at http://avena.pw.usda.gov/wheatD/agsnp.shtml.
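The idea in the abstract (annotate long reads from one genotype, keep only the usable classes, then look for consistent mismatches in short reads mapped from a second genotype) can be illustrated with a toy sketch. This is not the AGSNP code: the function name `call_snps`, the annotation labels, the pileup layout, and the `min_depth` and `min_fraction` thresholds are all assumptions made for illustration. Only the three SNP counts and their total are taken from the abstract.

```python
from collections import Counter

# Read classes the abstract treats as usable for SNP discovery.
# The label strings are hypothetical, chosen for this sketch.
USABLE = {"gene", "single_copy", "repeat_junction"}


def call_snps(reference_reads, pileups, min_depth=3, min_fraction=0.9):
    """Return putative SNPs as (read_id, position, ref_base, alt_base, annotation).

    reference_reads: read_id -> (sequence, annotation) for the long reads
    pileups: read_id -> {position: string of bases from mapped short reads}
    min_depth and min_fraction are illustrative thresholds, not AGSNP defaults.
    """
    snps = []
    for read_id, (sequence, annotation) in reference_reads.items():
        if annotation not in USABLE:
            continue  # skip repetitive or paralog-shared reads
        for pos, bases in sorted(pileups.get(read_id, {}).items()):
            if len(bases) < min_depth:
                continue  # too few mapped reads to trust a call
            alt, count = Counter(bases).most_common(1)[0]
            if alt != sequence[pos] and count / len(bases) >= min_fraction:
                snps.append((read_id, pos, sequence[pos], alt, annotation))
    return snps


# Made-up toy data: three annotated long reads and short-read pileups.
reference_reads = {
    "r1": ("ACGTACGT", "gene"),
    "r2": ("TTTTTTTT", "repetitive"),
    "r3": ("GGCCGGCC", "repeat_junction"),
}
pileups = {
    "r1": {2: "TTTT", 5: "CC"},
    "r2": {1: "AAAA"},
    "r3": {0: "GGGA", 3: "TTT"},
}
putative = call_snps(reference_reads, pileups)
print(putative)

# Counts reported in the abstract; they sum to the stated total of 497,118.
reported_counts = {"gene": 195_631, "single_copy": 155_580, "repeat_junction": 145_907}
print(sum(reported_counts.values()))
```

In the toy data, the mismatch on the repetitive read `r2` is ignored even though every mapped base disagrees with the reference. That is the point of annotating first: mismatches in repeats are more likely to come from paralogous copies than from true allelic differences.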
format Text
id pubmed-3041743
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3041743 2011-02-24 Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence You, Frank M Huo, Naxin Deal, Karin R Gu, Yong Q Luo, Ming-Cheng McGuire, Patrick E Dvorak, Jan Anderson, Olin D BMC Genomics Methodology Article BioMed Central 2011-01-25 /pmc/articles/PMC3041743/ /pubmed/21266061 http://dx.doi.org/10.1186/1471-2164-12-59 Text en Copyright ©2011 You et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041743/
https://www.ncbi.nlm.nih.gov/pubmed/21266061
http://dx.doi.org/10.1186/1471-2164-12-59